About Tomasz Nurkiewicz

Java EE developer, Scala enthusiast. Enjoying data analysis and visualization. Strongly believes in the power of testing and automation.

Probability distribution for programmers

This is one of these very simple programming puzzles I came across recently: given a function returning random integers from 0 to 4 inclusive with equal probability, write a function returning random integers from 0 to 6 inclusive.
Of course the solution should also return equally distributed numbers. So let’s start from an input function sample definition:
 
 
 
 
 
 

def rand4() = (math.random * 5).toInt

Your task is to implement rand6() by only using rand4(). Give yourself few minutes and continue reading.

The first approach is pretty straightforward but happens to be completely broken:

def rand6() = rand4() * 3 / 2

As simple as that. In ideal solution each output value from 0 to 6 should appear with the probability of 1/7. Can you tell from the code above, what’s the probability of rand6() returning 2 or 5? That’s right, it’s no more than 0, you’ll never get these values. I hope it’s clear why. So let’s go for something more sophisticated:

def rand6() = (rand4() + rand4()) % 7

Looks better, but still pretty far. The code above has two major flaws. First of all the results of rand4() + rand4() expression range from 0 to 8 but we need 0 to 6. The obvious solution is to use % 7 operation. However this results in 0 and 1 being returned twice as often because 7 and 8 are overflowing to 0 and 1. So what about this:

def rand6(): Int = {
    val rand8 = rand4() + rand4()
    if(rand8 > 6)
        rand6()
    else
        rand8
}

I hope the recursion (which can easily be turned into loop, but I leave that work to the Scala compiler) is not obscuring the intent – if the sum of two rand4() invocations is above expected result, we simply discard it and call rand6() again. However there is still one subtle but striking bug, quoting Wikipedia on uniform distribution. The sum of two independent, equally distributed, uniform distributions yields a symmetric triangular distribution. If you don’t quite get the above, have a look at this live demo in JavaScript using <canvas/> illustrating what Wikipedia means:

This program simply places pixels at random (X, Y) positions on each panel. In the first panel I use one Math.random() * 300 call scaled to fit whole canvas. As you can see the distribution is more or less uniform. But we can’t tell that about second and third panels. On the second panel I am using the sum of two uniformly distributed variables, in principle: (Math.random() + Math.random()) * 150). Even though this expression can return anything between 0 and 300, the points are very biased toward the middle of the canvas (triangular distribution). The same behaviour is emphasized on the third panel where ten invocations of Math.random() are used.

The correct answer

The approach I’m taking is based on the observation that rand4() is capable of producing two random least significant bits. So let’s start from implementing rand3() with known semantics:

def rand3(): Int = rand4() match {
    case 4 => rand3()
    case x => x
}

rand3() returns uniformly distributed values from 0 to 3 doing so by rejecting 4 output of rand4(). How will that help us? Well, we now have two random bits, each one being either 0 or 1 with 50% probability. We can easily widen it for larger sequences, e.g. rand15() and rand7():

def rand15() = (rand3() << 2) + rand3()
def rand7() = rand15() >> 1

You should be rather comfortable with the bit fiddling above. Having the ability to produce two random bits I can easily generate 4 and 3. Now rand6() is a no-brainer:

def rand6() = rand7() match {
    case 7 => rand6()
    case x => x
}

Just to make this lesson a little bit more interesting, let’s implement randN(n: Int) on top of rand4(). randN() should return uniformly distributed natural values from 0 to n. I’ll begin by implementing helper method atLeastKrandBits(k: Int) returning… at least K random bits:

def atLeastKrandBits(k: Int): Int = k match {
    case 0 => 0
    case 1 => rand3() >> 1
    case 2 => rand3()
    case b => (atLeastKrandBits(k - 2) << 2) + rand3()
}

Alternative implementation with foldLeft():

def atLeastKrandBits(k: Int) = (0 to (k + 1) / 2).foldLeft(0){
    (acc, _) => (acc << 2) + rand3()
}

…or if you really hate those to maintain your code:

def atLeastKrandBits(k: Int) = (0 /: (0 to (k + 1) / 2)){
    (acc, _) => (acc << 2) + rand3()
}

Having any of the implementations above randN(n: Int) is simple:

def randN(n: Int) = {
    val bitsCount = java.lang.Integer.highestOneBit(n)
    val randBits = atLeastKrandBits(bitsCount)
    if(randBits > n)
        randN(n)
    else
        randBits
}

Conclusions

You might ask yourself a question: why should I even care? If you fail to understand probability distribution your application might produce random output that’s actually quite easy to predict. It’s not a big deal if you are writing a game and enemies are more likely to appear at some places on the map (but the players will discover and abuse it!) But if you need random numbers for security reasons or you rely on uniform distribution e.g. for load-balancing purposes – any bias might become fatal.
 

Reference: Probability distribution for programmers from our JCG partner Tomasz Nurkiewicz at the Java and neighbourhood blog.

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

JPA Mini Book

Learn how to leverage the power of JPA in order to create robust and flexible Java applications. With this Mini Book, you will get introduced to JPA and smoothly transition to more advanced concepts.

JVM Troubleshooting Guide

The Java virtual machine is really the foundation of any Java EE platform. Learn how to master it with this advanced guide!

Given email address is already subscribed, thank you!
Oops. Something went wrong. Please try again later.
Please provide a valid email address.
Thank you, your sign-up request was successful! Please check your e-mail inbox.
Please complete the CAPTCHA.
Please fill in the required fields.

One Response to "Probability distribution for programmers"

  1. Jacek says:

    Hello,

    I think you could do with rand4 function only (and some additional function, say F)
    First divide 0..4 to two sets containing 2 arbitrary integers each, e.g {0,1} and {3,4}
    F would return 0 if rand4 returned 0 or 1 and 1 if rand4 returned 3 or 4 (or run rand4 again if there was 2)

    Then you generate binary sequence, and the first 3-element fragment of it different from, e.g 111 (but again, this is arbitrary, but taken for simplicity as it is 7 in binary) is our random number from 0..6. The are of course 8 different 3-element sequences, but all of them are equally probable, 1/8, and we are only interested in space of 7, so (1/8)/(7/8) gives us 1/7

    Rgds,

Leave a Reply


× five = 15



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
Do you want to know how to develop your skillset and become a ...
Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

Get ready to Rock!
You can download the complementary eBooks using the links below:
Close