Does "n * (rand() / RAND_MAX)" make a skewed random number distribution?

16,000

Yes, it's skewed, unless your RAND_MAX happens to be a multiple of 10.

If you take the numbers from 0 to RAND_MAX, and try to divide them into 10 piles, you really have only three possibilities:

  1. RAND_MAX is a multiple of 10, and the piles come out even.
  2. RAND_MAX is not a multiple of 10, and the piles come out uneven.
  3. You split it into uneven groups to start with, but throw away all the "extras" that would make it uneven.

You rarely have control over RAND_MAX, and it's often a prime number anyway. That really only leaves 2 and 3 as possibilities.

The third option looks roughly like this: [Edit: After some thought, I've revised this to produce numbers in the range 0...(limit-1), to fit with the way most things in C and C++ work. This also simplifies the code (a tiny bit).

int rand_lim(int limit) {
/* return a random number in the range [0..limit)
 */

    int divisor = RAND_MAX/limit;
    int retval;

    do { 
        retval = rand() / divisor;
    } while (retval == limit);

    return retval;
}

For anybody who questions whether this method might leave some skew, I also wrote a rather different version, purely for testing. This one uses a decidedly non-random generator with a very limited range, so we can simply iterate through every number in the range. It looks like this:

#include <stdlib.h>
#include <stdio.h>

#define MAX 1009

int next_val() {
    // just return consecutive numbers
    static int v=0;

    return v++;
}

int lim(int limit) {
    int divisor = MAX/limit;
    int retval;

    do {
        retval = next_val() / divisor;
    } while (retval == limit);

    return retval;
}

#define LIMIT 10

int main() {

    // we'll allocate extra space at the end of the array:
    int buckets[LIMIT+2] = {0};
    int i;

    for (i=0; i<MAX; i++)
        ++buckets[lim(LIMIT)];

    // and print one beyond what *should* be generated
    for (i=0; i<LIMIT+1; i++)
        printf("%2d: %d\n", i, buckets[i]);
}

So, we're starting with numbers from 0 to 1009 (1009 is prime, so it won't be an exact multiple of any range we choose). So, we're starting with 1009 numbers, and splitting it into 10 buckets. That should give 100 in each bucket, and the 9 leftovers (so to speak) get "eaten" by the do/while loop. As it's written right now, it allocates and prints out an extra bucket. When I run it, I get exactly 100 in each of buckets 0..9, and 0 in bucket 10. If I comment out the do/while loop, I see 100 in each of 0..9, and 9 in bucket 10.

Just to be sure, I've re-run the test with various other numbers for both the range produced (mostly used prime numbers), and the number of buckets. So far, I haven't been able to get it to produce skewed results for any range (as long as the do/while loop is enabled, of course).

One other detail: there is a reason I used division instead of remainder in this algorithm. With a good (or even decent) implementation of rand() it's irrelevant, but when you clamp numbers to a range using division, you keep the upper bits of the input. When you do it with remainder, you keep the lower bits of the input. As it happens, with a typical linear congruential pseudo-random number generator, the lower bits tend to be less random than the upper bits. A reasonable implementation will throw out a number of the least significant bits already, rendering this irrelevant. On the other hand, there are some pretty poor implementations of rand around, and with most of them, you end up with better quality of output by using division rather than remainder.

I should also point out that there are generators that do roughly the opposite -- the lower bits are more random than the upper bits. At least in my experience, these are quite uncommon. That with which the upper bits are more random are considerably more common.

Share:
16,000
Robert
Author by

Robert

Updated on June 09, 2022

Comments

  • Robert
    Robert almost 2 years

    I'd like to find an unskewed way of getting random numbers in C (although at most I'm going to be using it for values of 0-20, and more likely only 0-8). I've seen this formula but after running some tests I'm not sure if it's skewed or not. Any help?

    Here is the full function used:

    int randNum() 
    { 
        return 1 + (int) (10.0 * (rand() / (RAND_MAX + 1.0)));
    }
    

    I seeded it using:

    unsigned int iseed = (unsigned int)time(NULL);
    srand (iseed);
    

    The one suggested below refuses to work for me I tried

    int greek; 
    for (j=0; j<50000; j++) 
    { 
    greek =rand_lim(5); 
    printf("%d, " greek); 
    greek =(int) (NUM * (rand() / (RAND_MAX + 1.0))); 
    int togo=number[greek]; 
    number[greek]=togo+1; 
    }
    

    and it stops working and gives me the same number 50000 times when I comment out printf.

  • Robert
    Robert about 12 years
    I tried that damn function before and it refuses to work. It just keeps giving me the same number over and over. If I have int greek; for (j=0; j<50000; j++) { greek =rand_lim(5); printf("%d, " greek); greek =(int) (NUM * (rand() / (RAND_MAX + 1.0))); int togo=number[greek]; number[greek]=togo+1; } If I edit out the print statement it gives me the same number 50000 times
  • Robert
    Robert about 12 years
    Oh yeah I turned it into a while loop instead of a do while loop which I thought would be the same.
  • Jerry Coffin
    Jerry Coffin about 12 years
    @RobertZ: I've added a little bit of demo code to the answer.
  • caf
    caf about 12 years
    If you take the numbers from 0 to RAND_MAX there are RAND_MAX + 1 numbers, so it is RAND_MAX + 1 that must be a multiple of 10.
  • Robert
    Robert about 12 years
    A thought occurs, couldn't I make it do{retval =rand() } while (retval<(Rand_MAX-(Rand_Max%limit+1))? That way if Rand_Max = 43502 it will ignore 43500, 43501 and 43502 and the results will be divisible?
  • Jerry Coffin
    Jerry Coffin about 12 years
    @caf: well, technically there are RAND_MAX+1 numbers, and the range is normally for 0..LIMIT, so there are LIMIT+1 numbers, so RAND_MAX+1 needs to be a multiple of LIMIT+1 -- not that it changes any of the basic idea involved.
  • Jerry Coffin
    Jerry Coffin about 12 years
    @RobertZ: I think you mean retval > (RAND_MAX(..., but yes you can re-cast it in that direction too.
  • Robert
    Robert about 12 years
    Oh and retval = retval%limit at the end. Although it would be between 0-9 and not 0-10.
  • Jerry Coffin
    Jerry Coffin about 12 years
    @caf: It produces un-skewed results as it is, but is more conservative than necessary, throwing away more numbers than it really needs to. Thanks for asking though -- I'd been meaning to check that for some time, and your asking get me to do so.
  • chux - Reinstate Monica
    chux - Reinstate Monica over 7 years
    Off by one. "RAND_MAX is a multiple of 10, and the piles come out even." --> "RAND_MAX + 1 is a multiple of 10, and the piles come out even.", etc.