How to generate random numbers in parallel?


Solution 1

I'll post here what I posted to Concurrent random number generation:

I think you're looking for rand_r(), which explicitly takes the current RNG state as a parameter. Each thread should then have its own copy of the seed data (whether you want each thread to start off with the same seed or different ones depends on what you're doing; here you want them to be different, or you'd get the same row again and again). There's some discussion of rand_r() and thread safety here: Is rand_r real thread safe?

So say you wanted each thread to have its seed start off with its thread number (which is probably not what you want, as it would give the same results every time you ran with the same number of threads, but just as an example):

#pragma omp parallel default(none)
{
    int i;
    unsigned int myseed = omp_get_thread_num();
    #pragma omp for
    for(i=0; i<100; i++)
            printf("%d %d %d\n",i,omp_get_thread_num(),rand_r(&myseed));
}

Edit: Just on a lark, checked to see if the above would get any speedup. Full code was

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <omp.h>

/* tick()/tock(): gettimeofday() wrappers defined elsewhere;
   tock() returns the elapsed time in seconds. */
void tick(struct timeval *t);
double tock(struct timeval *t);

#define NRANDS 1000000
int main(int argc, char **argv) {

    struct timeval t;
    int a[NRANDS];

    tick(&t);
    #pragma omp parallel default(none) shared(a)
    {
        int i;
        unsigned int myseed = omp_get_thread_num();
        #pragma omp for
        for(i=0; i<NRANDS; i++)
                a[i] = rand_r(&myseed);
    }
    double sum = 0.;
    double time=tock(&t);
    for (long int i=0; i<NRANDS; i++) {
        sum += a[i];
    }
    printf("Time = %lf, sum = %lf\n", time, sum);

    return 0;
}

where tick() and tock() are just wrappers around gettimeofday(), with tock() returning the elapsed time in seconds. The sum is printed just to make sure nothing gets optimized away, and to demonstrate a small point: you will get different numbers with different numbers of threads, because each thread gets its own thread number as a seed. If you run the same code again and again with the same number of threads, you'll get the same sum, for the same reason. Anyway, timing (running on an 8-core Nehalem box with no other users):

$ export OMP_NUM_THREADS=1
$ ./rand
Time = 0.008639, sum = 1074808568711883.000000

$ export OMP_NUM_THREADS=2
$ ./rand
Time = 0.006274, sum = 1074093295878604.000000

$ export OMP_NUM_THREADS=4
$ ./rand
Time = 0.005335, sum = 1073422298606608.000000

$ export OMP_NUM_THREADS=8
$ ./rand
Time = 0.004163, sum = 1073971133482410.000000

So speedup, if not great; as @ruslik points out, this is not really a compute-intensive process, and other issues like memory bandwidth start playing a role. Thus, only a shade over 2x speedup on 8 cores.

Solution 2

You cannot use the C rand() function from multiple threads; this results in undefined behavior. Some implementations might give you locking (which will make it slow); others might allow threads to clobber each other's state, possibly crashing your program or just giving "bad" random numbers.

To solve the problem, either write your own PRNG implementation or use an existing one that allows the caller to store and pass the state to the PRNG iterator function.

Solution 3

Get each thread to set a different seed based on its thread id, e.g. srand(omp_get_thread_num() * 1000);

Solution 4

It seems that rand() has a single global state shared between all threads on Linux, and thread-local state on Windows. The shared state on Linux causes your slowdowns because of the necessary synchronization.

I don't think there is a portable way in the C library to use the RNG in parallel on multiple threads, so you need another one. You could use a Mersenne Twister. As marcog said, you need to initialize the seed for each thread differently.

Solution 5

On linux/unix you can use

long jrand48(unsigned short xsubi[3]);

where xsubi[3] encodes the state of the random number generator, like this:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main() {
  unsigned short *xsub;
#pragma omp parallel private(xsub)
  {
    xsub = new unsigned short[3];
    xsub[0] = xsub[1] = xsub[2] = 3 + omp_get_thread_num();
    int j;
#pragma omp for
    for (j = 0; j < 10; j++)
      printf("%d [%d] %ld\n", j, omp_get_thread_num(), jrand48(xsub));
    delete [] xsub;
  }
  return 0;
}

compile with

g++-mp-4.4 -Wall -Wextra -O2 -march=native -fopenmp -D_GLIBCXX_PARALLEL jrand.cc -o jrand

(replace g++-mp-4.4 with whatever you need to call g++ version 4.4 or 4.3) and you get

$ ./jrand 
0 [0] 1344229389
1 [0] 1845350537
2 [0] 229759373
3 [0] 1219688060
4 [0] -553792943
5 [1] 360650087
6 [1] -404254894
7 [1] 1678400333
8 [1] 1373359290
9 [1] 171280263

i.e. 10 different pseudorandom numbers without any mutex locking or race conditions.

Tomek Tarczynski

Updated on July 09, 2022
Comments

  • Tomek Tarczynski
    Tomek Tarczynski almost 2 years

    I want to generate pseudorandom numbers in parallel using openMP, something like this:

    int i;
    #pragma omp parallel for
    for (i=0;i<100;i++)
    {
        printf("%d %d %d\n",i,omp_get_thread_num(),rand());
    } 
    return 0; 
    

I've tested it on Windows and I got a huge speedup, but each thread generated exactly the same numbers. I've also tested it on Linux and I got a huge slowdown: the parallel version on an 8-core processor was about 10 times slower than the sequential one, but each thread generated different numbers.

    Is there any way to have both speedup and different numbers?

    Edit 27.11.2010
    I think I've solved it using an idea from Jonathan Dursi's post. It seems that the following code works fast on both Linux and Windows, and the numbers are also pseudorandom. What do you think about it?

    int seed[10];
    
    int main(int argc, char **argv) 
    {
    int i,s;
    for (i=0;i<10;i++)
        seed[i] = rand();
    
    #pragma omp parallel private(s)
    {
        s = seed[omp_get_thread_num()];
        #pragma omp for
        for (i=0;i<1000;i++)
        {
            printf("%d %d %d\n",i,omp_get_thread_num(),s);
            s=(s*17931+7391); // these numbers should be chosen more carefully
        }
        seed[omp_get_thread_num()] = s;
    }
    return 0; 
    } 
    

    PS.: I haven't accepted any answer yet, because I need to be sure that this idea is good.

  • Axel Gneiting
    Axel Gneiting over 13 years
    It's almost certain that this will not remove the slowdowns on Linux unless there is some logic that checks whether the seed was initialized on all threads.
  • moinudin
    moinudin over 13 years
    @Axel That's probably because rand() has an atomic operation which it locks on. You'll have to look for a non-locking RNG.
  • chrisaycock
    chrisaycock over 13 years
    I tried rand_r() to see if a reentrant version was faster (no locking), but it took the same amount of time on my system.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    rand does not necessarily do locking, and should not. Calling it from multiple threads results in undefined behavior.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    Indeed there is no portable way short of wrapping the call to rand() with your own mutex... which would rather defeat the purpose.
  • Jonathan Dursi
    Jonathan Dursi over 13 years
    rand_r() is portable (in POSIX 1.c.) and reentrant.
  • Tomek Tarczynski
    Tomek Tarczynski over 13 years
    Axel was right, it didn't remove the slowdown problem on Linux.
  • Tomek Tarczynski
    Tomek Tarczynski over 13 years
    +1. I modified your solution a little and it seems to work well on both Linux and Windows.
  • Tomek Tarczynski
    Tomek Tarczynski over 13 years
    This is probably not a good solution for me, because my program generates a lot of random numbers and it should be concurrent.
  • Tomek Tarczynski
    Tomek Tarczynski over 13 years
    I need to take a closer look at the Mersenne Twister, because this method is not as obvious as most PRNGs.
  • Axel Gneiting
    Axel Gneiting over 13 years
    Jonathan, a reentrant rand function won't help to do parallel number generation because it requires synchronization. Tomek are you using pure C or C++?
  • Axel Gneiting
    Axel Gneiting over 13 years
    I read the spec for rand_r and I was wrong. The seed is not a global state but given as a function parameter. So it would work, but it's not portable anyway.
  • Tomek Tarczynski
    Tomek Tarczynski over 13 years
    Could you elaborate on your answer a little? I've never heard of jrand48, and I suppose this function is not in any standard library.
  • Riko Jacob
    Riko Jacob over 13 years
    jrand48 is in the "family" of drand48() and lrand48(). It is part of the "Standard C Library (libc, -lc)", hence the #include<stdlib.h>.
  • SChepurin
    SChepurin about 11 years
    +1. Very good. The general idea elsewhere is that rand() (and others) will lock access, effectively forcing threads to wait (making the implementation close to single-threaded or even slower), but nobody points out the possibility of threads "...clobber[ing] each other's state, possibly crashing your program", which I have seen in action!
  • Television
    Television about 5 years
    As a beginner to OpenMP, I was curious why you explicitly declared "int i" before the parallelized for loop. Would anything go wrong if I initialized "i" in the for loop itself?