Differences between numpy.random and random.random in Python

python random random-seed

Solution 1

You have made many correct observations already!

Unless you'd like to seed both of the random generators, it's probably simpler in the long run to choose one generator or the other. But if you do need to use both, then yes, you'll also need to seed them both, because they generate random numbers independently of each other.
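As a minimal sketch of seeding both generators in one place (the helper name seed_everything and the seed value are made up for illustration, and it assumes NumPy is installed):

```python
import random

import numpy as np

SEED = 12345  # arbitrary value, chosen only for this example


def seed_everything(seed):
    """Seed the stdlib generator and NumPy's global (legacy) generator."""
    random.seed(seed)
    np.random.seed(seed)


seed_everything(SEED)
first_run = (random.random(), np.random.binomial(10, 0.5, size=5).tolist())

seed_everything(SEED)
second_run = (random.random(), np.random.binomial(10, 0.5, size=5).tolist())

# Both generators restart from the same state, so the two runs match.
print(first_run == second_run)
```

Calling a helper like this once at the top of the main program also covers any imported module that uses the module-level functions of random or numpy.random, since those share the same global state.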

For numpy.random.seed(), the main difficulty is that it is not thread-safe: it seeds a single global generator, so it is not guaranteed to behave predictably if two different threads are executing the function at the same time. If you're not using threads, and if you can reasonably expect that you won't need to rewrite your program this way in the future, numpy.random.seed() should be fine. If there's any reason to suspect that you may need threads in the future, it's much safer in the long run to do as suggested and make a local instance of the numpy.random.RandomState class. As far as I can tell, random.seed() is thread-safe (or at least, I haven't found any evidence to the contrary).
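A rough sketch of the local-instance approach, assuming NumPy is installed. The variable names are illustrative; numpy.random.default_rng() is the newer Generator API (NumPy 1.17 and later) and is shown only as an alternative to RandomState:

```python
import numpy as np

# Each instance carries its own state, so seeding or drawing from one
# does not disturb the other, nor the global np.random.* functions.
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

draws_a = rng_a.binomial(n=10, p=0.5, size=5)
np.random.seed(0)  # touching the global generator leaves rng_b alone
draws_b = rng_b.binomial(n=10, p=0.5, size=5)

print((draws_a == draws_b).all())

# Newer alternative: a Generator object with its own independent state.
gen = np.random.default_rng(42)
gen_draws = gen.binomial(n=10, p=0.5, size=5)
print(gen_draws)
```

The idea is to pass the instance to whatever function needs random numbers (or give each thread its own instance) instead of relying on shared global state.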

The numpy.random library contains a few extra probability distributions commonly used in scientific research, as well as a couple of convenience functions for generating arrays of random data. The standard-library random module is a little more lightweight, and should be fine if you're not doing scientific research or other kinds of work in statistics.

Otherwise, they both use the Mersenne Twister sequence to generate their random numbers, and they're both completely deterministic - that is, if you know a few key bits of information (the generator's internal state), it's possible to predict with absolute certainty what number will come next. For this reason, neither numpy.random nor random is suitable for any serious cryptographic use. But because the sequence is so very long, both are fine for generating random numbers in cases where you aren't worried about people trying to reverse-engineer your data. This is also why the seed matters - if you start in the same place each time, you'll always get the same sequence of random numbers!
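To illustrate the determinism, here is a small sketch using only the standard library. It snapshots the generator's internal state, draws a few numbers, then restores the state and gets the same numbers again (the seed value is arbitrary):

```python
import random

gen = random.Random(2021)   # a local generator with an arbitrary seed
state = gen.getstate()      # snapshot of the full internal state

first = [gen.random() for _ in range(3)]

gen.setstate(state)         # rewind to the snapshot
replay = [gen.random() for _ in range(3)]

# Knowing the state is enough to reproduce every following number.
print(first == replay)

# Re-seeding with the same value has the same effect as restoring the state.
reseeded = random.Random(2021)
print([reseeded.random() for _ in range(3)] == first)
```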

As a side note, if you do need cryptographic-level randomness, you should use the secrets module, or something like Crypto.Random if you're using a Python version earlier than 3.6.
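For reference, a few typical calls from the secrets module (Python 3.6+). The variable names and the list of colours are placeholders for this example:

```python
import secrets

# 16 random bytes rendered as 32 hexadecimal characters.
token = secrets.token_hex(16)

# A random integer in the range [0, 1000000).
pin = secrets.randbelow(10**6)

# A random element picked from a non-empty sequence.
colour = secrets.choice(["red", "green", "blue"])

print(token, pin, colour)
```

Note that secrets cannot be seeded, so it is the wrong tool when you want a reproducible run for debugging.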

Solution 2

From Python for Data Analysis: the module numpy.random supplements Python's built-in random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.

By contrast, Python's built-in random module only samples one value at a time, while numpy.random can generate very large samples much faster. Using the IPython magic function %timeit, one can see which module performs faster:

In [1]: from random import normalvariate
In [2]: import numpy as np
In [3]: N = 1000000

In [4]: %timeit samples = [normalvariate(0, 1) for _ in range(N)]
1 loop, best of 3: 963 ms per loop

In [5]: %timeit np.random.normal(size=N)
10 loops, best of 3: 38.5 ms per loop
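Outside IPython, roughly the same comparison can be sketched with the standard timeit module. This assumes NumPy is installed; the function names are made up for the example, N is reduced so the script finishes quickly, and the timings will differ from machine to machine:

```python
import timeit
from random import normalvariate

import numpy as np

N = 100_000  # smaller than in the session above, to keep the run short


def stdlib_samples():
    # One Python-level call per value.
    return [normalvariate(0, 1) for _ in range(N)]


def numpy_samples():
    # One call that fills a whole array.
    return np.random.normal(size=N)


REPEATS = 3
stdlib_time = timeit.timeit(stdlib_samples, number=REPEATS) / REPEATS
numpy_time = timeit.timeit(numpy_samples, number=REPEATS) / REPEATS

print(f"random.normalvariate: {stdlib_time * 1e3:.1f} ms per loop")
print(f"np.random.normal:     {numpy_time * 1e3:.1f} ms per loop")
```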

Solution 3

The source of the seed and the distribution profile used are going to affect the outputs. If you are looking for cryptographic randomness, seeding from os.urandom() will get nearly real random bytes from device chatter (i.e. ethernet or disk activity; /dev/random on BSD).

This will avoid you giving a seed and so generating deterministic random numbers. However, the random calls then allow you to fit the numbers to a distribution (what I call scientific randomness): eventually all you want is a bell-curve distribution of random numbers, and numpy is best at delivering this.

So yes, stick with one generator, but decide what kind of random you want: random, but definitely from a distribution curve, or as random as you can get without a quantum device.

Laura

Updated on March 07, 2021

Comments

Laura about 3 years

I have a big script in Python. I took inspiration from other people's code, so I ended up using the numpy.random module for some things (for example, for creating an array of random numbers taken from a binomial distribution) and in other places I use the random module.

Can someone please tell me the major differences between the two? Looking at the doc webpage for each of the two it seems to me that numpy.random just has more methods, but I am unclear about how the generation of the random numbers is different.

The reason why I am asking is because I need to seed my main program for debugging purposes. But it doesn't work unless I use the same random number generator in all the modules that I am importing, is this correct?

Also, I read here, in another post, a discussion about NOT using numpy.random.seed(), but I didn't really understand why this was such a bad idea. I would really appreciate it if someone could explain to me why this is the case.
SingleNegationElimination over 12 years

As a distantly related note, it's sometimes necessary to use neither, since the Mersenne twister does not produce random sequences of entropy sufficient for cryptographic (and some unusual scientific) purposes. In those rare cases, you often need Crypto.Random, which is able to use OS-specific entropy sources to generate non-deterministic random sequences of much higher quality than is available from random.random alone. You usually don't need this, though.
Laura over 12 years

Thank you Hannnele. Your insights were really very useful! It turns out that I cannot get away with using ONLY a single random number generator, (which needs to be numpy since random doesn't produce binomial distributions) because parts of my program call another program which uses random. I will have to seed the two generators.
Laura over 12 years

Thank you very much Paul, your answer was really useful! I am not looking for cryptographic randomness, I am doing mathematical modeling and pseudo-random numbers are enough for me. It turns out I cannot stick to one generator as I wanted since I need numpy for the binomial distribution and my program calls another program that uses random :(
Kaushik Ghose over 9 years

"if you know which number you have now, it's possible to predict with absolute certainty what number will come next." I think this statement might need some clarification. What is meant is that if you know the internal state of the generator you can reproduce the sequence - which is what you do when you seed the generator. Given a single number output from the generator you can not predict the next number. The period is so large you would probably need a long sequence of numbers before you could compute where you are on the pseudo-random sequence and thus predict the next one.
Shayan Amani almost 5 years

Not the case for other methods. I compared np.random.randint(2) with random.randrange(2), and NumPy was the slower one: NumPy took 1.25 µs and random took 891 ns. The same relation holds for np.random.rand() and random.random().