What exactly does the Pandas random_state do?
As described in the documentation of pandas.DataFrame.sample, the random_state parameter accepts either an integer (as in your case) or a numpy.random.RandomState, which is a container for a Mersenne Twister pseudo-random number generator.
If you pass it an integer, it will use this as a seed for a pseudo-random number generator. As the name already says, the generator does not produce true randomness. Rather, it has an internal state (which, for NumPy's global generator, you can get by calling np.random.get_state()) that is initialized based on a seed. When initialized with the same seed, it will reproduce the same sequence of "random numbers".
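To make the seeding behaviour concrete, here is a minimal sketch. The DataFrame and its "value" column are made up for illustration and are not the data from the question; only df.sample, random_state and numpy.random.RandomState come from the discussion above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the question reads its frame from a CSV file.
df = pd.DataFrame({"value": range(100)})

# The same integer seed selects the same rows on every call.
a = df.sample(n=5, random_state=123)
b = df.sample(n=5, random_state=123)
same_seed_matches = a.equals(b)

# A different seed starts the generator from a different internal state,
# so it will (almost certainly) pick different rows.
c = df.sample(n=5, random_state=12)

# The same effect at the NumPy level: two generators seeded identically
# produce the same sequence of numbers.
seq1 = np.random.RandomState(123).randint(0, 1000, size=5)
seq2 = np.random.RandomState(123).randint(0, 1000, size=5)
same_sequence = bool((seq1 == seq2).all())

print(same_seed_matches, same_sequence)
```

The number itself has no meaning beyond selecting the starting state of the generator; 12 is not "less random" than 123, it simply leads to a different sequence.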
If you pass it a RandomState, it will use this (already initialized/seeded) RandomState to generate pseudo-random numbers. This also allows you to get reproducible results by setting a fixed seed when initializing the RandomState and then passing this RandomState around. You should actually prefer this over setting the seed of NumPy's internal (global) RandomState. The reasoning is explained in this answer by Robert Kern and the comments on it. The idea is to have an independent stream, which prevents other parts of the program from breaking your reproducibility by changing the seed of NumPy's internal RandomState.
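The independent-stream idea can be sketched as follows. The helper draw_two_samples and the example DataFrame are hypothetical names introduced only for this illustration; the sketch assumes that df.sample draws from the RandomState object it is given rather than from NumPy's global generator.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"value": range(100)})

def draw_two_samples(disturb_global):
    # A private, explicitly seeded generator that is passed around.
    rng = np.random.RandomState(123)
    first = df.sample(n=5, random_state=rng)
    if disturb_global:
        # Simulate "other parts of the program" touching the global state.
        np.random.seed(0)
        np.random.rand(10)
    # The second draw continues the private stream where the first left off.
    second = df.sample(n=5, random_state=rng)
    return first, second

first_a, second_a = draw_two_samples(disturb_global=False)
first_b, second_b = draw_two_samples(disturb_global=True)

# Reseeding and consuming the global generator does not change
# what the private stream produces.
unaffected = first_a.equals(first_b) and second_a.equals(second_b)
print(unaffected)
```

Note that reusing one RandomState across calls advances its state, so the second sample is generally not the same as the first; reproducibility here means the whole sequence of draws repeats from run to run.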
Newskooler 3 months
I have the following code where I use the Pandas random_state:
    import pandas as pd

    randomState = 123
    sampleSize = 750
    df = pd.read_csv(filePath, delim_whitespace=True)
    df_s = df.sample(n=sampleSize, random_state=randomState)
This generates a sample dataframe df_s. Every time I run the code with the same randomState, I get the same sample df_s. When I change the value from 123 to 12, the sample changes as well, so I guess that's what the random_state does.
Any straightforward explanation with an example will be much appreciated.
ayhan over 5 years
shaik moeed about 1 year
Will setting np.random.seed() be enough to reproduce results when we use only sklearn?