What is "random_state" in the sklearn.model_selection.train_test_split example?
Solution 1
Isn't that obvious? 42 is the Answer to the Ultimate Question of Life, the Universe, and Everything.
On a serious note, random_state simply sets the seed of the random number generator, so that your train-test splits are always deterministic. If you don't set a seed, the split is different each time.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
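The three accepted forms described above can be compared side by side. This is a minimal sketch rather than anything from the original answer; it assumes a scikit-learn version in which an int seed is turned into a NumPy RandomState internally, and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), list(range(5))

# 1) int: the value is used as the seed of the generator.
_, X_test_int, _, _ = train_test_split(X, y, test_size=0.33, random_state=42)

# 2) RandomState instance: the object itself is used as the generator.
rng = np.random.RandomState(42)
_, X_test_rng, _, _ = train_test_split(X, y, test_size=0.33, random_state=rng)

# 3) None: the global generator behind np.random is used,
#    so seeding it first also makes the split repeatable.
np.random.seed(42)
_, X_test_none, _, _ = train_test_split(X, y, test_size=0.33, random_state=None)

print(X_test_int)
print(np.array_equal(X_test_int, X_test_rng))
print(np.array_equal(X_test_int, X_test_none))
```

Passing the int is usually the simplest option when all you want is a reproducible split.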
Solution 2
If you don't specify random_state in the code, then every time you run (execute) your code a new random value is generated, and the train and test datasets will contain different values each time.
However, if a fixed value is assigned, like random_state = 0 or 1 or 42 or any other integer, then no matter how many times you execute your code the result will be the same, i.e., the same values end up in the train and test datasets.
Solution 3
Random state ensures that the splits that you generate are reproducible. Scikit-learn uses random permutations to generate the splits. The random state that you provide is used as a seed to the random number generator. This ensures that the same sequence of random numbers, and therefore the same permutation, is produced on every run.
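To make the permutation idea concrete, here is a toy version of a seeded split written with plain NumPy. It is a simplified sketch, not scikit-learn's actual implementation, and permutation_split is a hypothetical helper name.

```python
import numpy as np

def permutation_split(n_samples, n_test, seed):
    """Toy split: shuffle the indices with a seeded generator, then cut."""
    rng = np.random.RandomState(seed)         # the seed fixes the generator's sequence
    permutation = rng.permutation(n_samples)  # shuffled copy of 0..n_samples-1
    test_idx = permutation[:n_test]           # first chunk becomes the test set
    train_idx = permutation[n_test:]          # the rest becomes the training set
    return train_idx, test_idx

train_a, test_a = permutation_split(10, 3, seed=42)
train_b, test_b = permutation_split(10, 3, seed=42)

# Same seed, same permutation, same split.
print(test_a, test_b)
```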
Solution 4
When random_state is not defined in the code, the train data will change on every run, and the accuracy might change with it. When random_state is set to a constant integer, the train data stays the same on every run, which makes debugging easier.
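The effect on accuracy can be checked with a small experiment. The sketch below is an assumption-laden illustration: the iris dataset, LogisticRegression, and the helper name accuracy_for_split are my choices and do not come from the question.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

def accuracy_for_split(seed):
    """Split with the given random_state, fit a model, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)

# Constant seed: the same rows are used for training every time.
fixed = [accuracy_for_split(42) for _ in range(3)]

# Different seeds stand in for "different runs without a seed".
varied = [accuracy_for_split(seed) for seed in range(3)]

print("fixed seed: ", fixed)
print("other seeds:", varied)
```

With the seed fixed, any change in the score comes from your code and not from the split, which is what makes debugging easier.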
Solution 5
The random state is simply the lot number of the set generated randomly in any operation. We can specify the same lot number again whenever we want the same set back.
Saurabh
Updated on July 09, 2022
Comments
-
Saurabh almost 2 years
Can someone explain to me what random_state means in the example below?
    import numpy as np
    from sklearn.model_selection import train_test_split

    X, y = np.arange(10).reshape((5, 2)), range(5)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=42)
Why is it hard coded to 42?
-
Kim Kern over 3 years
Does this answer your question? Random state (Pseudo-random number) in Scikit learn
-
Danrex over 5 years
That first sentence was more than enough.
-
Pleastry over 3 years
@cs95 Do I have to generate a new random_state for subsequent methods in my code? For example, if I set the random state as 42 for the train_test_split, do I set the random state also as 42 for the classifier I will be using on the split data? What about if I want to compare two different classifiers, do I use the same random state for both classifiers?
-
cs95 over 3 years
@Turtle I think you are looking to set a global seed so your pipeline is deterministic. This will only make the split deterministic, nothing else. Consider using something like np.random.seed or creating a random state object that is then reused across functions.
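One way to read the "random state object reused across functions" suggestion is sketched below. It is only an assumption about what that could look like: the iris data, RandomForestClassifier, and the run_pipeline helper are illustrative choices, not something the comment specifies.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

def run_pipeline(seed):
    """Create one generator from the seed and hand it to every random step."""
    rng = np.random.RandomState(seed)

    # The split draws from rng first...
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=rng)

    # ...and the classifier keeps drawing from the same rng afterwards.
    clf = RandomForestClassifier(n_estimators=20, random_state=rng)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)

# Two full runs started from the same seed give the same score.
print(run_pipeline(0), run_pipeline(0))
```

Passing the same integer to each step separately also works; the shared object just means there is a single seed to manage for the whole pipeline.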
-
vanetoj over 2 years
But if you use it in the train/test split, do you still need to use it when you run each algorithm?
-
Maxl Gemeinderat almost 2 years
How is the random_state saved? For example, does it matter if I run my code in different Colab notebooks on different accounts?