What is "random_state" in the sklearn.model_selection.train_test_split example?


Solution 1

Isn't that obvious? 42 is the Answer to the Ultimate Question of Life, the Universe, and Everything.

On a serious note, random_state simply seeds the random number generator, so that your train-test splits are reproducible: the same seed always produces the same split. If you don't set a seed, the split is different each time.

Relevant documentation:

random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
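
To make the three accepted values concrete, here is a minimal sketch. The array sizes and the seed 42 are arbitrary choices for illustration, not anything the documentation prescribes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape((10, 2))
y = np.arange(10)

# 1) int: used as the seed, so repeated calls give the same split.
_, X_test_int, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)

# 2) RandomState instance: used directly as the generator.
#    A freshly created RandomState(42) gives the same split as the int 42.
rng = np.random.RandomState(42)
_, X_test_rng, _, _ = train_test_split(X, y, test_size=0.3, random_state=rng)

# Reusing the same instance advances its internal state, so a second
# call will generally return a different split.
_, X_test_rng_again, _, _ = train_test_split(X, y, test_size=0.3, random_state=rng)

# 3) None: falls back to NumPy's global RandomState, so the split
#    typically changes between runs unless np.random.seed was called.
_, X_test_none, _, _ = train_test_split(X, y, test_size=0.3, random_state=None)

print(np.array_equal(X_test_int, X_test_rng))  # True
```

Note the difference between an int and an instance: an int reseeds on every call, while an instance keeps moving forward through its random sequence.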

Solution 2

If you don't specify random_state in the code, then every time you run your code a new random value is used as the seed, and the train and test datasets will contain different values each time.

However, if a fixed value is assigned, like random_state = 0 or 1 or 42 or any other integer, then no matter how many times you execute your code the result will be the same, i.e., the same values in the train and test datasets.
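
A quick way to check this yourself, using the data from the question. The helper name `split` is made up for this sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), list(range(5))

def split(seed):
    # Return only the test labels; that is enough to identify the split.
    _, _, _, y_test = train_test_split(X, y, test_size=0.33, random_state=seed)
    return y_test

first = split(42)
second = split(42)
print(first == second)  # True: same seed, same split

# With no seed, the test labels usually differ between calls and between runs.
unseeded = [split(None) for _ in range(5)]
print(unseeded)
```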

Solution 3

Random state ensures that the splits that you generate are reproducible. Scikit-learn uses random permutations to generate the splits. The random state that you provide is used as a seed for the random number generator, so the same sequence of random numbers, and therefore the same permutation, is produced every time.
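
The idea can be sketched with plain NumPy. This is a simplified illustration of a seeded permutation split, not scikit-learn's actual code, and `shuffle_split_indices` is a hypothetical helper name.

```python
import numpy as np

def shuffle_split_indices(n_samples, n_test, seed):
    """Hypothetical helper: split indices 0..n_samples-1 via a seeded permutation."""
    rng = np.random.RandomState(seed)         # seed the generator
    permutation = rng.permutation(n_samples)  # shuffled copy of 0..n_samples-1
    test_idx = permutation[:n_test]           # first n_test shuffled indices
    train_idx = permutation[n_test:]          # everything else
    return train_idx, test_idx

train_a, test_a = shuffle_split_indices(10, 3, seed=42)
train_b, test_b = shuffle_split_indices(10, 3, seed=42)

# Same seed, same permutation, same split.
print(np.array_equal(test_a, test_b))  # True
```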

Solution 4

When random_state is not set in the code, the training data changes on every run, so the accuracy may change on every run too. When random_state is set to a constant integer, the training data stays the same on every run, which makes debugging easier.
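
To see the effect on accuracy, here is a small sketch. The iris dataset and the k-nearest-neighbours classifier are my own choices for illustration (picked because the classifier itself has no randomness), and `accuracy_for_split` is a made-up helper.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

def accuracy_for_split(seed):
    # Only the split depends on the seed; the classifier is deterministic.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed)
    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    return model.score(X_test, y_test)

# No seed: the scores may differ from call to call.
print([accuracy_for_split(None) for _ in range(3)])

# Fixed seed: the score is identical every time.
print([accuracy_for_split(42) for _ in range(3)])
```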

Solution 5

Think of the random state as the lot number of a randomly generated set. We can specify this lot number again whenever we want the same set back.


Author: Saurabh

Updated on July 09, 2022

Comments

  • Saurabh
    Saurabh almost 2 years

    Can someone explain to me what random_state means in the example below?

    import numpy as np
    from sklearn.model_selection import train_test_split
    X, y = np.arange(10).reshape((5, 2)), range(5)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=42)

    Why is it hard-coded to 42?

  • Danrex
    Danrex over 5 years
    That first sentence was more than enough.
  • Pleastry
    Pleastry over 3 years
    @cs95 Do I have to generate a new random_state for subsequent methods in my code? For example, if I set the random state as 42 for the train_test_split, do I set the random state also as 42 for the classifier I will be using on the split data? What about if I want to compare two different classifiers, do I use the same random state for both classifiers?
  • cs95
    cs95 over 3 years
    @Turtle I think you are looking to set a global seed so your pipeline is deterministic. This will only make the split deterministic, nothing else. Consider using something like np.random.seed or creating a random state object that is then reused across functions.
  • vanetoj
    vanetoj over 2 years
    But if you use it in train_test_split, do you still need to use it when you run each algorithm?
  • Maxl Gemeinderat
    Maxl Gemeinderat almost 2 years
    How is the random_state saved? For example, does it matter if I run my code in different Colab notebooks on different accounts?
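
Several comments above ask whether the classifier needs its own random_state. Here is a rough sketch of cs95's suggestion to create one random state object and reuse it across functions. The iris data, the random forest, and the `run_pipeline` helper are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42  # arbitrary; any integer works

def run_pipeline(seed):
    # One generator, created once and shared by every randomized step.
    rng = np.random.RandomState(seed)
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=rng)
    # The forest draws its own randomness from the same generator.
    clf = RandomForestClassifier(n_estimators=20, random_state=rng)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)

# Two runs from the same seed give the same score.
print(run_pipeline(SEED) == run_pipeline(SEED))  # True
```

Seeding only train_test_split fixes the split and nothing else, so an estimator that is itself randomized still needs a seed or a shared generator if the whole run should be repeatable.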