Save MinMaxScaler model in sklearn

Solution 1

So I'm not an expert on this, but from a bit of research and a few helpful links, I think pickle and sklearn.externals.joblib are going to be your friends here.

The pickle package lets you save models, or "dump" them, to a file.

I think this link is also helpful. It talks about model persistence. Something that you're going to want to try is:

# could use: import pickle... however let's do something else
from sklearn.externals import joblib 

# this is more efficient than pickle for things like large numpy arrays
# ... which sklearn models often have.   

# then just 'dump' your model to a file
joblib.dump(clf, 'my_dope_model.pkl')

Here is where you can learn more about the sklearn externals.

Let me know if that doesn't help or I'm not understanding something about your model.

Note: sklearn.externals.joblib is deprecated. Install and use the pure joblib instead.
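
Under current sklearn versions the same dump looks like this (a minimal sketch, reusing the hypothetical clf and filename from above):

import joblib

# same call, just from the standalone joblib package
joblib.dump(clf, 'my_dope_model.pkl')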

Solution 2

Even better than pickle (which can create much larger files for objects holding big numpy arrays), you can use sklearn's built-in tool:

from sklearn.externals import joblib
scaler_filename = "scaler.save"
joblib.dump(scaler, scaler_filename) 

# And now to load...

scaler = joblib.load(scaler_filename) 

Note: sklearn.externals.joblib is deprecated. Install and use the pure joblib instead.

Solution 3

Just a note that sklearn.externals.joblib has been deprecated and is superseded by plain old joblib, which can be installed with pip install joblib:

import joblib
joblib.dump(my_scaler, 'scaler.gz')
my_scaler = joblib.load('scaler.gz')

Note that the file extension can be anything, but if it is one of ['.z', '.gz', '.bz2', '.xz', '.lzma'] then the corresponding compression protocol will be used automatically. See the docs for the joblib.dump() and joblib.load() methods for details.
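
To make that concrete, here is a small sketch (the filenames are arbitrary examples) showing both the extension-driven route and an explicit compress= argument, which accepts a (compressor, level) tuple:

import joblib

# '.gz' extension: gzip compression is picked automatically
joblib.dump(my_scaler, 'scaler.gz')

# or request a compressor and level explicitly, whatever the extension
joblib.dump(my_scaler, 'scaler.joblib', compress=('gzip', 3))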

Solution 4

You can use pickle to save the scaler:

import pickle

scalerfile = 'scaler.sav'
# a with-block closes the file handle cleanly after writing
with open(scalerfile, 'wb') as f:
    pickle.dump(scaler, f)

Load it back:

import pickle

scalerfile = 'scaler.sav'
with open(scalerfile, 'rb') as f:
    scaler = pickle.load(f)

test_scaled_set = scaler.transform(test_set)
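
If you want to sanity-check the round trip while the originally fitted scaler is still in memory (called original_scaler here, a hypothetical name), the restored object should transform identically:

import numpy as np

# both objects were fit on the same data, so their outputs should match
assert np.allclose(scaler.transform(test_set),
                   original_scaler.transform(test_set))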

Solution 5

The best way to do this is to create an ML pipeline like the following:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
import joblib  # sklearn.externals.joblib is deprecated; use the standalone package

pipeline = make_pipeline(MinMaxScaler(), YOUR_ML_MODEL())

model = pipeline.fit(X_train, y_train)

Now you can save it to a file:

joblib.dump(model, 'filename.mod') 

Later you can load it like this:

model = joblib.load('filename.mod')
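
A nice side effect of persisting the whole pipeline is that you never have to handle the scaler separately again; a quick sketch (X_test is assumed to be raw, unscaled data):

# the MinMaxScaler step rescales X_test automatically
# before the fitted estimator makes its predictions
predictions = model.predict(X_test)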

Author: Luis Ramon Ramirez Rodriguez

Updated on May 22, 2020

Comments

  • Luis Ramon Ramirez Rodriguez, almost 4 years

    I'm using the MinMaxScaler in sklearn to normalize the features for a model.

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    training_set = np.random.rand(4,4)*10
    training_set
    
           [[ 6.01144787,  0.59753007,  2.0014852 ,  3.45433657],
           [ 6.03041646,  5.15589559,  6.64992437,  2.63440202],
           [ 2.27733136,  9.29927394,  0.03718093,  7.7679183 ],
           [ 9.86934288,  7.59003904,  6.02363739,  2.78294206]]
    
    
    scaler = MinMaxScaler()
    scaler.fit(training_set)    
    scaler.transform(training_set)
    
    
       [[ 0.49184811,  0.        ,  0.29704831,  0.15972182],
       [ 0.4943466 ,  0.52384506,  1.        ,  0.        ],
       [ 0.        ,  1.        ,  0.        ,  1.        ],
       [ 1.        ,  0.80357559,  0.9052909 ,  0.02893534]]
    

    Now I want to use the same scaler to normalize the test set:

       [[ 8.31263467,  7.99782295,  0.02031658,  9.43249727],
       [ 1.03761228,  9.53173021,  5.99539478,  4.81456067],
       [ 0.19715961,  5.97702519,  0.53347403,  5.58747666],
       [ 9.67505429,  2.76225253,  7.39944931,  8.46746594]]
    

    But I don't want to use scaler.fit() with the training data all the time. Is there a way to save the scaler and load it later from a different file?