Save MinMaxScaler model in sklearn
Solution 1
So I'm actually not an expert with this, but from a bit of research and a few helpful links, I think pickle and sklearn.externals.joblib are going to be your friends here.
The package pickle lets you save models, or "dump" them to a file. I think this link is also helpful; it talks about model persistence. Something that you're going to want to try is:
# could use: import pickle... however let's do something else
from sklearn.externals import joblib
# this is more efficient than pickle for things like large numpy arrays
# ... which sklearn models often have.
# then just 'dump' your file
joblib.dump(clf, 'my_dope_model.pkl')
Here is where you can learn more about the sklearn externals.
Let me know if that doesn't help or I'm not understanding something about your model.
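For completeness, joblib.load is the counterpart for getting the model back (a minimal sketch, assuming the same file path as above):
# later, in another script or session
from sklearn.externals import joblib  # plain `import joblib` on modern versions
clf = joblib.load('my_dope_model.pkl')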
Note: sklearn.externals.joblib is deprecated. Install and use plain joblib instead.
Solution 2
Even better than pickle (which creates much larger files than this method), you can use sklearn's built-in tool:
from sklearn.externals import joblib
scaler_filename = "scaler.save"
joblib.dump(scaler, scaler_filename)
# And now to load...
scaler = joblib.load(scaler_filename)
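Once reloaded, the scaler can be applied directly to new data; it reuses the minima and maxima learned at fit time (a minimal sketch; test_set stands for whatever array you need to scale):
# scale new data with the stored fit parameters -- no refitting needed
test_scaled = scaler.transform(test_set)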
Note: sklearn.externals.joblib is deprecated. Install and use plain joblib instead.
Solution 3
Just a note that sklearn.externals.joblib has been deprecated and is superseded by plain old joblib, which can be installed with pip install joblib:
import joblib
joblib.dump(my_scaler, 'scaler.gz')
my_scaler = joblib.load('scaler.gz')
Note that the file extension can be anything, but if it is one of ['.z', '.gz', '.bz2', '.xz', '.lzma'] then the corresponding compression protocol will be used. See the docs for the joblib.dump() and joblib.load() methods.
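If you want to control compression explicitly rather than relying on the extension, joblib.dump also accepts a compress argument (a minimal sketch; the method/level tuple is one of several accepted forms):
import joblib

# equivalent ways to get a gzip-compressed dump:
joblib.dump(my_scaler, 'scaler.gz')                          # inferred from the extension
joblib.dump(my_scaler, 'scaler.pkl', compress=('gzip', 3))   # explicit method and level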
Solution 4
You can use pickle to save the scaler:
import pickle

scalerfile = 'scaler.sav'
with open(scalerfile, 'wb') as f:
    pickle.dump(scaler, f)
Load it back:
import pickle

scalerfile = 'scaler.sav'
with open(scalerfile, 'rb') as f:
    scaler = pickle.load(f)
test_scaled_set = scaler.transform(test_set)
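If you want to confirm that the round trip preserved the fitted state, a quick sanity check (a sketch; original_scaler is a hypothetical handle to the scaler before it was pickled):
import numpy as np

# the reloaded scaler should reproduce the original transformation exactly
assert np.allclose(original_scaler.transform(test_set),
                   scaler.transform(test_set))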
Solution 5
The best way to do this is to create an ML pipeline like the following:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.externals import joblib
pipeline = make_pipeline(MinMaxScaler(), YOUR_ML_MODEL())
model = pipeline.fit(X_train, y_train)
Now you can save it to a file:
joblib.dump(model, 'filename.mod')
Later you can load it like this:
model = joblib.load('filename.mod')
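The payoff of this approach is that the scaler and the model travel together, so there is no separate transform step at prediction time (a minimal sketch; X_test is hypothetical):
# the loaded pipeline first scales X_test with the stored MinMaxScaler,
# then passes the result to the model
predictions = model.predict(X_test)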
Luis Ramon Ramirez Rodriguez
Updated on May 22, 2020

Comments

- Luis Ramon Ramirez Rodriguez, almost 4 years ago:
I'm using the MinMaxScaler model in sklearn to normalize the features of a model.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

training_set = np.random.rand(4, 4) * 10
training_set
[[ 6.01144787, 0.59753007, 2.0014852 , 3.45433657],
 [ 6.03041646, 5.15589559, 6.64992437, 2.63440202],
 [ 2.27733136, 9.29927394, 0.03718093, 7.7679183 ],
 [ 9.86934288, 7.59003904, 6.02363739, 2.78294206]]

scaler = MinMaxScaler()
scaler.fit(training_set)
scaler.transform(training_set)
[[ 0.49184811, 0.        , 0.29704831, 0.15972182],
 [ 0.4943466 , 0.52384506, 1.        , 0.        ],
 [ 0.        , 1.        , 0.        , 1.        ],
 [ 1.        , 0.80357559, 0.9052909 , 0.02893534]]

Now I want to use the same scaler to normalize the test set:

[[ 8.31263467, 7.99782295, 0.02031658, 9.43249727],
 [ 1.03761228, 9.53173021, 5.99539478, 4.81456067],
 [ 0.19715961, 5.97702519, 0.53347403, 5.58747666],
 [ 9.67505429, 2.76225253, 7.39944931, 8.46746594]]
But I don't want to call scaler.fit() on the training data every time. Is there a way to save the scaler and load it later from a different file?