How do I use a TimeSeriesSplit with a GridSearchCV object to tune a model in scikit-learn?


It turns out the problem was that I was importing GridSearchCV from sklearn.grid_search, which is deprecated. Importing GridSearchCV from sklearn.model_selection instead resolved it:

import xgboost as xgb
# Both classes come from sklearn.model_selection, not the deprecated sklearn.grid_search
from sklearn.model_selection import TimeSeriesSplit, GridSearchCV
import numpy as np

# Toy data: 6 time-ordered samples with 2 features each
X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
y = np.array([1, 6, 7, 1, 2, 3])

model = xgb.XGBRegressor()
param_search = {'max_depth': [3, 5]}

# Pass the splitter object itself as cv
tscv = TimeSeriesSplit(n_splits=2)
gsearch = GridSearchCV(estimator=model, cv=tscv,
                        param_grid=param_search)
gsearch.fit(X, y)

gives:

GridSearchCV(cv=<generator object TimeSeriesSplit.split at 0x11ab4abf8>,
       error_score='raise',
       estimator=XGBRegressor(base_score=0.5, colsample_bylevel=1, colsample_bytree=1, gamma=0,
       learning_rate=0.1, max_delta_step=0, max_depth=3,
       min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
       objective='reg:linear', reg_alpha=0, reg_lambda=1,
       scale_pos_weight=1, seed=0, silent=True, subsample=1),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'max_depth': [3, 5]}, pre_dispatch='2*n_jobs',
       refit=True, return_train_score=True, scoring=None, verbose=0)
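Once the search has been fitted, the chosen parameters and the per-split scores can be read back from the fitted object. The following is only a minimal sketch under one assumption: DecisionTreeRegressor is used as a stand-in for xgb.XGBRegressor so that the example needs nothing beyond scikit-learn and NumPy. The data and the parameter grid are the same as above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeRegressor

# Same toy data as in the answer: 6 time-ordered samples, 2 features.
X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
y = np.array([1, 6, 7, 1, 2, 3])

# Assumption: DecisionTreeRegressor stands in for xgb.XGBRegressor.
model = DecisionTreeRegressor(random_state=0)
param_search = {'max_depth': [3, 5]}

# The splitter object is passed directly as cv.
tscv = TimeSeriesSplit(n_splits=2)
gsearch = GridSearchCV(estimator=model, cv=tscv, param_grid=param_search)
gsearch.fit(X, y)

# Best parameter value found, and the test score of each candidate on each split.
best_depth = gsearch.best_params_['max_depth']
per_split_scores = {
    key: gsearch.cv_results_[key]
    for key in ('split0_test_score', 'split1_test_score')
}
print(best_depth, gsearch.n_splits_)
print(per_split_scores)
```

With only six samples the scores themselves are not meaningful; the point is just where the results live on the fitted object.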
Author by

cd98

Updated on June 12, 2022

Comments

  • cd98
    cd98 almost 2 years

    I've searched the sklearn docs for TimeSeriesSplit and the docs for cross-validation but I haven't been able to find a working example.

    I'm using sklearn version 0.19.

    This is my setup:

    import xgboost as xgb
    from sklearn.model_selection import TimeSeriesSplit
    from sklearn.grid_search import GridSearchCV
    import numpy as np
    X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
    y = np.array([1, 6, 7, 1, 2, 3])
    tscv = TimeSeriesSplit(n_splits=2)
    for train, test in tscv.split(X):
        print(train, test)
    

    gives:

    [0 1] [2 3]
    [0 1 2 3] [4 5]
    

    If I try:

    model = xgb.XGBRegressor()
    param_search = {'max_depth' : [3, 5]}
    
    my_cv = TimeSeriesSplit(n_splits=2).split(X)
    gsearch = GridSearchCV(estimator=model, cv=my_cv,
                            param_grid=param_search)
    gsearch.fit(X, y)
    

    it gives: TypeError: object of type 'generator' has no len()

    I understand the problem: GridSearchCV is trying to call len(cv), but my_cv is a generator and has no length. However, the docs for GridSearchCV state that cv can be an

    int, cross-validation generator or an iterable, optional

    I tried using TimeSeriesSplit without the .split(X) but it still didn't work.

    I'm sure I'm overlooking something simple, thanks!!

  • Odisseo
    Odisseo over 5 years
    Maybe I'm doing something wrong, but it seems to me that with the current implementation the line my_cv = TimeSeriesSplit(n_splits=2).split(X) should be changed to my_cv = TimeSeriesSplit(n_splits=2). Otherwise it will throw an error.
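
The two points raised in these comments can be checked with a small sketch: the object returned by .split(X) is a generator with no len(), and the splitter object itself (or a materialized list of its splits) can be handed to GridSearchCV from sklearn.model_selection. This is an illustration under assumptions, not the original poster's code: DecisionTreeRegressor stands in for xgb.XGBRegressor, and run_search is a hypothetical helper introduced only for this example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeRegressor

X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
y = np.array([1, 6, 7, 1, 2, 3])

splitter = TimeSeriesSplit(n_splits=2)

# .split(X) returns a generator, so calling len() on it raises TypeError.
try:
    len(splitter.split(X))
    generator_has_len = True
except TypeError:
    generator_has_len = False

# A generator can be materialized into a list of (train, test) index pairs.
materialized = list(splitter.split(X))


def run_search(cv):
    """Hypothetical helper: fit a small grid search with the given cv."""
    search = GridSearchCV(
        # Assumption: a stand-in for xgb.XGBRegressor.
        estimator=DecisionTreeRegressor(random_state=0),
        cv=cv,
        param_grid={'max_depth': [3, 5]},
    )
    search.fit(X, y)
    return search.cv_results_['mean_test_score']


# Option A: pass the splitter object itself.
scores_splitter = run_search(splitter)
# Option B: pass the materialized list of splits.
scores_list = run_search(materialized)

same_scores = np.allclose(scores_splitter, scores_list)
print(generator_has_len, same_scores)
```

Passing the splitter object is usually the more convenient choice, because a generator is exhausted after one pass while the splitter can produce its splits again whenever they are needed.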