FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan


I was able to reproduce the problem. The fit fails because there is an extra space in your eta parameter name! Instead of this:

{'eta ':[0.01, 0.05, 0.1, 0.2]},...

Change it to this:

{'eta':[0.01, 0.05, 0.1, 0.2]},...
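A trailing space is nearly invisible, so it can be worth checking grid keys before launching a long search. The following is a minimal sketch using only the standard library. find_suspicious_keys is a hypothetical helper, not part of scikit-learn or XGBoost. It accepts a single dict or a list of dicts (the two forms GridSearchCV accepts) and flags keys with leading or trailing whitespace, plus, optionally, keys missing from a set of names you supply. If you build that set from xgb_model.get_params().keys(), keep in mind that XGBRegressor exposes the learning rate as learning_rate, so the alias eta would be reported as unknown even though it worked here.

```python
from collections.abc import Mapping


def find_suspicious_keys(param_grid, known_names=None):
    """Hypothetical helper: return (key, reason) pairs for grid keys that look wrong.

    Flags keys with leading/trailing whitespace and, if known_names is given,
    keys that are not in that set.
    """
    # A grid may be a single dict or a list of dicts
    grids = [param_grid] if isinstance(param_grid, Mapping) else list(param_grid)
    problems = []
    for grid in grids:
        for key in grid:
            if key != key.strip():
                problems.append((key, "leading/trailing whitespace"))
            elif known_names is not None and key not in known_names:
                problems.append((key, "not a known parameter name"))
    return problems


param_grid = [
    {'eta ': [0.01, 0.05, 0.1, 0.2]},  # note the trailing space in 'eta '
    {'max_depth': [4, 6, 8, 10]},
]
print(find_suspicious_keys(param_grid))
# [('eta ', 'leading/trailing whitespace')]
```

Likewise, find_suspicious_keys({'eta': [0.1], 'max_dpeth': [4]}, known_names={'eta', 'max_depth'}) would report the misspelled 'max_dpeth'.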

The error message was unfortunately not very helpful.
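The reason the message is so vague: GridSearchCV's error_score parameter defaults to np.nan. When an estimator raises inside fit, the search records nan for that split and emits a FitFailedWarning instead of stopping, which can hide the underlying exception. Passing error_score='raise' re-raises the original exception. The sketch below shows both behaviours with a hypothetical toy regressor (FragileRegressor, invented for illustration) whose fit fails for depth values above 3, so it needs only scikit-learn and NumPy rather than XGBoost.

```python
import warnings

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.exceptions import FitFailedWarning
from sklearn.model_selection import GridSearchCV


class FragileRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical toy model: predicts the training mean, but fit()
    raises for depth > 3 to imitate a fit that fails for some parameters."""

    def __init__(self, depth=1):
        self.depth = depth

    def fit(self, X, y):
        if self.depth > 3:
            raise ValueError(f"depth={self.depth} is not supported")
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = rng.normal(size=30)


def run_search(error_score):
    search = GridSearchCV(
        FragileRegressor(),
        {"depth": [1, 5]},
        cv=3,
        scoring="neg_mean_squared_error",
        error_score=error_score,
    )
    return search.fit(X, y)


# Default behaviour (error_score=np.nan): failures become warnings and nan scores
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient = run_search(np.nan)
print(lenient.cv_results_["mean_test_score"])  # second candidate (depth=5) is nan
print(any(issubclass(w.category, FitFailedWarning) for w in caught))

# error_score="raise": the original exception surfaces instead
try:
    run_search("raise")
except ValueError as exc:
    print("fit failed:", exc)
```

In the question's code, the equivalent change would be adding error_score='raise' to the GridSearchCV(...) call, which should surface the underlying XGBoost exception instead of nan scores.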


Author: unstuck

Updated on June 02, 2022

Comments

  • unstuck
    unstuck almost 2 years

    I'm trying to tune the learning rate and max_depth parameters of an XGBoost regression model:

    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBRegressor
    
    param_grid = [
        # trying learning rates from 0.01 to 0.2
        {'eta ':[0.01, 0.05, 0.1, 0.2]},
        # and max depth from 4 to 10
        {'max_depth': [4, 6, 8, 10]}
      ]
    
    xgb_model = XGBRegressor(random_state = 0)
    grid_search = GridSearchCV(xgb_model, param_grid, cv=5,
                               scoring='neg_root_mean_squared_error',
                               return_train_score=True)
    
    grid_search.fit(final_OH_X_train_scaled, y_train)
    

    final_OH_X_train_scaled is the training dataset that contains only numerical features.

    y_train is the training label - also numerical.

    This is returning the error:

    FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan.
    

    I've seen other similar questions, but couldn't find an answer yet.

    Also tried with:

    param_grid = [
        # trying learning rates from 0.01 to 0.2
        # and max depth from 4 to 10
        {'eta ': [0.01, 0.05, 0.1, 0.2], 'max_depth': [4, 6, 8, 10]}   
      ]
    

    But it generates the same error.

    EDIT: Here's a sample of the data:

    final_OH_X_train_scaled.head()
    

    [screenshot of the final_OH_X_train_scaled.head() output]

    y_train.head()
    

    [screenshot of the y_train.head() output]

    EDIT2:

    The data sample can be recreated with:

    import pandas as pd

    # 10 scaled continuous features (cont0-cont9) followed by 10 one-hot columns ('31'-'40')
    final_OH_X_train_scaled = pd.DataFrame(
        [[0.540617, 1.204666, 1.670791, -0.445424, -0.890944, -0.491098, 0.094999, 1.522411, -0.247443, -0.559572, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
         [0.117467, -2.351903, 0.718969, -0.119721, -0.874705, -0.530832, -1.385230, 2.126612, -0.947731, -0.156967, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
         [0.901138, -0.208256, -0.019134, 0.265250, -0.889128, -0.467753, 0.169306, -0.973256, 0.056164, -0.671978, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
         [2.074639, 0.100602, -1.645121, 0.929598, 0.811911, 1.364560, 0.337242, 0.435187, -0.388075, 1.279959, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
         [2.198099, -0.496254, -0.917933, -1.418407, -0.975889, 1.044495, 0.254181, 1.335285, 2.079415, 2.071974, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]],
        columns=['cont0', 'cont1', 'cont2', 'cont3', 'cont4', 'cont5', 'cont6', 'cont7', 'cont8', 'cont9',
                 '31', '32', '33', '34', '35', '36', '37', '38', '39', '40'])
    
    • TC Arlen
      TC Arlen over 2 years
      Nothing looks obviously wrong to me. Can you post a few rows of your final_OH_X_train_scaled and y_train data so we can reproduce and debug? Possibly there's something wrong in your data.
    • unstuck
      unstuck over 2 years
      @TCArlen Thank you so much for your feedback. Please see my edit above.
    • TC Arlen
      TC Arlen over 2 years
      Great, thanks. However, to inspect, reproduce, and debug this on my machine, I need the training rows and labels as code or data that I can run myself. Can you post them as data rather than a screenshot?
    • TC Arlen
      TC Arlen over 2 years
      The data in the link has not been transformed the way the final_OH_X_train_scaled.head() output in the screenshot above has. Please put these values into code, as in this example question: stackoverflow.com/questions/68732791/… Do you see how the DataFrame is constructed from code, so that it is a reproducible example on someone else's machine? Thank you
    • unstuck
      unstuck over 2 years
      Ok, please see above
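The request in the comments can be met without retyping values by hand: pandas can print a frame's rows as a plain dict that can be pasted back into pd.DataFrame(...). A minimal sketch, using the first two rows and columns of the sample above; df_sample is an illustrative stand-in for final_OH_X_train_scaled, and DataFrame.to_dict(orient="list") is the pandas method assumed here. For the labels, y_train.head().tolist() gives a pasteable list in the same way.

```python
import pandas as pd

# Illustrative stand-in for final_OH_X_train_scaled (first two rows/columns of the sample)
df_sample = pd.DataFrame(
    [[0.540617, 1.204666], [0.117467, -2.351903]],
    columns=["cont0", "cont1"],
)

# Column name -> list of values; print it and paste the text into the question
snippet = df_sample.head().to_dict(orient="list")
print(snippet)

# Anyone can rebuild the same frame from the pasted dict
rebuilt = pd.DataFrame(snippet)
print(rebuilt.equals(df_sample))  # True
```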