How can I use R^2 as an evaluation metric when modeling?
As I understand it, you are looking for a way to obtain the r2 score when modeling with XGBoost. The following code will give you the r2 score as output:
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, cross_val_score

# params is your hyperparameter grid; kfold is your cross-validation splitter
xg = xgb.XGBRegressor()
best_xgb = GridSearchCV(xg, param_grid=params, cv=10, verbose=0, n_jobs=-1)
# 'r2' is a built-in scikit-learn scorer, so no custom eval_metric is needed
scores = cross_val_score(best_xgb, X, y, scoring='r2', cv=kfold)
You can refer to the scikit-learn documentation for further details on the cross_val_score function.
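If you specifically want R^2 evaluated inside fit() for early stopping (as asked in the comments below), note that eval_metric also accepts a callable rather than a precomputed number. Here is a minimal sketch, assuming an xgboost version whose fit() still accepts eval_metric with the feval-style signature described in the xgboost docs; r2_eval is just an illustrative name:

from sklearn.metrics import r2_score

def r2_eval(y_predicted, dtrain):
    # dtrain is a DMatrix; pull the true labels out of it
    y_true = dtrain.get_label()
    # custom eval metrics are treated as lower-is-better by early stopping,
    # so report 1 - R^2: driving it toward 0 drives R^2 toward 1
    return 'one_minus_r2', 1.0 - r2_score(y_true, y_predicted)

grid.fit(X_train, y_train, eval_set=[(X_test, y_test)], eval_metric=r2_eval, early_stopping_rounds=150)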
Hope this helps!
Zachary Bloss
Updated on June 15, 2022

Comments
-
Zachary Bloss almost 2 years
I am using Python to train an XGBoost regressor on a dataset with 25 feature columns, with sklearn's GridSearchCV for parameter tuning. GridSearchCV lets you choose your scorer with the scoring parameter, and r2 is a valid option:

grid = GridSearchCV(mdl, param_grid=params, verbose=1, cv=kfold, n_jobs=-1, error_score='raise', scoring='r2')

However, when I want to use r2 as my eval_metric in the grid.fit() call, I don't have a great way to do so:

grid.fit(X_train, y_train, eval_set=[(X_test, y_test)], eval_metric='rmse', early_stopping_rounds=150)
I have tried using sklearn's built-in r2_score function, but there are a few issues. The first is that an r2 score is calculated from the y_test set against the y_pred set, and in order to have a y_pred set we need to fit the model first, so you can see I'm running into a circular dependency. I have tried a few things to get around this, the first being to train the model and make predictions inside the eval_metric argument, like below:

grid.fit(X_train, y_train, eval_set=[(X_test, y_test)], eval_metric=r2_score(y_test, mdl.predict(X_test)), early_stopping_rounds=150)
But I am given the following error:
xgboost.core.XGBoostError: need to call fit beforehand
Which makes sense.
Is there some way I can grab the current parameters that GridSearchCV is using, create and store predictions, and then use r2_score as the eval_metric?

My thoughts are this: the r2 score is a standard evaluation metric, typically between 0 and 1 (1 being a perfect fit). If there were a standard way to optimize it directly as an eval metric, it would have very far reach across almost all of machine learning.
-
Mischa Lisovyi over 5 years There are a couple of things that are unclear to me: 1) any evaluation metric is based on comparing y_test vs y_pred, so I do not see why that is a showstopper; 2) the signature of a callable metric is specific to xgboost, see the eval_metric documentation here: xgboost.readthedocs.io/en/latest/python/…; 3) why do you want an eval_metric in the first place? It is not used in optimisation, only for monitoring performance between iterations and for early stopping.
-
Vivek Kumar over 5 years As @MykhailoLisovyi said, you are not using the parameters correctly. You don't need to pass values into the metric, only the callable itself; the values will be passed in automatically at the appropriate time (after fitting the model and getting predictions in each iteration).
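Worth noting for later readers: in newer xgboost releases (1.6 and up), eval_metric moved from fit() to the estimator constructor, and the sklearn interface accepts a metric callable with the familiar func(y_true, y_pred) signature. A hedged sketch under that assumption:

import xgboost as xgb
from sklearn.metrics import r2_score

# sketch for xgboost >= 1.6: eval_metric is given to the constructor
mdl = xgb.XGBRegressor(eval_metric=r2_score)
# R^2 on the eval_set is then printed per boosting round, for monitoring
mdl.fit(X_train, y_train, eval_set=[(X_test, y_test)])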