Multi-output regressor and sklearn's RFE module
RFE does not support the multi-label (multi-output) target format, because each target could lead to a different combination of input features being selected. Hence, you need to create an individual RFE for each target variable.
For example:
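rfe = {}
for i in range(my_y.shape[1]):
    # one selector per target column; the feature count is passed by keyword,
    # as recent scikit-learn releases require
    rfe[i] = RFE(regress, n_features_to_select=300)
    rfe[i].fit(my_X, my_y[:, i])

# features selected for the first target
feature_final = rfe[0].transform(my_X)
feature_final.shape
# (5000, 300)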
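Since the snippet above relies on variables from the question, here is a self-contained sketch of the same per-target idea on small synthetic data. The array sizes, the n_keep value, and the union/intersection step are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import RidgeCV

# Small synthetic data (sizes are illustrative, not the 5000 x 315 from the question)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.normal(size=(200, 2))

n_keep = 10  # hypothetical number of features to keep per target

# Fit one RFE per target column
selectors = {}
for i in range(y.shape[1]):
    selectors[i] = RFE(RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1]),
                       n_features_to_select=n_keep)
    selectors[i].fit(X, y[:, i])

# Boolean masks of the features retained for each target
masks = np.array([selectors[i].support_ for i in range(y.shape[1])])

# Features kept for at least one target (union) and for every target (intersection)
union_mask = masks.any(axis=0)
common_mask = masks.all(axis=0)

X_target0 = selectors[0].transform(X)
print(X_target0.shape)  # (200, 10)
```

Whether to use the union, the intersection, or the per-target sets separately depends on how the selected features are consumed downstream.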
Note from the documentation of cross_val_predict:
It is not appropriate to pass these predictions into an evaluation metric. Use cross_validate to measure generalization error.
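As a rough sketch of what that note suggests, the MAE and RMSE could be obtained per fold with cross_validate instead of scoring the output of cross_val_predict. The data sizes and the KFold settings below are assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold, cross_validate

# Small synthetic multi-output regression problem (illustrative sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.normal(size=(200, 2))

# Shuffled 5-fold split, playing the role of the hand-built folds
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_validate(
    RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1]),
    X, y, cv=cv,
    scoring=("neg_mean_absolute_error", "neg_root_mean_squared_error"),
)

# The scorers return negated errors, so flip the sign back
mae_per_fold = -scores["test_neg_mean_absolute_error"]
rmse_per_fold = -scores["test_neg_root_mean_squared_error"]
print("MAE: {:.5f}, RMSE: {:.5f}".format(mae_per_fold.mean(), rmse_per_fold.mean()))
```

Each metric is computed on the held-out part of every fold and then averaged, so the numbers need not match metrics computed once on pooled out-of-fold predictions.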
Blade
Updated on December 02, 2022
Comments
-
Blade over 1 year
I was wondering whether it is possible to do RFE with a multivariate (multi-output) estimator using the sklearn package. I checked the documentation and I can't find anything for or against it. Here is the minimal code:

import numpy as np
import sklearn.linear_model as skl
from sklearn.feature_selection import RFE
from scat import *
from sklearn import metrics, model_selection

# -- params
n_folds = 5
N = 5000

# -- regressor
regress = skl.RidgeCV(alphas=[1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1])

# -- cross-validation
P = np.random.permutation(N).reshape((n_folds, -1))
cross_val_folds = []
for i_fold in range(n_folds):
    fold = (np.concatenate(P[np.arange(n_folds) != i_fold], axis=0), P[i_fold])
    cross_val_folds.append(fold)

my_X = np.random.normal(0, 1, [N, 315])
my_y = np.random.normal(0, 1, [N, 2])

my_pred = model_selection.cross_val_predict(regress, X=my_X, y=my_y, cv=cross_val_folds)
MAE = metrics.mean_absolute_error(my_y, my_pred)
RMSE = np.sqrt(metrics.mean_squared_error(my_y, my_pred))
print('MAE: {}, RMSE: {}'.format(round(MAE, 5), round(RMSE, 5)))

rfe = RFE(regress, 300)
feature_final = rfe.fit_transform(my_X, my_y)

but I get the following error when testing it
ValueError: bad input shape (5000, 2)
which doesn't provide much information.
Edits:
Apparently, when RFE is fitted, y goes through
y = column_or_1d(y, warn=True)
which requires y to be a vector. Is there a workaround for this?
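To see that check in isolation, a minimal sketch calling column_or_1d directly on a two-column array is given below. The exact wording of the error message differs between scikit-learn versions, and whether RFE still routes y through this check in newer releases is something I have not verified here.

```python
import numpy as np
from sklearn.utils import column_or_1d

# Same shape as my_y in the question: 5000 samples, 2 targets
y_2d = np.zeros((5000, 2))

try:
    column_or_1d(y_2d, warn=True)
    rejected = False
except ValueError as exc:
    # A 2-D array with more than one column is refused
    rejected = True
    print("ValueError:", exc)

# A single target column passes the check and comes back 1-D
y_single = column_or_1d(y_2d[:, 0], warn=True)
print(y_single.shape)  # (5000,)
```

This is why slicing one column at a time, as in my_y[:, i], gets past the validation.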
-
Blade almost 5 years
Thanks for the comment on RFE's support of the multi-label format. I think creating two separate RFEs would be meaningless in this scenario. A better idea would be to run RFE on the different target variables in sequence.
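One possible reading of "in sequence" is sketched below: run RFE against the first target, then run it again against the second target using only the surviving features. This is an interpretation rather than something the comment spells out, and the keep_schedule values, the data sizes, and the use of plain Ridge are all hypothetical choices.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge

# Small synthetic data (illustrative sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.normal(size=(200, 2))

# Hypothetical schedule: number of features to keep after each target's pass
keep_schedule = [15, 10]

kept = np.arange(X.shape[1])  # column indices still in play
for target, n_keep in enumerate(keep_schedule):
    rfe = RFE(Ridge(alpha=1.0), n_features_to_select=n_keep)
    rfe.fit(X[:, kept], y[:, target])
    # Map the selector's mask back to original column indices
    kept = kept[rfe.support_]

X_final = X[:, kept]
print(X_final.shape)  # (200, 10)
```

Note that the result depends on the order in which the targets are processed, since later targets only see what earlier passes left behind.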