XGBoost plot_importance doesn't show feature names
Solution 1
You want to use the feature_names
parameter when creating your xgb.DMatrix
dtrain = xgb.DMatrix(Xtrain, label=ytrain, feature_names=feature_names)
Solution 2
If you're using the scikit-learn wrapper you'll need to access the underlying XGBoost Booster and set the feature names on it, instead of the scikit model, like so:
model = joblib.load("your_saved.model")
model.get_booster().feature_names = ["your", "feature", "name", "list"]
xgboost.plot_importance(model.get_booster())
Solution 3
train_test_split
will convert the dataframe to numpy array which dont have columns information anymore.
Either you can do what @piRSquared suggested and pass the features as a parameter to DMatrix constructor. Or else, you can convert the numpy array returned from the train_test_split
to a Dataframe and then use your code.
Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y, \
test_size=0.2, random_state=42)
# See below two lines
X_train = pd.DataFrame(data=Xtrain, columns=feature_names)
Xval = pd.DataFrame(data=Xval, columns=feature_names)
dtrain = xgb.DMatrix(Xtrain, label=ytrain)
Solution 4
With Scikit-Learn Wrapper interface "XGBClassifier",plot_importance reuturns class "matplotlib Axes". So we can employ axes.set_yticklabels.
plot_importance(model).set_yticklabels(['feature1','feature2'])
Solution 5
An alternate way I found whiles playing around with feature_names
. While playing around with it, I wrote this which works on XGBoost v0.80 which I'm currently running.
## Saving the model to disk
model.save_model('foo.model')
with open('foo_fnames.txt', 'w') as f:
f.write('\n'.join(model.feature_names))
## Later, when you want to retrieve the model...
model2 = xgb.Booster({"nthread": nThreads})
model2.load_model("foo.model")
with open("foo_fnames.txt", "r") as f:
feature_names2 = f.read().split("\n")
model2.feature_names = feature_names2
model2.feature_types = None
fig, ax = plt.subplots(1,1,figsize=(10,10))
xgb.plot_importance(model2, max_num_features = 5, ax=ax)
So this is saving feature_names
separately and adding it back in later. For some reason feature_types
also needs to be initialized, even if the value is None
.
stackoverflowuser2010
Updated on August 25, 2022Comments
-
stackoverflowuser2010 over 1 year
I'm using XGBoost with Python and have successfully trained a model using the XGBoost
train()
function called onDMatrix
data. The matrix was created from a Pandas dataframe, which has feature names for the columns.Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y, \ test_size=0.2, random_state=42) dtrain = xgb.DMatrix(Xtrain, label=ytrain) model = xgb.train(xgb_params, dtrain, num_boost_round=60, \ early_stopping_rounds=50, maximize=False, verbose_eval=10) fig, ax = plt.subplots(1,1,figsize=(10,10)) xgb.plot_importance(model, max_num_features=5, ax=ax)
I want to now see the feature importance using the
xgboost.plot_importance()
function, but the resulting plot doesn't show the feature names. Instead, the features are listed asf1
,f2
,f3
, etc. as shown below.I think the problem is that I converted my original Pandas data frame into a DMatrix. How can I associate feature names properly so that the feature importance plot shows them?