How to get feature importance in Decision Tree?
Use the feature_importances_ attribute, which is defined once fit() has been called. For example:
import numpy as np
X = np.random.rand(1000,2)
y = np.random.randint(0, 5, 1000)
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier().fit(X, y)
tree.feature_importances_
# array([ 0.51390759, 0.48609241])
Author: merkle, updated on June 04, 2022

Comments
-
merkle almost 2 years
I have a dataset of reviews which has a class label of positive/negative. I am applying a decision tree to that reviews dataset. First, I convert the reviews into a bag of words. Here sorted_data['Text'] contains the reviews and final_counts is a sparse matrix.
I am splitting the data into train and test dataset.
X_tr, X_test, y_tr, y_test = cross_validation.train_test_split(sorted_data['Text'], labels, test_size=0.3, random_state=0)

# BOW
count_vect = CountVectorizer()
count_vect.fit(X_tr.values)
final_counts = count_vect.transform(X_tr.values)
Then I apply the decision tree algorithm as follows:
# instantiate learning model
k = optimal_k

# Applying the vectors of train data on the test data
optimal_lambda = 15
final_counts_x_test = count_vect.transform(X_test.values)
bow_reg_optimal = DecisionTreeClassifier(max_depth=optimal_lambda, random_state=0)

# fitting the model
bow_reg_optimal.fit(final_counts, y_tr)

# predict the response
pred = bow_reg_optimal.predict(final_counts_x_test)

# evaluate accuracy
acc = accuracy_score(y_test, pred) * 100
print('\nThe accuracy of the Decision Tree for depth = %f is %f%%' % (optimal_lambda, acc))
bow_reg_optimal is a decision tree classifier. Could anyone tell me how to get the feature importances from this classifier?
-
merkle over 5 years
I tried bow_reg_optimal.feature_importances_ and the output I am getting is array([ 0., 0., 0., ..., 0., 0., 0.]). Why am I getting all zeros? -
jakevdp over 5 yearsThe importances add up to 1. If that's the output you're getting, then the dominant features are probably not among the first three or last three, but somewhere in the middle.
-
merkle over 5 yearsOkay. Got it. Thanks a lot.
-
user84592 about 5 years@jakevdp I am wondering why the top ones are not the dominant features?
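As the comments above note, with thousands of bag-of-words features the nonzero importances can sit anywhere in the vector, so the truncated array([ 0., 0., 0., ..., 0., 0., 0.]) printout hides them. A short sketch showing how to surface the largest entries with np.argsort (the importance vector and its nonzero positions here are made up for illustration):

```python
import numpy as np

# Made-up importance vector: mostly zeros, with the mass "in the middle",
# as in the truncated printout from the question
importances = np.zeros(1000)
importances[[412, 677]] = [0.7, 0.3]

# Indices of the largest importances, biggest first
top = np.argsort(importances)[::-1][:5]
print(top[:2])           # → [412 677], the two dominant feature indices
print(importances[top])
```

The top indices can then be mapped back to vocabulary terms, e.g. by indexing into the fitted vectorizer's feature-name array.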