How to get feature importance in Decision Tree?


Use the feature_importances_ attribute, which will be defined once fit() is called. For example:

import numpy as np
X = np.random.rand(1000, 2)
y = np.random.randint(0, 5, 1000)

from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier().fit(X, y)
tree.feature_importances_
# array([ 0.51390759,  0.48609241])
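With more than a handful of features, it helps to rank them rather than eyeball the raw array. A minimal sketch along the same lines (the five-feature random data here is made up for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(1000, 5)
y = np.random.randint(0, 5, 1000)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# argsort gives indices from least to most important; reverse for descending order
order = np.argsort(tree.feature_importances_)[::-1]
for i in order:
    print("feature %d: importance %.4f" % (i, tree.feature_importances_[i]))
```

The importances always sum to 1, so each value can be read as a fraction of the total (impurity-based) importance.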
Author: merkle
Updated on June 04, 2022

Comments

  • merkle, almost 2 years ago

    I have a dataset of reviews with a positive/negative class label, and I am applying a Decision Tree to it. First, I convert the reviews into a bag-of-words representation. Here sorted_data['Text'] contains the reviews and final_counts is a sparse matrix.

    I am splitting the data into train and test sets.

    from sklearn.model_selection import train_test_split  # cross_validation was removed in newer scikit-learn
    from sklearn.feature_extraction.text import CountVectorizer
    
    X_tr, X_test, y_tr, y_test = train_test_split(sorted_data['Text'], labels, test_size=0.3, random_state=0)
    
    # BOW
    count_vect = CountVectorizer()
    count_vect.fit(X_tr.values)
    final_counts = count_vect.transform(X_tr.values)
    

    Then I apply the Decision Tree algorithm as follows:

    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score
    
    # Apply the vectorizer fitted on the train data to the test data
    optimal_lambda = 15  # chosen max_depth
    final_counts_x_test = count_vect.transform(X_test.values)
    bow_reg_optimal = DecisionTreeClassifier(max_depth=optimal_lambda, random_state=0)
    
    # fitting the model
    bow_reg_optimal.fit(final_counts, y_tr)
    
    # predict the response
    pred = bow_reg_optimal.predict(final_counts_x_test)
    
    # evaluate accuracy
    acc = accuracy_score(y_test, pred) * 100
    print('\nThe accuracy of the Decision Tree for depth = %f is %f%%' % (optimal_lambda, acc))
    

    bow_reg_optimal is a decision tree classifier. Could anyone tell me how to get the feature importances from this decision tree classifier?

  • merkle, over 5 years ago
    I tried bow_reg_optimal.feature_importances_ and the output I am getting is array([ 0., 0., 0., ..., 0., 0., 0.]). Why am I getting all zeros?
  • jakevdp, over 5 years ago
    The importances add up to 1. If that's the output you're getting, the dominant features are probably not among the first three or last three that numpy prints, but somewhere in the middle of the (truncated) array.
  • merkle, over 5 years ago
    Okay. Got it. Thanks a lot.
  • user84592, about 5 years ago
    @jakevdp I am wondering why the top ones are not the dominant features?