How to find the feature names of the coefficients using scikit-learn linear regression?

51,573

Solution 1

The trick is that coef_ lists the coefficients in the same order as the feature columns you passed to fit, so you can pair them up right after training the model:

from sklearn import linear_model

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])
# coef_ follows the column order of the training data
print(list(zip(model_1.coef_, model_1_features)))

This prints each coefficient paired with its feature name. (Tested with a pandas DataFrame.)

If you want to reuse the coefficients later, you can also put them in a dictionary:

# Map each feature name to its coefficient
coef_dict = dict(zip(model_1_features, model_1.coef_))

(You can verify this yourself by training two models on the same features but, as you said, with the features in a shuffled order.)
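The check suggested above can be sketched as follows. This is a minimal example on synthetic, noise-free data; the DataFrame `data`, the weights 3.0, 2.0 and -1.0, and the helper `fit_coef_dict` are all assumptions made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: price is an exact linear function of the features
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(200, 3)),
    columns=["sqft_living", "bathrooms", "bedrooms"],
)
data["price"] = (
    3.0 * data["sqft_living"] + 2.0 * data["bathrooms"] - 1.0 * data["bedrooms"]
)


def fit_coef_dict(features):
    """Fit on the given column order and map each feature name to its coefficient."""
    model = LinearRegression()
    model.fit(data[features], data["price"])
    return dict(zip(features, model.coef_))


original = fit_coef_dict(["sqft_living", "bathrooms", "bedrooms"])
shuffled = fit_coef_dict(["bedrooms", "sqft_living", "bathrooms"])

# The same feature should get the same coefficient regardless of column order
for name in original:
    print(name, round(original[name], 6), round(shuffled[name], 6))
```

If the mapping by position holds, both columns of the printout should agree for every feature, even though the two models saw the columns in a different order.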

Solution 2

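@Robin posted a great answer, but I had to make one tweak to get it working the way I wanted. In my case 'coef_' was a two-dimensional np.array, which happens when the target passed to fit is itself two-dimensional (for example a single-column DataFrame), so coef_ has shape (n_targets, n_features). I had to select the row I wanted with model_1.coef_[0,:], as below: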

coef_dict = {}
for coef, feat in zip(model_1.coef_[0,:],model_1_features):
    coef_dict[feat] = coef

Then the dict was created as I pictured it, with {'feature_name' : coefficient_value} pairs.
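To see when the extra index is needed, here is a small sketch comparing a one-dimensional and a two-dimensional target. The data and the names `X`, `y_1d`, `y_2d` are invented for this illustration; only the shapes matter.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data with two features
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 2)), columns=["lat", "long"])

y_1d = 2.0 * X["lat"] - 1.0 * X["long"]  # pandas Series, shape (100,)
y_2d = y_1d.to_frame(name="price")       # single-column DataFrame, shape (100, 1)

coef_1d = LinearRegression().fit(X, y_1d).coef_
coef_2d = LinearRegression().fit(X, y_2d).coef_

# A 1-D target gives a flat coef_; a 2-D target gives one row per target
print(coef_1d.shape)
print(coef_2d.shape)

# np.ravel flattens the single-target case either way, so zip lines up with the names
coef_dict = dict(zip(X.columns, np.ravel(coef_2d)))
print(coef_dict)
```

Flattening with np.ravel is only appropriate for a single target; with several targets each row of coef_ belongs to a different output and should be handled separately.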

Solution 3

import pandas as pd
from sklearn.linear_model import LinearRegression

# X_train is assumed to be a pandas DataFrame and y_train the matching target
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# One row per feature: the column name next to its fitted coefficient
coef_table = pd.DataFrame({"Feature": list(X_train.columns)})
coef_table.insert(len(coef_table.columns), "Coefs", regressor.coef_.transpose())
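A more compact variant of the same idea, as a hedged sketch: recent scikit-learn versions (1.0 and later) store the column names in `feature_names_in_` when the model is fitted on a DataFrame with string column names, so the names can be read back from the fitted model. The synthetic `X_train` and `y_train` below are assumptions for illustration, and the `getattr` fallback covers older versions that lack the attribute.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic training data
rng = np.random.default_rng(0)
X_train = pd.DataFrame(
    rng.normal(size=(50, 3)),
    columns=["sqft_living", "bathrooms", "bedrooms"],
)
y_train = 1.5 * X_train["sqft_living"] + 0.5 * X_train["bedrooms"]

regressor = LinearRegression().fit(X_train, y_train)

# Prefer the names recorded by the model; fall back to the DataFrame columns
names = getattr(regressor, "feature_names_in_", X_train.columns)

# A Series indexed by feature name keeps each coefficient attached to its label
coef_series = pd.Series(regressor.coef_, index=names, name="Coefs")
print(coef_series)
```

A Series can then be looked up by name, for example `coef_series["bedrooms"]`, which avoids relying on positional order later in the analysis.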
Author by

amehta

Updated on June 03, 2020

Comments

  • amehta
    amehta almost 4 years
    #training the model
    model_1_features = ['sqft_living', 'bathrooms', 'bedrooms', 'lat', 'long']
    model_2_features = model_1_features + ['bed_bath_rooms']
    model_3_features = model_2_features + ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long']
    
    model_1 = linear_model.LinearRegression()
    model_1.fit(train_data[model_1_features], train_data['price'])
    
    model_2 = linear_model.LinearRegression()
    model_2.fit(train_data[model_2_features], train_data['price'])
    
    model_3 = linear_model.LinearRegression()
    model_3.fit(train_data[model_3_features], train_data['price'])
    
    # extracting the coefficients
    print(model_1.coef_)
    print(model_2.coef_)
    print(model_3.coef_)
    

    If I change the order of the features, the coefficients are still printed in the same order, so I would like to know how each coefficient maps to its feature.