How to find the features names of the coefficients using scikit linear regression?
Solution 1
The trick is that right after you have trained your model, you know the order of the coefficients:
model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])
print(list(zip(model_1.coef_, model_1_features)))
This will print the coefficients and the correct feature. (Tested with pandas DataFrame)
If you want to reuse the coefficients later you can also put them in a dictionary:
coef_dict = {}
for coef, feat in zip(model_1.coef_,model_1_features):
coef_dict[feat] = coef
(You can test it for yourself by training two models with the same features but, as you said, shuffled order of features.)
Solution 2
@Robin posted a great answer, but for me I had to make one tweak on it to work the way I wanted, and it was to refer to the dimension of the 'coef_' np.array that I wanted, namely modifying to this: model_1.coef_[0,:], as below:
coef_dict = {}
for coef, feat in zip(model_1.coef_[0,:],model_1_features):
coef_dict[feat] = coef
Then the dict was created as I pictured it, with {'feature_name' : coefficient_value} pairs.
Solution 3
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
coef_table = pd.DataFrame(list(X_train.columns)).copy()
coef_table.insert(len(coef_table.columns),"Coefs",regressor.coef_.transpose())
amehta
Updated on June 03, 2020Comments
-
amehta almost 4 years
#training the model model_1_features = ['sqft_living', 'bathrooms', 'bedrooms', 'lat', 'long'] model_2_features = model_1_features + ['bed_bath_rooms'] model_3_features = model_2_features + ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long'] model_1 = linear_model.LinearRegression() model_1.fit(train_data[model_1_features], train_data['price']) model_2 = linear_model.LinearRegression() model_2.fit(train_data[model_2_features], train_data['price']) model_3 = linear_model.LinearRegression() model_3.fit(train_data[model_3_features], train_data['price']) # extracting the coef print model_1.coef_ print model_2.coef_ print model_3.coef_
If I change the order of the features, the coef are still printed in the same order, hence I would like to know the mapping of the feature with the coeff