Why do I get only one parameter from a statsmodels OLS fit?

python pandas linear-regression statsmodels

Solution 1

Try this:

X = sm.add_constant(X)
sm.OLS(y, X).fit()

as in the documentation:

An intercept is not included by default and should be added by the user

statsmodels.tools.tools.add_constant

Solution 2

Just to be complete, this works:

>>> import numpy 
>>> import statsmodels.api as sm
>>> y = numpy.array([1,2,3,4,5,6,7,8,9])
>>> X = numpy.array([1,1,2,2,3,3,4,4,5])
>>> X = sm.add_constant(X)
>>> res_ols = sm.OLS(y, X).fit()
>>> res_ols.params
array([-0.35714286,  1.92857143])

It does give me a different slope coefficient, but I guess that figures as we now do have an intercept.
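The difference between the two slopes can be checked by hand. The following is a sketch using only NumPy and the textbook closed-form expressions for a single regressor, not a description of how statsmodels computes the fit internally.

```python
import numpy as np

y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
x = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5], dtype=float)

# Regression through the origin: slope = sum(x*y) / sum(x*x)
slope_no_intercept = (x @ y) / (x @ x)

# Regression with an intercept: slope = sum((x-xbar)*(y-ybar)) / sum((x-xbar)^2)
xc, yc = x - x.mean(), y - y.mean()
slope = (xc @ yc) / (xc @ xc)
intercept = y.mean() - slope * x.mean()

print(slope_no_intercept)   # about 1.8235, the fit without a constant
print(intercept, slope)     # about -0.3571 and 1.9286, the fit with a constant
```

Without a constant the line is forced through the origin, so the slope has to absorb what the intercept would otherwise explain.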

Solution 3

Try this, it worked for me:

import statsmodels.api as sm

# add the intercept column to both the training and the test features
X_train = sm.add_constant(X_train)
X_test = sm.add_constant(X_test)

model = sm.OLS(y_train, X_train)
results = model.fit()
y_pred = results.predict(X_test)
results.params

Solution 4

I'm running 0.6.1 and it looks like the "add_constant" function has been moved into the statsmodels.tools module. Here's what I ran that worked:

import statsmodels.tools  # needed so that the name statsmodels.tools resolves
res_ols = sm.OLS(y, statsmodels.tools.add_constant(X)).fit()

Solution 5

I did add the line X = sm.add_constant(X), but Python did not return the intercept value, so with a little algebra I decided to add it myself in code.

This code runs a regression over 35 samples with 7 features, plus one intercept value that I added to the equation as an extra feature:

import statsmodels.api as sm
import numpy as np
import pandas as pd

x = np.empty((35, 8))  # (numSamples, oneIntercept + numFeatures)
feature_names = [""] * 8  # column labels, filled in as strings below
y = np.empty((35,))

dbfv = open("dataset.csv").readlines()


interceptConstant = 1
i = 0
# reading data and writing in numpy arrays
while i<len(dbfv):
    cells = dbfv[i].split(",")
    j = 0
    x[i][j] = interceptConstant
    feature_names[j] = str(j)
    while j<len(cells)-1:
        x[i][j+1] = cells[j]
        feature_names[j+1] = str(j+1)
        j += 1
    y[i] = cells[len(cells)-1]
    i += 1
# creating dataframes
df = pd.DataFrame(x, columns=feature_names)

target = pd.DataFrame(y, columns=["TARGET"])

X = df
y = target["TARGET"]

model = sm.OLS(y, X).fit()

print(model.params)

# predictions = model.predict(X) # make the predictions by the model


# Print out the statistics
print(model.summary())

Author by

Tom

Updated on July 12, 2022

Comments

Tom almost 2 years

Here is what I am doing:

$ python
Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>>> import statsmodels.api as sm
>>> statsmodels.__version__
'0.5.0'
>>> import numpy 
>>> y = numpy.array([1,2,3,4,5,6,7,8,9])
>>> X = numpy.array([1,1,2,2,3,3,4,4,5])
>>> res_ols = sm.OLS(y, X).fit()
>>> res_ols.params
array([ 1.82352941])

I had expected an array with two elements?!? The intercept and the slope coefficient?

Tom over 10 years

I was looking at the OLS example on the WLS page, so I guess that is why I overlooked add_constant(), as it's not mentioned on that page.

Desta Haileselassie Hagos almost 7 years

@behzad-nouri, I would appreciate if you could have a look at this: stackoverflow.com/questions/44747203/…

FaCoffee over 6 years

I am quite puzzled by this. Why isn't an intercept added by default? Why do you want to run the linear regression without the bloody constant? It makes no sense to me.

Josef over 5 years

Use import statsmodels.api as sm instead. formula.api will not have OLS (upper case) in the next release, only ols (lower case, for the formula interface).

Golden Lion about 2 years

what does adding a column of ones to an array do to X?
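Related to the comments above on the formula interface and on the missing default intercept: the lower-case ols in statsmodels.formula.api includes an intercept automatically, and "- 1" in the formula removes it. Here is a minimal sketch using the data from the question; the column names y and x are chosen for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "y": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "x": [1, 1, 2, 2, 3, 3, 4, 4, 5],
})

with_intercept = smf.ols("y ~ x", data=df).fit()          # intercept added by the formula
without_intercept = smf.ols("y ~ x - 1", data=df).fit()   # "- 1" drops the intercept

print(with_intercept.params)      # Intercept about -0.3571, x about 1.9286
print(without_intercept.params)   # x about 1.8235
```

The second fit reproduces the single parameter from the question, and the first matches the result obtained with add_constant.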