Python pandas has no attribute ols - Error (rolling OLS)


pd.stats.ols.MovingOLS was removed in Pandas version 0.20.0

http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0200-prior-deprecations

https://github.com/pandas-dev/pandas/pull/11898
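Because the removal is tied to a specific release, a script can detect it before reaching for the attribute. This is a minimal sketch that assumes only that pandas is importable. Both helper names (pandas_major_minor, has_moving_ols) are hypothetical and are not part of pandas.

```python
import re

import pandas as pd


def pandas_major_minor(version: str) -> tuple:
    """Parse (major, minor) from a version string such as '0.20.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_moving_ols() -> bool:
    """True if this pandas install still exposes pd.stats.ols.MovingOLS."""
    # getattr with a default returns None instead of raising AttributeError
    stats = getattr(pd, "stats", None)
    ols = getattr(stats, "ols", None)
    return ols is not None and hasattr(ols, "MovingOLS")


if __name__ == "__main__":
    print("pandas", pd.__version__)
    print("older than 0.20:", pandas_major_minor(pd.__version__) < (0, 20))
    print("MovingOLS available:", has_moving_ols())
```

On pandas 0.20.0 and later, has_moving_ols() returns False. That is the situation that produces the AttributeError reported in the comments.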

I can't find an 'off the shelf' solution for what should be such an obvious use case as rolling regressions.

The following should do the trick without investing too much time in a more elegant solution. It uses NumPy to fit the regression parameters on each rolling window. It then computes the predicted value for the row immediately after that window from that row's X values.

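import numpy as np

# df is assumed to be a DataFrame with numeric columns 'x' and 'y'.
window = 1000
a = np.full(len(df), np.nan)  # predicted values (y_hat)
b = [np.nan] * len(df)  # If betas required.
y_ = df.y.values
x_ = df[['x']].assign(constant=1).values  # design matrix columns: [x, constant]
for n in range(window, len(df)):
    # Fit on the previous `window` rows (n - window .. n - 1).
    y = y_[(n - window):n]
    X = x_[(n - window):n]
    # betas = Inverse(X'.X).X'.y
    betas = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
    # Predict row n from the betas of the preceding window.
    y_hat = betas.dot(x_[n, :])
    a[n] = y_hat
    b[n] = betas.tolist()  # If betas required: [slope, intercept].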
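If the explicit inverse is a concern, the same loop can be written with np.linalg.lstsq. That function solves the least-squares problem directly and is generally better conditioned than inverting X'X. This is a minimal sketch that assumes df has numeric columns x and y, as in the code above. The function name rolling_ols_predict is hypothetical. It keeps the same convention: row n is predicted from betas fitted on rows n - window to n - 1. Each betas row is ordered [slope, intercept] because the constant column comes after x.

```python
import numpy as np
import pandas as pd


def rolling_ols_predict(df: pd.DataFrame, window: int):
    """Out-of-sample rolling OLS of df['y'] on df['x'] plus an intercept.

    Row n is predicted from betas fitted on rows n - window .. n - 1.
    Returns (y_hat, betas); betas[n] is [slope, intercept], and both
    outputs are NaN where no full window precedes row n.
    """
    y = df["y"].to_numpy(dtype=float)
    X = np.column_stack([df["x"].to_numpy(dtype=float), np.ones(len(df))])
    y_hat = np.full(len(df), np.nan)
    betas = np.full((len(df), 2), np.nan)
    for n in range(window, len(df)):
        # lstsq minimizes ||X b - y|| without forming inv(X'X) explicitly
        coef, *_ = np.linalg.lstsq(X[n - window:n], y[n - window:n], rcond=None)
        betas[n] = coef
        y_hat[n] = X[n] @ coef
    return y_hat, betas
```

Calling y_hat, betas = rolling_ols_predict(df, window=1000) gives y_hat in place of a and betas in place of b.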

The code above is equivalent to the following pre-0.20.0 pandas code, and it runs about 35% faster:

# Only works in pandas < 0.20.0, where pd.stats.ols.MovingOLS still exists.
model = pd.stats.ols.MovingOLS(y=df.y, x=df.x, window_type='rolling', window=1000, intercept=True)
y_pandas = model.y_predict
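With a single regressor, as in the question, current pandas can produce the same out-of-sample predictions without an explicit loop. Over each window, the OLS slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x). This sketch assumes exactly one regressor column named x and does not extend to several regressors. The function name is hypothetical. Its results are expected to match the NumPy loop above up to floating-point rounding. It has not been checked against the removed MovingOLS itself.

```python
import numpy as np
import pandas as pd


def rolling_predict_pandas(df: pd.DataFrame, window: int) -> pd.Series:
    """Rolling simple-regression prediction using only pandas rolling statistics.

    Over each window: slope = cov(x, y) / var(x) and
    intercept = mean(y) - slope * mean(x).
    shift(1) makes row n use the window that ends at row n - 1.
    """
    roll_x = df["x"].rolling(window)
    slope = roll_x.cov(df["y"]) / roll_x.var()
    intercept = df["y"].rolling(window).mean() - slope * roll_x.mean()
    return intercept.shift(1) + slope.shift(1) * df["x"]
```

For example, df['Y_hat'] = rolling_predict_pandas(df, 1000) fills the column that the question's script tries to build.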
Author: Desta Haileselassie Hagos

Updated on June 06, 2022

Comments

  • Desta Haileselassie Hagos (over 1 year)

    I want to run a rolling OLS regression with a 1000-observation window on the dataset at this URL: https://drive.google.com/open?id=0B2Iv8dfU4fTUa3dPYW5tejA0bzg using the following Python script.

    #!/usr/bin/python -tt
    
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from statsmodels.formula.api import ols
    
    df = pd.read_csv('estimated.csv', names=('x','y'))
    
    # pd.stats.ols.MovingOLS only exists in pandas versions before 0.20.0.
    model = pd.stats.ols.MovingOLS(y=df.y, x=df[['x']], 
                                   window_type='rolling', window=1000, intercept=True)
    df['Y_hat'] = model.y_predict
    

    However, when I run the script, I get this error: AttributeError: module 'pandas.stats' has no attribute 'ols'. Could this error be caused by the pandas version I am using? The pandas installed on my Linux node is version 0.20.2.

  • Desta Haileselassie Hagos (over 6 years)
    Yes, that's right, as I learned from the comments above. Do you have any idea how to do this with the latest version of pandas?
  • Alexander (over 6 years)
    @DestaHaileselassieHagos What results do you want from the rolling regression (e.g. slope, intercept, predicted value)?
  • Desta Haileselassie Hagos (over 6 years)
    @Alexander, the predicted value, for example. Thanks!
  • Desta Haileselassie Hagos (over 6 years)
    I rolled my pandas version back to 0.18.0, and ols works now. Thank you so much!
  • Lost1 (about 6 years)
    @DestaHaileselassieHagos Does the old package have any statistical features that let you calculate the significance of the coefficients?
  • Desta Haileselassie Hagos (about 6 years)
    @Lost1, no, it doesn't.
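On the significance question, t-statistics for the coefficients of one window can be computed with NumPy. The sketch uses the same design matrix as the answer's loop (columns x, then constant). It assumes classical homoskedastic OLS standard errors. P-values would also need a Student t distribution (for example from SciPy) and are left out. The function name ols_with_tstats is hypothetical. It could be called on each window slice inside the loop from the answer.

```python
import numpy as np


def ols_with_tstats(X: np.ndarray, y: np.ndarray):
    """OLS coefficients, standard errors and t-statistics for one window.

    Classical (homoskedastic) formula:
    Var(beta) = sigma^2 (X'X)^-1, with sigma^2 = RSS / (n - k).
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se
```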