a Panel regression in Python

Solution 1

Try the following. I copied the stock data from the link in the question and added random data for the x column. For a panel regression you need a 'MultiIndex' (here: date and stock), as mentioned in the comments.

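import numpy as np
import pandas as pd
# Stack the wide table (one column per stock) into a long (date, stock) MultiIndex
df = pd.DataFrame(df.set_index('dates').stack())
df.columns = ['y']
# Random regressor, for illustration only
df['x'] = np.random.random(size=len(df.index))
df.info()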

MultiIndex: 100 entries, (2015-04-03 00:00:00, AB INBEV) to (2015-05-01 00:00:00, ZC.PA)
Data columns (total 2 columns):
y    100 non-null float64
x    100 non-null float64
dtypes: float64(2)
memory usage: 2.3+ KB

from pandas.stats.plm import PanelOLS
regression = PanelOLS(y=df['y'], x=df[['x']])

regression

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         100
Number of Degrees of Freedom:   2

R-squared:         0.0042
Adj R-squared:    -0.0060

Rmse:              0.2259

F-stat (1, 98):     0.4086, p-value:     0.5242

Degrees of Freedom: model 1, resid 98

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -0.0507     0.0794      -0.64     0.5242    -0.2063     0.1048
     intercept     2.1952     0.0448      49.05     0.0000     2.1075     2.2829
---------------------------------End of Summary---------------------------------
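
As far as I know, `pandas.stats.plm` is no longer shipped with recent pandas releases, so the `PanelOLS` call above may fail on a current installation. As a rough sketch of what the pooled fit computes, the following rebuilds a similar stacked frame from made-up data (the tickers, dates and values are assumptions, not the data from the question) and estimates the slope and intercept by ordinary least squares with NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: one row per date, one column per stock.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-04-03", periods=5, freq="W-FRI")
wide = pd.DataFrame(
    rng.normal(loc=2.0, scale=0.2, size=(5, 3)),
    index=dates,
    columns=["AAA", "BBB", "CCC"],  # made-up tickers
)
wide.index.name = "dates"

# Stack into a long frame indexed by (date, stock), as in the answer.
panel = wide.stack().to_frame("y")
panel["x"] = rng.random(len(panel))  # random regressor, for illustration only

# Pooled OLS with an intercept: y = a + b * x + e
X = np.column_stack([panel["x"].to_numpy(), np.ones(len(panel))])
y = panel["y"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef
print(f"x: {slope:.4f}, intercept: {intercept:.4f}")
```

Because the regressor is random, the slope should be close to zero and the intercept close to the mean of y, which is the same pattern as in the summary above.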

Solution 2

As you suggested above, I changed my code as follows:

  1. I stacked both tables and converted the results into two dataframes.
  2. I concatenated them into a single MultiIndex dataframe.
  3. I ran the regression and added time effects.

    <class 'pandas.core.frame.DataFrame'>
    MultiIndex: 5096 entries, (2015-04-03 00:00:00, AB INBEV) to (25/03/16, ZC.PA)
    Data columns (total 2 columns):
    indvalues    5096 non-null float64
    avgvalues    5096 non-null float64
    dtypes: float64(2)
    memory usage: 119.4+ KB
    
    from pandas.stats.plm import PanelOLS
    regression=PanelOLS(y=df["indvalues"], x=df[["avgvalues"]], time_effects=True)
    

The regression now works very nicely! Thank you, Stefan Jansen.
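
The three steps in this answer can be sketched without `pandas.stats.plm`. This is a minimal illustration under stated assumptions: the two wide tables, their tickers and their values are made up, and time effects are absorbed by subtracting each date's cross-sectional mean from both variables before fitting (the within transformation). That is not necessarily how `PanelOLS(..., time_effects=True)` computed them internally.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2015-04-03", periods=6, freq="W-FRI")
tickers = ["AAA", "BBB", "CCC", "DDD"]  # made-up stock names

# Two hypothetical wide tables with the same shape (dates x stocks).
x_vals = rng.normal(size=(6, 4))
time_shock = rng.normal(size=(6, 1))  # common shock per date
y_vals = 0.5 * x_vals + time_shock + 0.1 * rng.normal(size=(6, 4))
avg_wide = pd.DataFrame(x_vals, index=dates, columns=tickers)
ind_wide = pd.DataFrame(y_vals, index=dates, columns=tickers)

# Steps 1 and 2: stack each table, then concatenate into one MultiIndex frame.
df = pd.concat(
    {"indvalues": ind_wide.stack(), "avgvalues": avg_wide.stack()}, axis=1
)
df.index.names = ["date", "stock"]

# Step 3: absorb time effects by removing each date's mean from both columns.
demeaned = df - df.groupby(level="date").transform("mean")

# No-intercept OLS on the demeaned data: beta = sum(x * y) / sum(x * x)
x = demeaned["avgvalues"].to_numpy()
y = demeaned["indvalues"].to_numpy()
beta = (x @ y) / (x @ x)
print(f"avgvalues coefficient with time effects: {beta:.4f}")
```

Demeaning by date gives the same slope as including one dummy variable per date, but it avoids building a wide dummy matrix when there are many dates.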

Updated on June 08, 2022

Comments

  • Admin
    Admin almost 2 years

    I'm trying to run a panel regression on pandas Dataframes:

    Currently I have two dataframes, each containing 52 rows (dates) by 99 columns (stocks): Markdown file with data representation

    When running:

    est=sm.OLS(Stockslist,averages).fit()
    est.summary()
    

    I get the ValueError: shapes (52,99) and (52,99) not aligned: 99 (dim 1) != 52 (dim 0)

    Can somebody point out what I am doing wrong? The model is simply y(i,t) = x(i,t) + error term, so there is no intercept. However, I would like to add time effects in the future.

    Kind regards, Jeroen

  • Admin
    Admin about 8 years
    I am still wondering whether statsmodels offers any panel regression options.
  • Stefan
    Stefan about 8 years
    For more serious econometrics you're better off with R, I'm afraid, or any of the commercial packages. Here's an attempt to implement something, but I'm not sure it has moved beyond the gist stage: gist.github.com/vincentarelbundock/5053686