Panel regression in Python

python pandas statsmodels

15,745

Solution 1

Try the code below. I've copied the stock data from the link above and added random data for the x column. For a panel regression you need a 'MultiIndex', as mentioned in the comments.

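```
import numpy as np
import pandas as pd

# df starts as the wide frame: a 'dates' column plus one column per stock.
df = pd.DataFrame(df.set_index('dates').stack())  # one row per (date, stock)
df.columns = ['y']
df['x'] = np.random.random(size=len(df.index))  # random regressor, illustration only
df.info()
```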

```
MultiIndex: 100 entries, (2015-04-03 00:00:00, AB INBEV) to (2015-05-01 00:00:00, ZC.PA)
Data columns (total 2 columns):
y    100 non-null float64
x    100 non-null float64
dtypes: float64(2)
memory usage: 2.3+ KB
```

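```
# Needs an old pandas release: the pandas.stats module was removed in pandas 0.20.
from pandas.stats.plm import PanelOLS

regression = PanelOLS(y=df['y'], x=df[['x']])
regression
```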

```
-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         100
Number of Degrees of Freedom:   2

R-squared:         0.0042
Adj R-squared:    -0.0060

Rmse:              0.2259

F-stat (1, 98):     0.4086, p-value:     0.5242

Degrees of Freedom: model 1, resid 98

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -0.0507     0.0794      -0.64     0.5242    -0.2063     0.1048
     intercept     2.1952     0.0448      49.05     0.0000     2.1075     2.2829
---------------------------------End of Summary---------------------------------
```

Solution 2

As you suggested above, I changed my code as follows:

I transformed the stacks into two dataframes.
I concatenated them into a single MultiIndex dataframe.
I ran the regression and added time effects.
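The first two steps above (stacking each wide frame and concatenating the results) could look roughly like the sketch below. The frames `ind_wide` and `avg_wide`, the stock label `XYZ`, and the random numbers are hypothetical stand-ins for the real data; only the column names `indvalues` and `avgvalues` come from the output shown in this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two wide frames (rows: dates, columns: stocks).
dates = pd.date_range("2015-04-03", periods=4, freq="7D")
stocks = ["AB INBEV", "XYZ", "ZC.PA"]  # "XYZ" is a made-up label
rng = np.random.default_rng(0)
ind_wide = pd.DataFrame(rng.random((4, 3)), index=dates, columns=stocks)
avg_wide = pd.DataFrame(rng.random((4, 3)), index=dates, columns=stocks)

# stack() moves the column labels into a second index level,
# giving one row per (date, stock) pair.
ind_long = ind_wide.stack().rename("indvalues")
avg_long = avg_wide.stack().rename("avgvalues")

# Concatenate side by side; pandas aligns on the shared (date, stock) index.
df = pd.concat([ind_long, avg_long], axis=1)
df.index.names = ["dates", "stock"]
```

Concatenating along `axis=1` relies on index alignment, so both frames should carry the same dates and stock labels; any pair missing from one frame would show up as NaN.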

```
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 5096 entries, (2015-04-03 00:00:00, AB INBEV) to (25/03/16, ZC.PA)
Data columns (total 2 columns):
indvalues    5096 non-null float64
avgvalues    5096 non-null float64
dtypes: float64(2)
memory usage: 119.4+ KB
```

```
from pandas.stats.plm import PanelOLS
regression = PanelOLS(y=df["indvalues"], x=df[["avgvalues"]], time_effects=True)
```

The regression now works very nicely! Thank you, Stefan Jansen.
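Because `pandas.stats.plm` was removed from pandas in version 0.20, the call above only runs on old releases. A minimal sketch of one common way to get a slope with time effects by hand is the within transformation: subtract each date's mean from both variables, then run OLS on the demeaned data. This is not a reconstruction of what pandas did internally, and the data, the slope of 2.0, and the stock labels below are all made up for illustration. (The third-party `linearmodels` package also provides a `PanelOLS` class with a `time_effects` option; it is not used here.)

```python
import numpy as np
import pandas as pd

# Synthetic panel: 5 dates x 4 stocks with a known slope of 2.0
# plus a shock shared by all stocks on a given date (illustrative only).
rng = np.random.default_rng(1)
dates = pd.date_range("2015-04-03", periods=5, freq="7D")
stocks = ["A", "B", "C", "D"]
index = pd.MultiIndex.from_product([dates, stocks], names=["dates", "stock"])
time_shock = np.repeat(rng.normal(size=len(dates)), len(stocks))
x = rng.normal(size=len(index))
y = 2.0 * x + time_shock  # no idiosyncratic noise, so the slope is recovered exactly
df = pd.DataFrame({"indvalues": y, "avgvalues": x}, index=index)

# Within transformation: remove each date's mean from both columns.
demeaned = df - df.groupby(level="dates").transform("mean")

# OLS on the demeaned data; no intercept is needed after demeaning.
X = demeaned[["avgvalues"]].to_numpy()
beta, *_ = np.linalg.lstsq(X, demeaned["indvalues"].to_numpy(), rcond=None)
slope = float(beta[0])
```

Demeaning by date gives the same slope as including one dummy variable per date, without building the dummy matrix.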


Author by

Admin

Updated on June 08, 2022

Comments

Admin almost 2 years
I'm trying to run a panel regression on pandas DataFrames:

Currently I have two dataframes, each containing 52 rows (dates) × 99 columns (stocks); see the Markdown file with data representation.

When running:
```
est=sm.OLS(Stockslist,averages).fit()
est.summary()
```
I get this ValueError: shapes (52,99) and (52,99) not aligned: 99 (dim 1) != 52 (dim 0)

Can somebody point out what I am doing wrong? The model is simply y(i,t) = x(i,t) + error term, so there is no intercept. However, I would like to add time effects in the future.

Kind regards, Jeroen
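
The error message in the question is consistent with a matrix product between two (52, 99) arrays, whose inner dimensions (99 and 52) do not match. A minimal NumPy sketch of the mismatch and of the long-format fix follows, reading the model as y = beta * x + error with no intercept. The arrays `x_wide` and `y_wide` are random stand-ins for `averages` and `Stockslist`, and the slope of 0.5 is an assumption made only so there is something to recover.

```python
import numpy as np

rng = np.random.default_rng(2)
n_dates, n_stocks = 52, 99
x_wide = rng.normal(size=(n_dates, n_stocks))  # stand-in for `averages`
y_wide = 0.5 * x_wide + rng.normal(scale=0.1, size=(n_dates, n_stocks))  # stand-in for `Stockslist`

# Two (52, 99) arrays cannot be matrix-multiplied: 99 != 52.
try:
    np.dot(y_wide, x_wide)
    mismatch = False
except ValueError:
    mismatch = True

# Long format: one row per (date, stock) observation.
y_long = y_wide.reshape(-1)     # shape (5148,)
x_long = x_wide.reshape(-1, 1)  # shape (5148, 1), a single regressor

# Pooled OLS without an intercept: y = beta * x + error.
beta, *_ = np.linalg.lstsq(x_long, y_long, rcond=None)
slope = float(beta[0])
```

Passing the wide frames directly treats each of the 99 stock columns as a separate variable, whereas the panel model needs each (date, stock) pair to be one observation.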
Admin about 8 years

I was still wondering whether statsmodels offers any panel regression options.
Stefan about 8 years

For more serious econometrics you're better off with R, I'm afraid, or any of the commercial packages. Here's an attempt to implement something, but I'm not sure it has moved beyond the gist stage: gist.github.com/vincentarelbundock/5053686