a Panel regression in Python
Solution 1
Try the code below. I copied the stock data from the link above and added random data for the x
column. A panel regression needs a 'MultiIndex', as mentioned in the comments.
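import numpy as np
import pandas as pd
from pandas.stats.plm import PanelOLS  # removed in pandas 0.20; needs an older pandas
# Move the stock columns into a second index level: one row per (date, stock)
df = pd.DataFrame(df.set_index('dates').stack())
df.columns = ['y']
df['x'] = np.random.random(size=len(df.index))  # random regressor for illustration
df.info()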
MultiIndex: 100 entries, (2015-04-03 00:00:00, AB INBEV) to (2015-05-01 00:00:00, ZC.PA)
Data columns (total 2 columns):
y 100 non-null float64
x 100 non-null float64
dtypes: float64(2)
memory usage: 2.3+ KB
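As a self-contained illustration of the reshaping step, here is a minimal sketch that builds a small wide table and stacks it into a (date, stock) MultiIndex. The tickers `AAA`, `BBB`, `CCC`, the values, and the names `wide` and `long` are made up for the example and are not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: one row per date, one column per stock.
dates = pd.date_range("2015-04-03", periods=5, freq="W-FRI")
wide = pd.DataFrame(
    np.arange(15, dtype=float).reshape(5, 3),
    index=dates,
    columns=["AAA", "BBB", "CCC"],  # made-up tickers
)
wide.index.name = "dates"

# stack() moves the column labels into a second index level,
# giving one row per (date, stock) pair.
long = wide.stack().to_frame("y")
long.index.names = ["dates", "stock"]

# Random regressor, as in the answer; seeded so the run is repeatable.
rng = np.random.default_rng(0)
long["x"] = rng.random(len(long))

print(long.head())
```

The resulting frame has 5 dates x 3 stocks = 15 rows, which is the same layout that `df.info()` reports above for 100 rows.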
regression = PanelOLS(y=df['y'], x=df[['x']])
regression
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations: 100
Number of Degrees of Freedom: 2
R-squared: 0.0042
Adj R-squared: -0.0060
Rmse: 0.2259
F-stat (1, 98): 0.4086, p-value: 0.5242
Degrees of Freedom: model 1, resid 98
-----------------------Summary of Estimated Coefficients------------------------
Variable        Coef   Std Err   t-stat   p-value   CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
x            -0.0507    0.0794    -0.64    0.5242   -0.2063     0.1048
intercept     2.1952    0.0448    49.05    0.0000    2.1075     2.2829
---------------------------------End of Summary---------------------------------
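`pandas.stats.plm` was removed from pandas in version 0.20, so the call above only runs on older installations. Without time or entity effects, the summary is a pooled OLS with an intercept, which can be reproduced with plain NumPy. The following is a rough sketch under that assumption; `pooled_ols` is a hypothetical helper, not part of pandas, and the panel is synthetic.

```python
import numpy as np
import pandas as pd

def pooled_ols(df, y_col, x_cols):
    """Pooled OLS with an intercept; returns (coefficients, R-squared)."""
    y = df[y_col].to_numpy(dtype=float)
    X = df[x_cols].to_numpy(dtype=float)
    X = np.column_stack([X, np.ones(len(df))])  # intercept as the last column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return pd.Series(beta, index=list(x_cols) + ["intercept"]), r2

# Synthetic panel: 5 dates x 20 made-up stocks, true slope 0.5, intercept 2.0.
rng = np.random.default_rng(1)
idx = pd.MultiIndex.from_product(
    [pd.date_range("2015-04-03", periods=5, freq="W-FRI"),
     [f"S{i}" for i in range(20)]],
    names=["dates", "stock"],
)
panel = pd.DataFrame({"x": rng.random(len(idx))}, index=idx)
panel["y"] = 2.0 + 0.5 * panel["x"] + rng.normal(scale=0.1, size=len(idx))

coefs, r2 = pooled_ols(panel, "y", ["x"])
print(coefs)
print("R-squared:", round(r2, 4))
```

The sketch returns only point estimates and R-squared; standard errors and the F-statistic shown in the summary are left out to keep it short.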
Solution 2
As you suggested above, I changed my code as follows:
- I transformed the stacks into two dataframes
- I concatenated them into a single multi-index dataframe
- I ran the regression and added time effects
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 5096 entries, (2015-04-03 00:00:00, AB INBEV) to (25/03/16, ZC.PA)
Data columns (total 2 columns):
indvalues 5096 non-null float64
avgvalues 5096 non-null float64
dtypes: float64(2)
memory usage: 119.4+ KB
from pandas.stats.plm import PanelOLS
regression = PanelOLS(y=df["indvalues"], x=df[["avgvalues"]], time_effects=True)
The regression now works very nicely! Thank you, Stefan Jansen.
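The steps in this answer can be sketched without `pandas.stats.plm`, which is no longer available in current pandas. The sketch below stacks two hypothetical wide frames, joins them on the shared (date, stock) index, and handles time effects by subtracting each date's mean before a no-intercept regression, which gives the same slope as adding one dummy per date. The data, tickers, and variable names are invented, and the source does not confirm that the old `time_effects=True` option computed it in exactly this way.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
dates = pd.date_range("2015-04-03", periods=52, freq="W-FRI")
stocks = [f"S{i}" for i in range(10)]  # made-up tickers

# Hypothetical wide inputs: same dates and columns in both frames.
avg_wide = pd.DataFrame(rng.normal(size=(52, 10)), index=dates, columns=stocks)
shock = pd.Series(rng.normal(size=52), index=dates)  # date-specific shift
ind_wide = 0.8 * avg_wide + rng.normal(scale=0.1, size=(52, 10))
ind_wide = ind_wide.add(shock, axis=0)

# Stack each frame and join them on the shared (date, stock) index.
df = pd.concat(
    {"indvalues": ind_wide.stack(), "avgvalues": avg_wide.stack()}, axis=1
)
df.index.names = ["dates", "stock"]

# Time effects via the within transformation: subtract each date's mean,
# then fit a no-intercept regression on the demeaned data.
demeaned = df - df.groupby(level="dates").transform("mean")
x = demeaned["avgvalues"].to_numpy()
y = demeaned["indvalues"].to_numpy()
beta = (x @ y) / (x @ x)
print("slope with time effects:", round(beta, 3))
```

Demeaning removes anything that is constant within a date, so the date-specific `shock` does not bias the slope estimate.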
Admin
Updated on June 08, 2022
Comments
-
Admin almost 2 years
I'm trying to run a panel regression on pandas Dataframes:
Currently I have two dataframes, each containing 52 rows (dates) x 99 columns (stocks): Markdown file with data representation
When running:
est = sm.OLS(Stockslist, averages).fit()
est.summary()
I get the ValueError: shapes (52,99) and (52,99) not aligned: 99 (dim 1) != 52 (dim 0)
Can somebody point out what I am doing wrong? The model is simply y(i,t) = x(i,t) + error term, so there is no intercept. However, I would like to add time effects in the future.
Kind regards, Jeroen
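The error message has the form NumPy uses when a dot product is attempted on two 52x99 arrays, which suggests the wide frames are being treated as matrices rather than as one observation per (date, stock). A minimal sketch of that failure and of the long-format fix follows; the data are synthetic, and a single slope with no intercept is an assumption about the intended model.

```python
import numpy as np

rng = np.random.default_rng(3)
x_wide = rng.normal(size=(52, 99))  # stand-in for `averages`
y_wide = 1.5 * x_wide + rng.normal(scale=0.1, size=(52, 99))  # stand-in for `Stockslist`

# A dot product of two 52x99 blocks fails because the inner dimensions differ.
msg = None
try:
    np.dot(y_wide, x_wide)
except ValueError as err:
    msg = str(err)
    print(msg)

# Flatten to one observation per (date, stock) pair.
x_long = x_wide.ravel()
y_long = y_wide.ravel()

# No-intercept OLS with one regressor: beta = sum(x*y) / sum(x*x).
beta = (x_long @ y_long) / (x_long @ x_long)
print("observations:", x_long.size, "slope:", round(beta, 3))
```

With pandas DataFrames the equivalent of `ravel()` is `stack()`, which also keeps the date and stock labels in a MultiIndex.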
-
Admin about 8 years
I was still wondering if statsmodels doesn't offer any panel regression options
-
Stefan about 8 years
For more serious econometrics you're better off with R, I'm afraid, or any of the commercial packages. Here's an attempt to implement something, but I'm not sure it has moved beyond the gist stage: gist.github.com/vincentarelbundock/5053686