Pandas Dataframe AttributeError: 'DataFrame' object has no attribute 'design_info'
As far as I know, pickling and unpickling a pandas DataFrame does not save and restore attributes that a user has attached to it.
Because the formula information is currently stored together with the DataFrame of the original design matrix, it is lost when a Results and Model instance is unpickled.
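The attribute loss can be checked with a minimal sketch like the one below, assuming a recent pandas version. The attribute value is a placeholder string, not a real patsy object, and the small DataFrame is made up for illustration.

```python
import pickle

import pandas as pd

df = pd.DataFrame({"B": [20, 30, 10], "C": [0, -30, 120]})

# Attach an ad-hoc attribute to the DataFrame instance.
# (Placeholder value for illustration; not a real patsy DesignInfo.)
df.design_info = "placeholder for formula information"
has_attr_before = hasattr(df, "design_info")

# Round-trip through pickle: the data survives, the ad-hoc attribute does not.
restored = pickle.loads(pickle.dumps(df))
has_attr_after = hasattr(restored, "design_info")

print(has_attr_before, has_attr_after)
```

Accessing restored.design_info after the round trip raises the same AttributeError as in the question.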
If you don't use categorical variables or transformations, the correct design matrix can be built with patsy.dmatrix. I think the following should work:
import patsy
x = patsy.dmatrix("B + C", data=df) # df is the data for prediction
test2 = model.predict(x, transform=False) # transform=False skips the formula handling
Alternatively, constructing the design matrix for the prediction directly should also work. Note that we need to explicitly add the constant that the formula adds by default:
from statsmodels.api import add_constant
test2 = model.predict(add_constant(df[["B", "C"]]), transform=False)
If the formula and design matrix contain (stateful) transformations or categorical variables, then it's not possible to conveniently construct the design matrix without the original formula information. Constructing it by hand and doing all the calculations explicitly is difficult in this case, and loses all the advantages of using formulas.
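To illustrate why the original formula information matters for stateful transformations, here is a small sketch using patsy's built-in center() transform and patsy.build_design_matrices. The data frames train and new are hypothetical. The sketch does not pickle anything; it only shows what is lost when design_info is unavailable.

```python
import numpy as np
import pandas as pd
import patsy

train = pd.DataFrame({"B": [1.0, 2.0, 3.0]})  # hypothetical training data
new = pd.DataFrame({"B": [10.0]})             # hypothetical prediction data

# center(B) subtracts the mean of B. The training mean (2.0 here) is
# remembered in the design_info of the resulting design matrix.
x_train = patsy.dmatrix("center(B)", data=train)
info = x_train.design_info

# Reusing design_info applies the remembered training mean: 10 - 2 = 8.
(x_reused,) = patsy.build_design_matrices([info], new)

# Rebuilding from the formula string alone recomputes the mean on the
# new data instead: 10 - 10 = 0.
x_rebuilt = patsy.dmatrix("center(B)", data=new)

reused = np.asarray(x_reused)
rebuilt = np.asarray(x_rebuilt)
print(reused, rebuilt)
```

The two matrices differ in the centered column, so predictions built without the stored design_info would silently use the wrong transformation.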
The only real solution is to pickle the formula information design_info independently of the dataframe orig_exog.
Michael
Updated on December 22, 2020
Comments
-
Michael over 3 years
I am trying to use the predict() function of the statsmodels.formula.api OLS implementation. When I pass a new data frame to the function to get predicted values for an out-of-sample dataset, result.predict(newdf) returns the following error: 'DataFrame' object has no attribute 'design_info'. What does this mean and how do I fix it? The full traceback is:
    p = result.predict(newdf)
      File "C:\Python27\lib\site-packages\statsmodels\base\model.py", line 878, in predict
        exog = dmatrix(self.model.data.orig_exog.design_info.builder,
      File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2088, in __getattr__
        (type(self).__name__, name))
    AttributeError: 'DataFrame' object has no attribute 'design_info'
EDIT: Here is a reproducible example. The error appears to occur when I pickle and then unpickle the result object (which I need to do in my actual project):
    import cPickle
    import pandas as pd
    import numpy as np
    import statsmodels.formula.api as sm

    df = pd.DataFrame({"A": [10,20,30,324,2353], "B": [20, 30, 10, 1, 2332], "C": [0, -30, 120, 11, 2]})

    result = sm.ols(formula="A ~ B + C", data=df).fit()
    print result.summary()

    test1 = result.predict(df) #works

    f_myfile = open('resultobject', "wb")
    cPickle.dump(result, f_myfile, 2)
    f_myfile.close()
    print("Result Object Saved")

    f_myfile = open('resultobject', "rb")
    model = cPickle.load(f_myfile)

    test2 = model.predict(df) #produces error
-
Josef over 10 years
I opened an issue with statsmodels: github.com/statsmodels/statsmodels/issues/1263
-
Michael over 10 years
Solution 1 produces the same error in the sample code. Solution 2 gives ValueError: matrices are not aligned, again with the sample code.
-
Josef over 10 years
I fixed both examples. In the first, I forgot to add transform=False to avoid calling patsy; in the second, I forgot to add the constant that patsy adds automatically.