How to predict new values using statsmodels.formula.api (python)

Solution 1

You can pass new values to the fitted model's .predict() method, as illustrated for a single observation in output #11 of this notebook from the docs. To predict for several observations at once, pass a 2-D array-like such as a DataFrame (see docs).

Since you are using the formula API, your input needs to be a pd.DataFrame so that the column names referenced in the formula are available. In your case, you could use something like .predict(pd.DataFrame({'mean_area': [1,2,3]})).

By default, when no alternative data is provided, statsmodels .predict() returns predictions only for the observations used in fitting.
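To make the difference between the two calls concrete, here is a minimal sketch. It assumes a synthetic stand-in for the breast cancer data (the values and the `breast` DataFrame built here are made up for illustration, not the asker's data) and shows `.predict()` with and without a DataFrame of new `mean_area` values.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the breast cancer data (hypothetical values).
rng = np.random.default_rng(0)
mean_area = rng.uniform(100, 2500, size=200)
# Make target == 1 less likely as mean_area grows, with some noise.
p = 1 / (1 + np.exp((mean_area - 700) / 150))
breast = pd.DataFrame({"mean_area": mean_area, "target": rng.binomial(1, p)})

result = smf.logit("target ~ mean_area", data=breast).fit(disp=0)

# No argument: fitted probabilities for the training rows only.
in_sample = result.predict()

# New observations: the column name must match the name used in the formula.
new_data = pd.DataFrame({"mean_area": [30, 500, 1500]})
new_probs = result.predict(new_data)
print(new_probs)

# Cross-check against the logistic formula applied to the coefficients.
b0, b1 = result.params["Intercept"], result.params["mean_area"]
manual = 1 / (1 + np.exp(-(b0 + b1 * new_data["mean_area"])))
```

The manual cross-check is only there to show that `.predict()` applies the fitted coefficients to the new rows; in practice the `.predict(new_data)` call is all that is needed.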

Solution 2

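import statsmodels.formula.api as smf

# df is a pandas DataFrame with columns 'x' and 'y'
model = smf.ols('y ~ x', data=df).fit()

# Predict for a list of new observations; the list can hold one or many values
prediction = model.get_prediction(exog=dict(x=[5, 10, 25]))
# Table of predicted means with 95% confidence and prediction intervals
prediction.summary_frame(alpha=0.05)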
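For a version of the snippet above that runs end to end, the following sketch fits the same kind of OLS model on synthetic data (the `df` built here is an assumption for illustration) and inspects what `summary_frame` returns. The new values are passed as a DataFrame rather than a dict, which is a stylistic choice here, not a requirement stated in the answer.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: y is roughly linear in x plus noise (made-up values).
rng = np.random.default_rng(1)
x = np.linspace(0, 30, 60)
df = pd.DataFrame({"x": x, "y": 2.0 + 0.5 * x + rng.normal(0, 1.0, size=x.size)})

model = smf.ols("y ~ x", data=df).fit()

# New observations, named after the formula's right-hand-side variable.
new_x = pd.DataFrame({"x": [5, 10, 25]})

prediction = model.get_prediction(new_x)
frame = prediction.summary_frame(alpha=0.05)  # 95% intervals
print(frame)

# Plain point predictions, for comparison with the 'mean' column.
point = model.predict(new_x)
```

The frame has one row per new observation. The `mean_ci_*` columns bound the expected value of y, while the wider `obs_ci_*` columns bound a single new observation.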
Author by

vishmay

Theoretical physicist with mathematical interests.

Updated on June 13, 2022

Comments

  • vishmay, almost 2 years ago

    I trained a logistic model on the breast cancer data using ONLY one feature, 'mean_area', as follows:

    from statsmodels.formula.api import logit
    logistic_model = logit('target ~ mean_area',breast)
    result = logistic_model.fit()
    

    The trained model has a built-in predict method. However, by default it returns the predicted values for all the training samples, as follows:

    predictions = result.predict()
    

    Suppose I want the prediction for a new value, say 30. How do I use the trained model to output the value (rather than reading the coefficients and computing it manually)?