What to use to do multiple correlation?


You could certainly do this with statsmodels and pandas. Something like this might get you started:

import pandas
from statsmodels.formula.api import ols

data = pandas.DataFrame([["A", 4, 0, 1, 27],
                         ["B", 7, 1, 1, 29],
                         ["C", 6, 1, 0, 23],
                         ["D", 2, 0, 0, 20],
                         ["etc.", 3, 0, 1, 21]],
                        columns=["ID", "score", "male", "age20", "BMI"])

# Pairwise Pearson correlations; the non-numeric "ID" column is skipped
print(data.corr(numeric_only=True))

# Multiple linear regression of BMI on the three predictors
# (the formula interface adds the intercept automatically)
model = ols("BMI ~ score + male + age20", data=data).fit()
print(model.params)
print(model.summary())
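The pairwise correlation matrix already contains everything needed for the multiple correlation. A minimal sketch, assuming the standard identity R^2 = r_yx' * inv(R_xx) * r_yx, where r_yx holds each predictor's correlation with the response and R_xx the correlations among the predictors. `multiple_correlation` is a hypothetical helper, not part of pandas or statsmodels, and the predictors must not be perfectly collinear (otherwise `numpy.linalg.solve` fails).

```python
import numpy as np
import pandas as pd

# Same toy data as above; "ID" is a label, not a variable
data = pd.DataFrame([["A", 4, 0, 1, 27],
                     ["B", 7, 1, 1, 29],
                     ["C", 6, 1, 0, 23],
                     ["D", 2, 0, 0, 20],
                     ["etc.", 3, 0, 1, 21]],
                    columns=["ID", "score", "male", "age20", "BMI"])


def multiple_correlation(df, response, predictors):
    """Multiple correlation R of `response` on `predictors`, from Pearson correlations only."""
    corr = df[[response] + predictors].corr()
    r_yx = corr.loc[predictors, response].to_numpy()    # each predictor vs. the response
    r_xx = corr.loc[predictors, predictors].to_numpy()  # correlations among the predictors
    r_squared = r_yx @ np.linalg.solve(r_xx, r_yx)      # r_yx' * inv(R_xx) * r_yx
    return np.sqrt(r_squared)


R = multiple_correlation(data, "BMI", ["score", "male", "age20"])
print(R)
```

For a regression with an intercept, this should match `model.rsquared ** 0.5` from the statsmodels fit.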

Have a look at the documentation:

http://statsmodels.sourceforge.net/devel/

http://pandas.pydata.org/

Edit: I'm not familiar with the term "multiple correlation coefficient", but I believe it is just the square root of the R-squared of a multiple regression model, no?

print(model.rsquared ** 0.5)
print(model.rsquared_adj ** 0.5)  # not meaningful if the adjusted R-squared is negative
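One way to check that the square root of R-squared really is a correlation: for an OLS fit with an intercept, it equals the ordinary Pearson correlation between the observed response and the fitted values. A numpy-only sketch (the helper name `multiple_r` is made up for illustration); with the statsmodels model the same check would be `numpy.corrcoef(data["BMI"], model.fittedvalues)[0, 1]`.

```python
import numpy as np


def multiple_r(X, y):
    """Return (sqrt of R^2, corr(y, fitted)) for an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])     # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    fitted = X @ beta
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_from_r2 = np.sqrt(1.0 - ss_res / ss_tot)    # square root of R-squared
    r_from_corr = np.corrcoef(y, fitted)[0, 1]    # Pearson r of observed vs. fitted
    return r_from_r2, r_from_corr


# score, male, age20 (predictors) and BMI (response) from the toy data
X = np.array([[4, 0, 1], [7, 1, 1], [6, 1, 0], [2, 0, 0], [3, 0, 1]], dtype=float)
y = np.array([27, 29, 23, 20, 21], dtype=float)
print(multiple_r(X, y))  # the two numbers agree up to rounding
```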

Is this what you're after?

Author: Pa_

Updated on July 05, 2022

Comments

  • Pa_
    Pa_ almost 2 years

    I am trying to use Python to compute a multiple linear regression and the multiple correlation between a response array and a set of predictor arrays. I saw the very simple example for computing a multiple linear regression, which is easy. But how can I compute the multiple correlation with statsmodels, or with anything else as an alternative? I guess I could use rpy and R, but I'd prefer to stay in Python if possible.

    edit [clarification]: For a situation like the one described at http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_MultivariableMethods/ I would also like to compute the multiple correlation coefficients for the predictors, in addition to the regression coefficients and the other regression parameters.

  • bmu
    bmu over 11 years
    +1. Is the formula API available in 0.4, or are you using a development version here?
  • jseabold
    jseabold over 11 years
    It was added in 0.5. A 0.5 prerelease with the formula framework is available on PyPI. The final release should hopefully be out before the end of the year.
  • Chase Denecke
    Chase Denecke almost 2 years
    I am getting an absurdly high correlation coefficient with this method, despite there being no strong pairwise correlations. Does anyone have suggestions as to what might be going on?
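One thing worth ruling out for the comment above (a general property of least squares, not a diagnosis of that particular dataset): R-squared, and therefore the multiple correlation, never decreases when a predictor is added. With many predictors relative to observations it can be large even when every pairwise correlation is weak. The sketch below fits pure noise with numpy only; `r_squared` is a hypothetical helper and the sample sizes are arbitrary choices. The adjusted R-squared (what `model.rsquared_adj` reports) penalizes the extra predictors.

```python
import numpy as np


def r_squared(X, y):
    """R^2 and adjusted R^2 of an OLS fit with intercept; X has one column per predictor."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])          # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # ordinary least squares
    resid = y - X1 @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj


rng = np.random.default_rng(0)
n = 20
y = rng.normal(size=n)          # response: pure noise
X = rng.normal(size=(n, 15))    # 15 predictors, also pure noise
for k in (1, 5, 10, 15):
    r2, adj = r_squared(X[:, :k], y)
    print(f"{k:2d} predictors: R = {np.sqrt(r2):.2f}, adjusted R^2 = {adj:.2f}")
```

Expect R to climb as columns are added even though nothing is related to `y`, while the adjusted R-squared does not follow it upward in the same way.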