Correlation coefficient of two columns in pandas dataframe with .corr()
Calling .corr() on the entire DataFrame gives you a full correlation matrix:
>>> table.corr()
        Group     Age
Group  1.0000 -0.1533
Age   -0.1533  1.0000
Alternatively, call .corr() on one Series and pass it the other:
>>> table['Group'].corr(table['Age'])
-0.15330486289034567
This should be faster than computing the full matrix and indexing it (with table.corr().at['Group', 'Age']; note that .iat only accepts integer positions, so labels need .at or .loc). Also, this should work whether Group is bool or int dtype.
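As a rough sketch of the two approaches side by side, the snippet below rebuilds the rows posted in the question and compares the Series-level call with the matrix lookup. The variable names r_series, r_matrix and r_bool are only illustrative, and the value printed here need not match the -0.1533 shown above, which may have been computed on the full table rather than on these rows.

```python
import pandas as pd

# Rows posted in the question (Group is 0/1, Age is in years).
group = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
         1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
age = [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
       50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35]
table = pd.DataFrame({"Group": group, "Age": age})

# Option 1: correlate the two Series directly; returns a single float.
r_series = table["Group"].corr(table["Age"])

# Option 2: build the full matrix, then pick one cell by label.
r_matrix = table.corr().at["Group", "Age"]

# Same calculation with Group stored as bool instead of int.
r_bool = table["Group"].astype(bool).corr(table["Age"])

print(r_series, r_matrix, r_bool)
```

All three values should agree up to floating-point error. The 1.0 entries on the diagonal of the full matrix are expected, since every column is perfectly correlated with itself; the off-diagonal entry is the one that relates Group to Age.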
florence-y
Updated on August 06, 2022
Comments
- florence-y over 1 year
I would like to calculate the correlation coefficient between two columns of a pandas data frame after making a column boolean in nature. The original table had two columns: a Group column with one of two treatment groups, now boolean, and an Age column. Those are the two columns I'm looking to calculate the correlation coefficient for.
I tried the .corr() method, with:
table.corr(method='pearson')
I have pasted the first 25 rows of the boolean table below. I don't know if I'm missing parameters, or how to interpret this result. It's also strange that it's 1 as well. Thanks in advance!
    Group  Age
0       1   50
1       1   59
2       1   22
3       1   48
4       1   53
5       1   48
6       1   29
7       1   44
8       1   28
9       1   42
10      1   35
11      0   54
12      0   43
13      1   50
14      1   62
15      0   64
16      0   39
17      1   40
18      1   59
19      1   46
20      0   56
21      1   21
22      1   45
23      0   41
24      1   46
25      0   35
- user1538798 over 3 years
maybe you should have tested your soln, since the poster posted a set of data and an expected answer: res = df[['Group', 'Age']].corr(); print(res)
- Admin over 3 years
it worked for me when I tried it on columns in my df