Correlation coefficient of two columns in pandas dataframe with .corr()

Calling .corr() on the entire DataFrame gives you a full correlation matrix:

>>> table.corr()
        Group     Age
Group  1.0000 -0.1533
Age   -0.1533  1.0000

If you only need the one coefficient, call .corr() on one Series and pass it the other:

>>> table['Group'].corr(table['Age'])
-0.15330486289034567
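
As a cross-check on what that number means, here is a minimal sketch that recomputes Pearson's r by hand with NumPy and compares it with Series.corr(). It assumes only the 26 rows pasted in the question (the full table is not available), so the value it prints will differ from the one shown above; the names `r_manual` and `r_pandas` are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: only the 26 rows pasted in the question, not the full table.
group = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
         1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
age = [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
       50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35]
table = pd.DataFrame({"Group": group, "Age": age})

x = table["Group"].to_numpy(dtype=float)
y = table["Age"].to_numpy(dtype=float)

# Pearson's r: sum of products of deviations from the mean, divided by
# the square root of the product of the summed squared deviations.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Series.corr uses method="pearson" unless told otherwise.
r_pandas = table["Group"].corr(table["Age"])

print(r_manual, r_pandas)
```

With a 0/1 column such as Group, a negative r means the rows coded 1 tend to have lower Age than the rows coded 0.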

This should be faster than computing the full matrix and indexing it (with table.corr().at['Group', 'Age']; note that .iat only accepts integer positions, so labels need .at or .loc). It should also work whether Group has bool or int dtype.
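
The following sketch checks both claims on the 26 rows pasted in the question (an assumption, since the full table is not available): the Series-based call and the label-based matrix lookup should agree, and casting Group to bool should not change the coefficient. The variable names are illustrative.

```python
import pandas as pd

# Assumption: only the 26 rows pasted in the question, not the full table.
table = pd.DataFrame({
    "Group": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
              1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0],
    "Age": [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
            50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35],
})

# One coefficient, computed directly from the two Series.
r_series = table["Group"].corr(table["Age"])

# Same coefficient via the full matrix; .at does a label-based scalar lookup.
r_matrix = table.corr().at["Group", "Age"]

# Same column stored as bool instead of int.
group_as_bool = table["Group"].astype(bool)
r_bool = group_as_bool.corr(table["Age"])

print(r_series, r_matrix, r_bool)
```

All three printed values should match up to floating-point rounding.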

Author by

florence-y

Updated on August 06, 2022

Comments

  • florence-y
    florence-y over 1 year

    I would like to calculate the correlation coefficient between two columns of a pandas DataFrame after converting one of them to boolean. The original table had two columns: a Group column holding one of two treatment groups (now boolean) and an Age column. Those are the two columns I want the correlation coefficient for.

    I tried the .corr() method, with:

    table.corr(method='pearson')
    

    but this is what was returned to me: [screenshot of the output, not preserved]

    I have pasted the first 26 rows (indices 0 to 25) of the boolean table below. I don't know whether I'm missing parameters, or how to interpret this result. It's also strange that the value is 1. Thanks in advance!

        Group  Age
    0      1   50
    1      1   59
    2      1   22
    3      1   48
    4      1   53
    5      1   48
    6      1   29
    7      1   44
    8      1   28
    9      1   42
    10     1   35
    11     0   54
    12     0   43
    13     1   50
    14     1   62
    15     0   64
    16     0   39
    17     1   40
    18     1   59
    19     1   46
    20     0   56
    21     1   21
    22     1   45
    23     0   41
    24     1   46
    25     0   35
    
  • user1538798
    user1538798 over 3 years
    Maybe you should have tested your solution, since the poster provided a set of data and an expected answer: res = df[['Group', 'Age']].corr(); print(res)
  • Admin
    Admin over 3 years
    It worked for me when I tried it on columns in a df.