Dropping 'nan' with Pearson's r in scipy/pandas


Solution 1

You can use np.isnan like this:

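# Assumes: import numpy as np, and sp refers to scipy.stats
for i in range(len(frame3.columns)):
    x, y = frame3.iloc[:, i].values, control['CONTROL'].values
    # np.isnan is a function; NumPy arrays have no .isnan() method
    nas = np.logical_or(np.isnan(x), np.isnan(y))
    # Keep only the positions where both values are present
    corr = sp.pearsonr(x[~nas], y[~nas])
    correlation.append(corr)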
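
As a self-contained illustration of the masking approach, here is a minimal sketch on made-up data. The contents of frame3 and control below are hypothetical stand-ins, and scipy.stats is imported explicitly rather than through the sp alias used above.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data standing in for frame3 and control
frame3 = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [2.0, np.nan, 6.0, 8.0, 11.0],
})
control = pd.DataFrame({"CONTROL": [1.0, 2.0, 3.0, np.nan, 5.0]})

y = control["CONTROL"].to_numpy(dtype=float)
correlation = []
for col in frame3.columns:
    x = frame3[col].to_numpy(dtype=float)
    # True where both x and y hold a real number
    keep = ~(np.isnan(x) | np.isnan(y))
    r, p = stats.pearsonr(x[keep], y[keep])
    correlation.append((r, p))
```

Note that the mask compares values by position, so this assumes the rows of frame3 and control are already in the same order.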

Solution 2

You can also create a temporary dataframe and use pandas' built-in .corr method, which excludes NaN values pairwise when computing the Pearson correlation. Alternatively, call .dropna on the temporary dataframe to drop null values before passing the columns to sp.pearsonr.

for col in frame3.columns:
    # Temporary two-column dataframe, joined on the index
    pair = frame3[col].to_frame(name='3').join(control['CONTROL'])
    correlation.append(pair.corr()['3']['CONTROL'])
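
The .dropna variant mentioned above could look roughly like the sketch below. The data is hypothetical, pd.concat is used here in place of .join to build the temporary dataframe, and the final corrwith call is shown only as a pandas-only cross-check that returns r without a p-value.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data standing in for frame3 and control
frame3 = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0, 7.0],
    "b": [2.0, np.nan, 6.0, 8.0, 11.0, 9.0],
})
control = pd.DataFrame({"CONTROL": [1.0, 2.0, 3.0, np.nan, 5.0, 6.0]})

results = {}
for col in frame3.columns:
    # Align both series on the index, then drop rows with any NaN
    pair = pd.concat([frame3[col], control["CONTROL"]], axis=1).dropna()
    r, p = stats.pearsonr(pair[col].to_numpy(), pair["CONTROL"].to_numpy())
    results[col] = (r, p)

# pandas-only cross-check: one r per column, NaN pairs skipped
r_only = frame3.corrwith(control["CONTROL"])
```

Going through scipy keeps the p-value, which the pandas correlation methods do not report.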

Author: Lodore66

Updated on September 15, 2022

Comments

  • Lodore66
    Lodore66 over 1 year

    Quick question: Is there a way to use 'dropna' with the Pearson's r function in scipy? I'm using it in conjunction with pandas, and some of my data has holes in it. I know you used to be able to suppress 'nan' with Spearman's r in older versions of scipy, but that functionality is now missing.

    To my mind, this seems like a disimprovement, so I wonder if I'm missing something obvious.

    My code:

    for i in range(len(frame3.columns)):    
        correlation.append(sp.pearsonr(frame3.iloc[ :,i], control['CONTROL']))
    
  • Daniel Gibson
    Daniel Gibson about 7 years
    This makes some assumptions about the join, e.g. that the indices are compatible
  • Steve Scott
    Steve Scott over 4 years
    I got the error AttributeError: 'numpy.ndarray' object has no attribute 'isnan'
  • ramesh
    ramesh over 4 years
    @SteveScott: instead of x.isnan(), try np.isnan(x)