Correlation between two dataframes by row


Solution 1

Depending on whether you want a cool or a fast solution, you can use either

diag(cor(t(df1), t(df2)))

which is cool but wasteful (it computes the correlation between every row of df1 and every row of df2, an n × n matrix, and then keeps only the diagonal), or

# coerce once, so that A[i, ] and B[i, ] are plain numeric vectors
A <- as.matrix(df1)
B <- as.matrix(df2)
sapply(seq_len(nrow(A)), function(i) cor(A[i, ], B[i, ]))

which does only what you want but is a bit more to type.
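A third option, which is not part of the original answer, avoids both the wasted n × n matrix and the R-level loop by applying the Pearson formula to all rows at once: centre each row, then divide the row sums of the cross products by the square root of the product of the row sums of squares. This is only a sketch assuming purely numeric data frames of the same shape, with no NAs and no constant rows. The helper name row_cor and the toy data are hypothetical.

```r
# Hypothetical toy data: two numeric data frames of the same shape
set.seed(1)
df1 <- as.data.frame(matrix(rnorm(100 * 5), nrow = 100))
df2 <- as.data.frame(matrix(rnorm(100 * 5), nrow = 100))

# Vectorised row-wise Pearson correlation (sketch, hypothetical helper):
# r_i = sum(xc_i * yc_i) / sqrt(sum(xc_i^2) * sum(yc_i^2)), xc = centred row
row_cor <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  # the length-n vector rowMeans(x) is recycled column by column,
  # so each row has its own mean subtracted
  xc <- x - rowMeans(x)
  yc <- y - rowMeans(y)
  rowSums(xc * yc) / sqrt(rowSums(xc^2) * rowSums(yc^2))
}

r_fast <- row_cor(df1, df2)

# Reference: the matrix-based sapply() version
A <- as.matrix(df1)
B <- as.matrix(df2)
r_loop <- sapply(seq_len(nrow(A)), function(i) cor(A[i, ], B[i, ]))

all.equal(unname(r_fast), r_loop)  # TRUE
```

Unlike cor(), this sketch does no NA handling, and a constant row gives NaN (0 / 0) instead of NA with a warning.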

Solution 2

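I found that as.matrix is not required, with one caveat: a row taken with df1[i, ] is still a one-row data frame, so cor() would treat each column as a separate variable with a single observation. Wrapping the row in unlist() turns it into the plain numeric vector that cor() expects. Columns need no conversion, because df1[, i] already drops to a vector.

Correlation of each pair of corresponding rows of df1 and df2:

sapply(seq_len(nrow(df1)), function(i) cor(unlist(df1[i, ]), unlist(df2[i, ])))

and of each pair of corresponding columns:

sapply(seq_len(ncol(df1)), function(i) cor(df1[, i], df2[, i]))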
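The row/column asymmetry can be checked on small toy data. The sketch below (hypothetical column names and values) confirms that df1[i, ] stays a data frame while df1[, i] is a vector, computes the row version with unlist(), and shows a shorter column version using mapply(): since a data frame is a list of columns, mapply() walks the columns of df1 and df2 in parallel and calls cor() on each pair.

```r
# Hypothetical toy data with matching columns
set.seed(2)
df1 <- data.frame(price1 = rnorm(10), price2 = rnorm(10), price3 = rnorm(10))
df2 <- data.frame(price1 = rnorm(10), price2 = rnorm(10), price3 = rnorm(10))

# A row extracted with [i, ] is still a one-row data frame ...
is.data.frame(df1[1, ])  # TRUE
# ... whereas a column extracted with [, i] drops to a numeric vector
is.numeric(df1[, 1])     # TRUE

# Rows: unlist() turns each one-row data frame into a plain vector
row_r <- sapply(seq_len(nrow(df1)),
                function(i) cor(unlist(df1[i, ]), unlist(df2[i, ])))

# Columns: mapply() iterates over the columns of both data frames
col_r <- mapply(cor, df1, df2)  # named by column: price1, price2, price3

# Same values as the index-based sapply() version
col_r2 <- sapply(seq_len(ncol(df1)), function(i) cor(df1[, i], df2[, i]))
all.equal(unname(col_r), col_r2)  # TRUE
```

A side benefit of the mapply() form is that the result keeps the column names.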
Author: screechOwl (https://financenerd.blog/blog/)

Updated on June 13, 2022

Comments

  • screechOwl, almost 2 years ago

    I have 2 data frames with 5 columns and 100 rows each.

    id       price1      price2     price3     price4     price5
     1         11.22      25.33      66.47      53.76      77.42
     2         33.56      33.77      44.77      34.55      57.42
    ...
    

    I would like to get the correlation of the corresponding rows, basically

    r <- numeric(100)
    for (i in 1:100) {
      r[i] <- cor(unlist(df1[i, 1:5]), unlist(df2[i, 1:5]))
    }
    

    but without using a for loop. I'm assuming there's some way to do it with plyr, but I can't seem to get it right. Any suggestions?

  • Josh O'Brien, about 12 years ago
    +1 That first one is cool. Also, t(as.matrix(df1)) can become t(df1), etc., since the coercion to matrix takes place implicitly when t() is passed a data.frame.
  • Simon Urbanek, about 12 years ago
    Ah, great, thanks (this is where my low-level thinking gets me ;)), I'll edit that
  • screechOwl, about 12 years ago
    That did it. Thank you very much.