How do I calculate the correlation between corresponding columns of two matrices without getting the other correlations as output?

Solution 1

I would probably personally just use diag:

> diag(cor(a,b))
[1]  1.0000000 -1.0000000 -0.6964286

But you could also use mapply, which works when a and b are data frames, as in the question:

> mapply(cor,a,b)
         a          b          c 
 1.0000000 -1.0000000 -0.6964286
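
For a self-contained check, the data from the question can be rebuilt as data frames. This is only a minimal sketch: the values are copied from the question, and the variable names r_diag and r_mapply are made up for illustration. Both results should match the output shown above.

```r
# Data frames rebuilt from the values shown in the question
a <- data.frame(a = 1:5, b = -(1:5), c = c(4, 6, 9, 12, 6))
b <- data.frame(d = 6:10, e = -5:-1, f = c(7, 4, 3, 3, 9))

# Full 3 x 3 correlation matrix, then keep only the matching pairs
r_diag <- diag(cor(a, b))

# A data frame is a list of columns, so mapply pairs the columns up
r_mapply <- mapply(cor, a, b)

print(r_diag)
print(r_mapply)
```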

Solution 2

The first answer above calculates all pairwise correlations, which is fine unless the matrices are large, and the second one doesn't work when a and b are matrices. As far as I can tell, an efficient computation has to be done directly. The following code, borrowed from the arrayMagic Bioconductor package, works efficiently for large matrices:

> colCors = function(x, y) { 
+   sqr = function(x) x*x
+   if(!is.matrix(x)||!is.matrix(y)||any(dim(x)!=dim(y)))
+     stop("Please supply two matrices of equal size.")
+   x   = sweep(x, 2, colMeans(x))
+   y   = sweep(y, 2, colMeans(y))
+   cor = colSums(x*y) /  sqrt(colSums(sqr(x))*colSums(sqr(y)))
+   return(cor)
+ }

> set.seed(1)
> a=matrix(rnorm(15),nrow=5)
> b=matrix(rnorm(15),nrow=5)
> diag(cor(a,b))
[1]  0.2491625 -0.5313192  0.5594564
> mapply(cor,a,b)
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> colCors(a,b)
[1]  0.2491625 -0.5313192  0.5594564
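
The same per-column computation can probably also be written with base R's scale(), which centers each column and divides it by its standard deviation. This is a sketch, not the arrayMagic code: the name colCorsScale is made up for illustration, and it assumes two numeric matrices of equal dimensions with no constant columns and no missing values.

```r
# Hypothetical helper: column-wise Pearson correlation via z-scores
colCorsScale <- function(x, y) {
  stopifnot(is.matrix(x), is.matrix(y), all(dim(x) == dim(y)))
  # scale() standardizes each column using the sample sd (n - 1 denominator),
  # so the summed product of z-scores divided by (n - 1) is Pearson's r
  colSums(scale(x) * scale(y)) / (nrow(x) - 1)
}

set.seed(1)
a <- matrix(rnorm(15), nrow = 5)
b <- matrix(rnorm(15), nrow = 5)

# Should agree with diag(cor(a, b)) up to floating-point error
all.equal(colCorsScale(a, b), diag(cor(a, b)))
```

Like colCors, this never forms the full matrix of pairwise correlations, so the work grows with the number of columns rather than with its square.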

Solution 3

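mapply works with data frames but not with matrices. That is because a data frame is a list whose elements are its columns, while a matrix is a vector whose elements are its individual entries, so mapply(cor,a,b) on matrices calls cor on single numbers and returns NA.

Using the example from the answer above, mapply(cor,as.data.frame(a),as.data.frame(b)) works just fine.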

> set.seed(1)
> a=matrix(rnorm(15),nrow=5)
> b=matrix(rnorm(15),nrow=5)
> diag(cor(a,b))
[1]  0.2491625 -0.5313192  0.5594564
> mapply(cor,as.data.frame(a),as.data.frame(b))
        V1         V2         V3 
 0.2491625 -0.5313192  0.5594564 

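For large matrices this is much more efficient than diag(cor(a,b)), because it computes only the correlations between corresponding columns instead of the full matrix of pairwise correlations.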
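
If converting to data frames is not wanted, a similar column-by-column loop can index the matrices directly. A minimal sketch, assuming a and b are numeric matrices with the same number of columns (the variable name r is just for illustration):

```r
set.seed(1)
a <- matrix(rnorm(15), nrow = 5)
b <- matrix(rnorm(15), nrow = 5)

# Loop over column indices rather than over individual matrix entries
r <- vapply(seq_len(ncol(a)), function(j) cor(a[, j], b[, j]), numeric(1))
print(r)
```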

Author by

rder

Updated on June 11, 2022

Comments

  • rder
    rder almost 2 years

    I have these data

    > a
         a    b    c
    1    1   -1    4
    2    2   -2    6
    3    3   -3    9
    4    4   -4   12
    5    5   -5    6
    
    > b
         d    e    f
    1    6   -5    7
    2    7   -4    4
    3    8   -3    3
    4    9   -2    3
    5   10   -1    9
    
    > cor(a,b)
               d            e             f
    a  1.0000000    1.0000000     0.1767767
    b -1.0000000    -1.000000    -0.1767767
    c  0.5050763    0.5050763    -0.6964286
    

    The result I want is just:

    cor(a,d) = 1
    cor(b,e) = -1
    cor(c,f) = -0.6964286
    
  • ferrelwill
    ferrelwill over 7 years
    Is it possible to add p-values and also adjusted p-values for multiple comparisons?
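
One possible way to get p-values for each pair of corresponding columns is to call cor.test from base R's stats package once per column and then pass the p-values to p.adjust. This is only a sketch under assumptions: the Benjamini-Hochberg method is chosen purely for illustration, and the names tests, res and p.adj are made up here.

```r
set.seed(1)
a <- matrix(rnorm(15), nrow = 5)
b <- matrix(rnorm(15), nrow = 5)

# One Pearson correlation test per pair of corresponding columns
tests <- lapply(seq_len(ncol(a)), function(j) cor.test(a[, j], b[, j]))

# Pull out the estimate and the raw p-value from each test
r <- vapply(tests, function(ct) unname(ct$estimate), numeric(1))
p <- vapply(tests, function(ct) ct$p.value, numeric(1))

# Adjust for multiple comparisons (method is an illustrative choice)
res <- data.frame(r = r, p = p, p.adj = p.adjust(p, method = "BH"))
print(res)
```

Other corrections such as "bonferroni" or "holm" can be selected through the method argument of p.adjust.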