How to correlate one variable with all other variables in R

Solution 1

Since you mention "metabolites", I assume your metric is concentration, i.e. that you have a data frame (call it data) with one column for every metabolite and one row for every sample.

So, something like this:

# just generates example - YOU SHOULD PROVIDE THIS!!!
set.seed(1)  # seed that reproduces the values printed below
data <- data.frame(tyrosine=1:10 + rnorm(10,sd=2), 
                   urea    =2*1:10 + rnorm(10,sd=2),
                   glucose =30 -2*1:10 +rnorm(10,sd=2),
                   inosine =25 -1:10 + rnorm(10,sd=2))
data
     tyrosine      urea  glucose  inosine
1  -0.2529076  5.023562 29.83795 26.71736
2   2.3672866  4.779686 27.56427 22.79442
3   1.3287428  4.757519 24.14913 22.77534
4   7.1905616  3.570600 18.02130 20.89239
5   5.6590155 12.249862 21.23965 17.24588
6   4.3590632 11.910133 17.88774 18.17001
7   7.9748581 13.967619 15.68841 17.21142
8   9.4766494 17.887672 11.05850 16.88137
9  10.1515627 19.642442 11.04370 18.20005
10  9.3892232 21.187803 10.83588 16.52635

To get correlation coefficients, just type:

cor(data)
           tyrosine       urea    glucose    inosine
tyrosine  1.0000000  0.8087897 -0.9545523 -0.8512938
urea      0.8087897  1.0000000 -0.8577782 -0.8086910
glucose  -0.9545523 -0.8577782  1.0000000  0.8608000
inosine  -0.8512938 -0.8086910  0.8608000  1.0000000
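
Since only tyrosine is of interest, a single row of that matrix is enough. The following is a minimal sketch, assuming the example data frame from above (rebuilt here with set.seed(1) so the block runs on its own) and that every column is numeric. The names tyr_row and tyr_direct are just illustrative.

```r
# Rebuild the example data so this block is self-contained
set.seed(1)
data <- data.frame(tyrosine = 1:10 + rnorm(10, sd = 2),
                   urea     = 2 * 1:10 + rnorm(10, sd = 2),
                   glucose  = 30 - 2 * 1:10 + rnorm(10, sd = 2),
                   inosine  = 25 - 1:10 + rnorm(10, sd = 2))

# Option 1: compute the full matrix, then keep only the tyrosine row
tyr_row <- cor(data)["tyrosine", ]

# Option 2: correlate tyrosine against every column directly;
# the result is a 1 x 4 matrix with the column names of data
tyr_direct <- cor(data$tyrosine, data)

tyr_row
tyr_direct
```

Both options give the same numbers; the second avoids computing correlations between pairs that do not involve tyrosine.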

To generate a scatterplot matrix, just type:

pairs(data)

In future, please include an example of your data that can be imported into R.

Solution 2

In the following example, I simply split a data frame that contains all of the variables into two matrices. These can be entered into the cor function to obtain your correlation values:

set.seed(1)
n=20
df <- data.frame(tyrosine=runif(n), urea=runif(n), glucose=runif(n), inosine=runif(n))
df

COR <- cor(as.matrix(df[,1]), as.matrix(df[,-1]))
COR
#           urea    glucose    inosine
#[1,] -0.2373854 -0.3672984 -0.3393602
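
With a couple of hundred metabolites, it may help to sort the result so the strongest associations come first. This is only a sketch built on the same simulated df; the names r and ranked are illustrative, and ranking by absolute value is one possible choice, not the only one.

```r
set.seed(1)
n <- 20
df <- data.frame(tyrosine = runif(n), urea = runif(n),
                 glucose = runif(n), inosine = runif(n))

# One-vs-rest correlations; drop() turns the 1 x 3 matrix
# into a named numeric vector
r <- drop(cor(df$tyrosine, df[, -1]))

# Order from strongest to weakest association, ignoring the sign
ranked <- r[order(abs(r), decreasing = TRUE)]
ranked
```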

Solution 3

Similar to Marc in the box's answer, but using apply so that the result keeps the column names:

set.seed(1)
n=20
df <- data.frame(tyrosine=runif(n), urea=runif(n), glucose=runif(n), 
  inosine=runif(n))

apply(df,2, function(col)cor(col, df$tyrosine))

tyrosine       urea    glucose    inosine 
1.0000000 -0.2373854 -0.3672984 -0.3393602 

It's a good question, and a useful pattern to know for data of reasonable size: if you only want the tyrosine correlations (which is what the OP specifically asked for), it is more efficient to calculate only those (time and space proportional to the number of variables, n) than all vs all (~n^2 time and space).
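
To illustrate the point at roughly the scale mentioned in the question, here is a sketch on simulated data. The 200 columns, the placeholder names met1 to met200, and the use of met1 as a stand-in for tyrosine are all assumptions made for the example.

```r
set.seed(1)
n_samples <- 20
n_metab   <- 200  # hypothetical number of metabolites

# Simulated concentrations; the column names are placeholders
x <- matrix(runif(n_samples * n_metab), nrow = n_samples,
            dimnames = list(NULL, paste0("met", seq_len(n_metab))))

target <- "met1"  # stands in for tyrosine
others <- setdiff(colnames(x), target)

# One-vs-rest: only the 199 correlations that involve the target
one_vs_rest <- drop(cor(x[, target], x[, others]))

# All-vs-all: a 200 x 200 matrix, of which a single row is kept
all_vs_all <- cor(x)[target, others]

# Both routes should agree up to numerical tolerance
all.equal(one_vs_rest, all_vs_all)
```

No column names have to be typed out by hand in either case; the difference is only in how much is computed and stored.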

Updated on July 29, 2022

Comments

  • Admin
    Admin almost 2 years

    I want to correlate one variable (say tyrosine) with all the other variables (about 200 other metabolites, like urea, glucose, inosine, etc.) in R, and I'm not sure how to go about it. I'm new to R.

    I've learned the pairs function, but that pairs every metabolite in the specified range with every other one.

    Thanks!

  • statquant
    statquant over 10 years
    not sure how/why you got -1
  • Admin
    Admin over 10 years
    Sorry, I'm trying to include an example of the data but it makes the comment too long... What I'm asking is that after inosine there are another 100 variables, and I don't want to list each one in the code. Basically, I'm only interested in seeing whether tyrosine covaries with any other variable. Any fast way to do that?
  • jlhoward
    jlhoward over 10 years
    Sure, cor(data$tyrosine,data) does it.
  • gal007
    gal007 over 5 years
    @jlhoward not working anymore: "Error in cor(data$sus, data) : 'y' must be numeric". Do you know a new way?
  • Thomas Rokicki
    Thomas Rokicki over 4 years
    @gal007 Check that all values in your data are numeric. You cannot run this against nominal data. If there are nominal values, create a vector of the column names you wish to use, then run cor() against those columns only: numeric_column_names <- c("col1", "col2"), then cor(data$tyrosine, data[numeric_column_names]).
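
The numeric columns can also be picked out programmatically instead of being listed by hand. A minimal sketch, assuming a data frame with one non-numeric column; the column name group and the variable num_cols are hypothetical and exist only for this example:

```r
set.seed(1)
n <- 20
# Example data with one non-numeric column ("group" is a made-up name)
df <- data.frame(tyrosine = runif(n), urea = runif(n),
                 glucose = runif(n),
                 group = rep(c("a", "b"), length.out = n))

# Names of the columns that are numeric
num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

# Correlate tyrosine with the remaining numeric columns only
r <- cor(df$tyrosine, df[setdiff(num_cols, "tyrosine")])
r
```

Passing the whole df to cor() here would fail with the "'y' must be numeric" error quoted above, because of the group column.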