Correlation between multiple variables of a data frame

Solution 1

My package corrr, which helps to explore correlations, has a simple solution for this. I'll use the mtcars data set as an example, and say we want to focus on the correlation of mpg with all other variables.

install.packages("corrr")  # install once; keep an eye out for a new version coming soon
library(corrr)
mtcars %>% correlate() %>% focus(mpg)


#>    rowname        mpg
#>      <chr>      <dbl>
#> 1      cyl -0.8521620
#> 2     disp -0.8475514
#> 3       hp -0.7761684
#> 4     drat  0.6811719
#> 5       wt -0.8676594
#> 6     qsec  0.4186840
#> 7       vs  0.6640389
#> 8       am  0.5998324
#> 9     gear  0.4802848
#> 10    carb -0.5509251

Here, correlate() produces a correlation data frame, and focus() lets you focus on the correlations of certain variables with all others.

FYI, focus() works similarly to select() from the dplyr package, except that it alters rows as well as columns. So if you're familiar with select(), you should find it easy to use focus(). E.g.:

mtcars %>% correlate() %>% focus(mpg:drat)

#>   rowname        mpg        cyl       disp         hp        drat
#>     <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#> 1      wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
#> 2    qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
#> 3      vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
#> 4      am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
#> 5    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
#> 6    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980

Solution 2

Better still, I think, you can get the correlation of every variable with every other variable, not just of one variable with all the rest. That takes a single line of code. Using the built-in mtcars dataset:

library(dplyr)

cor(select(mtcars, mpg, wt, disp, drat, qsec, hp))
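
As a base R variant of the same idea (a minimal sketch, not part of the original answer): cor() also accepts a vector as its first argument and a data frame as its second, so the one-against-all correlations asked about in the question can be computed without any package. The names `others` and `mpg_cor` are illustrative only.

```r
# Base R only: correlate mpg with every other column of mtcars.
# cor(x, y) with a vector x and a data frame y returns a 1 x p matrix.
others <- mtcars[, setdiff(names(mtcars), "mpg")]
mpg_cor <- cor(mtcars$mpg, others)

# Flatten to a named vector and sort by absolute strength.
mpg_cor <- setNames(as.vector(mpg_cor), colnames(mpg_cor))
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
mpg_cor
```

The result holds the same values as the corrr output in Solution 1, only as a plain named vector instead of a data frame.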

Author: Milind Kumar

Updated on August 09, 2022

Comments

  • Milind Kumar, over 1 year ago

    I have a data.frame with 10 variables in R. Let's call them var1, var2, ..., var10.

    I want to find the correlation of var1 with each of var2, var3, ..., var10.

    How can we do that?

    The cor function can find the correlation between 2 variables at a time. Using it that way, I had to write a separate cor call for each analysis.

  • Luis, over 5 years ago
    Hello, do you know how to display the p-values from focus()?
  • Simon Jackson, over 5 years ago
    @Luis corrr does not compute p-values, but the question has come up before (e.g., this issue: github.com/drsimonj/corrr/issues/44).
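
Since corrr does not compute p-values, one possible workaround is to call cor.test() from base R's stats package once per variable. This is only a sketch under stated assumptions: it is not a corrr feature, the names `others` and `res` are illustrative, and the p-values are unadjusted for multiple comparisons.

```r
# Sketch: Pearson correlation and p-value of mpg against each other column,
# using stats::cor.test() from base R (not part of corrr).
others <- setdiff(names(mtcars), "mpg")

res <- t(sapply(others, function(v) {
  ct <- cor.test(mtcars$mpg, mtcars[[v]])  # Pearson is the default method
  c(r = unname(ct$estimate), p.value = ct$p.value)
}))

res  # one row per variable, columns "r" and "p.value"
```

If many variables are tested at once, the p.value column could be passed through p.adjust() to correct for multiple testing.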