Heatmap or plot for a correlation matrix

21,297

Solution 1

@Lucas provides good advice here, as corrplot is quite useful for visualizing correlation matrices. However, it doesn't address the original issue of plotting a large correlation matrix. In fact, corrplot will also fail when trying to visualize a correlation matrix this large. A simple solution is to reduce the number of variables: I would suggest looking at the correlations among a subset of variables that you know are important for your problem. Understanding the correlation structure of that many variables will be difficult, even if you can visualize it!
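As a rough sketch of the subsetting idea, the snippet below builds a simulated 400-variable data set and computes the correlation matrix for a handful of columns only. The data, the object names (`dat`, `keep`, `cor_sub`) and the choice of variables are all hypothetical stand-ins; in practice `keep` would hold the variables you already know matter.

```r
# Simulated stand-in for a wide data set (400 variables); purely illustrative
set.seed(1)
dat <- as.data.frame(matrix(rnorm(100 * 400), nrow = 100, ncol = 400))

# Hypothetical list of variables known to be important for the problem
# (as.data.frame() on an unnamed matrix names the columns V1, V2, ...)
keep <- c("V1", "V5", "V20", "V42", "V100")

# Correlation matrix restricted to those variables
cor_sub <- cor(dat[, keep], use = "complete.obs")
dim(cor_sub)
```

The resulting `cor_sub` is small enough to pass to any of the plotting functions discussed here.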

Solution 2

If you want to do a correlation plot, use the corrplot library, as it offers a lot of flexibility for creating heatmap-like figures for correlations:

library(corrplot)

# create data with some correlation structure
jnk <- runif(1000)
jnk <- (jnk * 100) + c(1:500, 500:1)
jnk <- matrix(jnk, nrow = 100, ncol = 10)
jnk <- as.data.frame(jnk)
names(jnk) <- paste0("var", 1:10)

# create correlation matrix
cor_jnk <- cor(jnk, use = "complete.obs")

# plot cor matrix
corrplot(cor_jnk, order = "AOE", method = "circle", tl.pos = "lt", type = "upper",
         tl.col = "black", tl.cex = 0.6, tl.srt = 45,
         addCoef.col = "black", addCoefasPercent = TRUE,
         p.mat = 1 - abs(cor_jnk), sig.level = 0.50, insig = "blank")

The code above only adds color to correlations whose absolute value is greater than 0.5, but you can easily change that. There are also many ways to configure the look of the plot (the color gradient, how the correlations are displayed, the full matrix versus only half of it, etc.). The order argument is particularly useful: it lets you order the variables in the correlation matrix based on PCA, so that variables with similar correlation patterns sit next to each other.

To draw squares instead (similar to your original plot), just change the method to "square".
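A minimal sketch of both adjustments, assuming the same kind of simulated data as in the code above: it switches the glyph to squares and raises the cut-off from 0.5 to 0.7. The name `min_abs_cor` and the value 0.7 are illustrative choices, and `set.seed()` is added only for reproducibility.

```r
library(corrplot)

# Simulated data with some correlation structure (illustrative only)
set.seed(1)
jnk <- as.data.frame(matrix(runif(1000) * 100 + c(1:500, 500:1),
                            nrow = 100, ncol = 10))
names(jnk) <- paste0("var", 1:10)
cor_jnk <- cor(jnk, use = "complete.obs")

# Hypothetical cut-off: hide correlations with |r| below 0.7
min_abs_cor <- 0.7

# corrplot blanks cells whose p.mat value exceeds sig.level, so passing
# 1 - |r| as p.mat and 1 - cut-off as sig.level hides every |r| < cut-off
corrplot(cor_jnk, method = "square", order = "AOE", type = "upper",
         tl.col = "black", tl.cex = 0.6, tl.srt = 45,
         p.mat = 1 - abs(cor_jnk), sig.level = 1 - min_abs_cor,
         insig = "blank")
```

Note that `p.mat` is being repurposed here as a display filter rather than holding real p-values, which is the same trick the original call uses.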

EDIT: @Carson, you can still use this method for reasonably large correlation matrices, for instance a 100-variable matrix. Beyond that, I fail to see the use of a graphical representation of a correlation matrix with so many variables without some subsetting, as it will be very hard to interpret.
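For a matrix of about that size, one possible configuration (a sketch under assumptions, not necessarily the settings used for the figure) is to fill each cell with color, drop the text labels, and keep the variables in their input order so that nothing is reordered or clustered. The simulated data and the names `big` and `cor_big` are hypothetical.

```r
library(corrplot)

# Simulated 100-variable data set (illustrative only)
set.seed(42)
big <- matrix(rnorm(200 * 100), nrow = 200, ncol = 100)
cor_big <- cor(big)

# method = "color" draws filled tiles, tl.pos = "n" suppresses the labels,
# and order = "original" leaves the variables in their input order
corrplot(cor_big, method = "color", order = "original", tl.pos = "n")
```

With 100 labels on each axis the text becomes unreadable, which is why the labels are switched off here; re-enable them with `tl.pos` and a small `tl.cex` if the variable names matter.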

Author by

user2258452

Updated on July 09, 2022

Comments

  • user2258452 almost 2 years

    I tried to plot the correlation matrix with the lattice library, using three colours to represent the correlation coefficients.

    library(lattice)
    
    levelplot(cor)
    

    I obtain the following plot:

    Plot of correlation matrix

    The plot is only for a subset of my data. When I use the whole dataset (400 × 400), the plot becomes unclear: the colouring is not shown properly and appears as dots. Is it possible to obtain the same plot in tile form for a large matrix?

    I tried using the pheatmap function, but I do not want my values to be clustered; I just want a clear representation of high and low values in tile form.

  • smci over 6 years
    How many variables is 'large', please?