Heatmap of Microarray Data using Pearson Distance

Ok...I think you are simply confused about how cor and dist operate. From the documentation on dist:

This function computes and returns the distance matrix computed by using the specified 
    distance measure to compute the distances between the rows of a data matrix.

And from the documentation on cor:

If x and y are matrices then the covariances (or correlations) 
    between the columns of x and the columns of y are computed.

See the difference? dist (and the dist objects that heatmap.2 expects its distfun to return) works on the rows of a matrix, while cor works on the columns, so your distfun was effectively calculating distances between columns. Adding a simple transpose inside your distance function allows this (non-square) example to run for me:

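library(gplots)  # provides heatmap.2

TEST <- matrix(runif(100), nrow=20)
heatmap.2(t(TEST), trace="none", density.info="none",
            scale="row",
            labRow="",
            hclustfun=function(x) hclust(x, method="complete"),
            # cor() works on columns, so transpose to get row-wise Pearson distance
            distfun=function(x) as.dist((1-cor(t(x)))/2))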
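
To make the row/column difference concrete, here is a minimal base-R sketch (it does not need gplots; the names x, d_euclid, d_cols and d_rows are only illustrative and not from the question). It compares the size of the distance object you get from dist, from cor without a transpose, and from cor with a transpose, on a 20 x 5 matrix:

```r
# Small non-square matrix: 20 rows ("genes") by 5 columns ("samples").
set.seed(1)
x <- matrix(runif(100), nrow = 20)

# dist() compares rows, so it covers all 20 rows of x.
d_euclid <- dist(x)

# cor() compares columns, so without a transpose only the 5 columns are covered.
d_cols <- as.dist((1 - cor(x)) / 2)

# Transposing first makes cor() compare the rows of x, matching dist().
d_rows <- as.dist((1 - cor(t(x))) / 2)

# Number of objects each distance object describes.
sizes <- c(euclid = attr(d_euclid, "Size"),
           cols   = attr(d_cols, "Size"),
           rows   = attr(d_rows, "Size"))
print(sizes)

# hclust() accepts the row-wise Pearson distance just like a dist() result.
hc <- hclust(d_rows, method = "complete")
```

The sizes come out as 20, 5 and 20: only the transposed version describes the same objects as dist(x), which is why the untransposed distfun does not line up with what heatmap.2 is trying to cluster.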

Author: Ana

Updated on June 28, 2022

Comments

  • Ana
    Ana almost 2 years

    I have been trying to generate a heatmap in R for some microarray data. Following online instructions, I have mostly succeeded in producing one, but it does not do exactly what I want. I would like to cluster the data based on Pearson distance rather than Euclidean distance, but I have run into some difficulties.

    Using heatmap.2 (from the gplots package), I use the following code to make my initial heatmap:

    heatmap.2(Test402, trace="none", density="none", scale="row",
              ColSideColors=c("red","blue")[data.test.factors],
              col=redgreen, labRow="",
              hclustfun=function(x) hclust(x, method="complete"))
    

    Test402 is a matrix with 402 rows (genes) and 31 columns (patients), and data.test.factors holds indicators of the outcome group each patient belongs to. Using hclustfun works fine here: the heatmap responds to changes in method and overall works. The problem is that the clustering uses Euclidean distance, and I would like to change that to Pearson distance. So I attempted the following:

    heatmap.2(Test402, trace="none", density="none", scale="row",
              ColSideColors=c("red","blue")[data.test.factors],
              col=redgreen, labRow="",
              hclustfun=function(x) hclust(x, method="complete"),
              distfun=function(x) as.dist((1-cor(x))/2))
    

    The above command fails. That is because Test402 needs to be a square matrix. So, looking at some additional advice, I tried the following:

    cU = cor(Test402)
    heatmap.2(cU, trace="none", density="none", scale="row",
              ColSideColors=c("red","blue")[data.test.factors],
              col=redgreen, labRow="",
              hclustfun=function(x) hclust(x, method="complete"),
              distfun=function(x) as.dist((1-x)/2))
    

    That works, BUT here is the problem. The heatmap, rather than showing the original expression values in Test402, now only displays the correlations. This is NOT what I want! I want to keep the expression values and only have the dendrogram cluster differently; I don't want to change what data is actually represented in the heatmap! Is this possible?

  • Ana
    Ana almost 13 years
    Joran, that was IT! Wow, you were completely right. I wasn't focusing on that aspect; I was bent up about other things! Thank you so much for pointing out such a foolish mistake! Also, thank you for taking the time to help me answer this!
  • joran
    joran almost 13 years
    Simple mistake, maybe, but not foolish. My only clue was that the docs for heatmap.2 said that distfun defaults to dist.