How to plot dendrograms with large datasets?


Solution 1

The cut function described in the other answer (Solution 2) is a very good solution. If you would rather keep the whole tree on one page for interactive inspection, you can instead plot to a large PDF page.

The resulting PDF is vectorized so you can zoom in closely with your favourite PDF viewer without loss of resolution.

Here's an example of how to direct plot output to PDF:

# Open a PDF for plotting; units are inches by default
pdf("/path/to/a/pdf/file.pdf", width=40, height=15)

# Do some plotting
plot(gcPhylo)

# Close the PDF file's associated graphics device (necessary to finalize the output)
dev.off()
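
How large the page needs to be depends on how many leaves the tree has. As a minimal sketch, the page height can be derived from the leaf count. The simulated data standing in for the real tree, the rule of thumb of roughly 0.12 inches per leaf, and the temporary output file are all illustrative assumptions, not part of the answer above:

```r
# Simulated stand-in for a large tree: 300 random observations (assumption)
set.seed(42)
x <- matrix(rnorm(300 * 4), ncol = 4)
hc <- hclust(dist(x))

# Number of leaves in the tree
n_leaves <- length(hc$order)

# Rule of thumb (assumption; tune to taste): about 0.12 inches per leaf,
# with a minimum page height of 7 inches
page_height <- max(7, n_leaves * 0.12)

# Write to a temporary file; replace with a real path as needed
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 12, height = page_height)

# Draw the tree sideways so each leaf gets its own row on the tall page;
# pch = NA hides the node symbols, lab.cex shrinks the leaf labels
plot(as.dendrogram(hc), horiz = TRUE,
     nodePar = list(pch = NA, lab.cex = 0.6))

# Close the device to finalize the file
dev.off()
```

With an ape phylo object such as gcPhylo, the leaf count would come from Ntip(gcPhylo) instead of the hclust object.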

Solution 2

It is possible to cut a dendrogram at a specified height and plot the resulting pieces.

First, create a clustering using the built-in dataset USArrests, then convert it to a dendrogram:

hc <- hclust(dist(USArrests))
hcd <- as.dendrogram(hc)

Next, use cut.dendrogram to cut at a specified height, in this case h=75. This returns a list with two components: upper, a dendrogram for the part of the tree above the cut, and lower, a list of dendrograms, one for each branch below the cut:

# Cut once and reuse the result
hcd_cut <- cut(hcd, h=75)

# Stack the three plots in one column
par(mfrow=c(3,1))

plot(hcd, main="Main")
plot(hcd_cut$upper,
     main="Upper tree of cut at h=75")
plot(hcd_cut$lower[[2]],
     main="Second branch of lower tree with cut at h=75")
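
Plotting one branch at a time gets tedious when the cut yields many branches. A minimal sketch, assuming the same USArrests clustering and a temporary output file (the file name and page size are illustrative choices), writes the upper tree and every lower branch to its own page of a single PDF:

```r
# Same clustering as above
hc <- hclust(dist(USArrests))
hcd <- as.dendrogram(hc)

# Cut once; parts$upper is a dendrogram, parts$lower a list of dendrograms
parts <- cut(hcd, h = 75)
n_branches <- length(parts$lower)

# Temporary output file (assumption); each plot() call starts a new page
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 6)

plot(parts$upper, main = "Upper tree of cut at h=75")

for (i in seq_along(parts$lower)) {
  branch <- parts$lower[[i]]
  # A branch that is a single leaf has nothing to draw; report and skip it
  if (is.leaf(branch)) {
    message("Branch ", i, " is a single leaf: ", attr(branch, "label"))
    next
  }
  plot(branch,
       main = sprintf("Branch %d of %d (%d leaves)",
                      i, n_branches, attr(branch, "members")))
}

# Close the device to finalize the file
dev.off()
```

Paging through the resulting PDF then shows each subtree at a readable size.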



Author: Burcu (PhD Student)

Updated on July 01, 2020

Comments

  • Burcu
    Burcu almost 4 years

    I am using the ape (Analyses of Phylogenetics and Evolution) package in R, which can draw dendrograms. I use the following commands to read the data in Newick format and draw a dendrogram with the plot function:

    library("ape")
    gcPhylo <- read.tree(file = "gc.tree")
    plot(gcPhylo, show.node.label = TRUE)
    

    Because the data set is quite large, the lower levels of the tree are unreadable: I can make out only a few levels from the top, and everything below them is a solid black area.

    Does the plot function have any zoom capability? I tried restricting the plotted area with xLim and yLim, but they only crop the region and do not enlarge it enough to make the details visible. Either zooming, or some way of making the details visible without zooming, would solve my problem.

    I would also appreciate pointers to any other package, function, or tool that could help me overcome this problem.

    Thanks.

    • Ben Bolker
      Ben Bolker over 12 years
      You're likely to have more success with this question on the [email protected] mailing list, which is where the phylogenetics experts who use R hang out ... Have you looked at ?zoom in the ape package ... ??
    • Burcu
      Burcu over 12 years
      Thank you for the mailing list suggestion. I will raise the question there if I cannot solve the problem with the answers already given here. I tried the zoom function in ape, but the lower nodes were still not clear. I will also try subtreeplot(), which I came across while searching for zoom in ape.
  • Burcu
    Burcu over 12 years
    Yes, brilliant! That would solve my problem. However, is it possible to convert a phylogenetic tree to a dendrogram? All I have is a tree that is the output of my Java program. Is it possible to build a dendrogram from that output rather than from an hclust object? Sorry, this is my first interaction with R.
  • MatthewS
    MatthewS over 12 years
    It certainly is possible: you just need to call as.hclust on the output of read.tree (which is an R object of class phylo) to convert it to an object of class hclust.
  • skurp
    skurp over 6 years
    Is it possible to cut a dendrogram at a specific number of clusters instead of a specific height? I can't figure out how to plot the dendrogram with k = 10 clusters.
  • RDRR
    RDRR over 4 years
    @skurp Yes, you can: groups <- cutree(hc, k = 10); plot(hc, hang = -1); rect.hclust(hc, k = 10)
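
The conversion MatthewS describes can be sketched as follows. This assumes the ape package is installed and uses a simulated coalescent tree from rcoal() in place of gc.tree. To my knowledge, as.hclust() on a phylo object expects a rooted, binary, ultrametric tree, so the sketch checks those properties before converting:

```r
library(ape)

# Simulated stand-in for read.tree(file = "gc.tree") (assumption):
# rcoal() returns a rooted, binary, ultrametric tree
set.seed(1)
tree <- rcoal(50)

# Check the properties the conversion relies on
stopifnot(is.rooted(tree), is.binary(tree), is.ultrametric(tree))

# phylo -> hclust -> dendrogram
hc <- as.hclust(tree)
hcd <- as.dendrogram(hc)

# Cut halfway up the tree
parts <- cut(hcd, h = max(hc$height) / 2)

# Pick the branch with the most leaves so there is something to look at
sizes <- sapply(parts$lower, function(b) attr(b, "members"))
biggest <- parts$lower[[which.max(sizes)]]

# Plot the upper tree and the largest branch below the cut
par(mfrow = c(2, 1))
plot(parts$upper, main = "Upper tree")
plot(biggest, main = "Largest branch below the cut")
```

If the checks fail for a real tree, the conversion is likely to stop with an error, in which case plotting the phylo object directly to a large PDF remains an option.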