How to plot density curves for each column in R?

r ggplot2 dataframe

18,495

Solution 1

Use melt() from the reshape package (its successor, reshape2, provides the same function; base R's reshape() also works, but the call is more complicated).

library(reshape)
library(ggplot2)
long <- melt(w, id.vars = "refseq")

# all curves overlaid in one panel, coloured by column
ggplot(long, aes(value, color = variable)) +
    geom_density()

# or maybe you wanted separate plots on the same page?

ggplot(long, aes(value)) +
    geom_density() +
    facet_wrap(~variable)

There are lots of other ways to plot this in ggplot: see http://docs.ggplot2.org/0.9.3.1/geom_histogram.html for examples.

Solution 2

ggplot needs your data in a long format, like so:

variable  value
1 V1  0.24468840
2 V1  0.00000000
3 V1  8.42938930
4 V2  0.31737190

Once it's melted into a long data frame, you can group all the density plots by variable. In the snippet below, ggplot uses the w.plot data frame for plotting. The refseq column does not need to be dropped first: melt() treats non-numeric columns as id variables by default. You can modify the plot to use facets, different colors, fills, etc.

w <- as.data.frame(cbind(
  c(0.2446884, 0.0000000, 8.4293893), 
  c(0.3173719, 0.0000000, 4.9985040), 
  c(0.74258410, 0.08592243, 2.22526463)))
w$refseq <- c("NM_000014", "NM_000015", "NM_000016")

library(ggplot2)
library(reshape2)
w.plot <- melt(w) 

p <- ggplot(aes(x=value, colour=variable), data=w.plot)
p + geom_density()

Example plot
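
As a hedged alternative sketch, the same wide-to-long step can be done with pivot_longer() from the tidyr package instead of melt(). This is not part of the original answer; it assumes tidyr (version 1.0.0 or later) is installed, and it reuses the three-column toy version of w from above. The names "variable" and "value" are chosen only to match the melt() output.

```r
library(ggplot2)
library(tidyr)

# Toy data with the same shape as w: numeric columns plus a refseq id column.
w <- data.frame(
  V1 = c(0.2446884, 0.0000000, 8.4293893),
  V2 = c(0.3173719, 0.0000000, 4.9985040),
  V3 = c(0.74258410, 0.08592243, 2.22526463),
  refseq = c("NM_000014", "NM_000015", "NM_000016")
)

# Stack every column except refseq into variable/value pairs.
long <- pivot_longer(w, cols = -refseq,
                     names_to = "variable", values_to = "value")

# One density curve per original column, distinguished by colour.
p <- ggplot(long, aes(x = value, colour = variable)) +
  geom_density()
print(p)
```

With 40 columns the legend gets crowded, so adding facet_wrap(~variable) may be easier to read than overlaid colours.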

Solution 3

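Here's a solution using the base plot function and a small loop. Here df is the data frame, with the non-numeric column last.

First set up an empty plot (the axis ranges come from the first column's density):

plot(density(df[, 1]), type = "n")

Then run this to add one line per numeric column:

n <- ncol(df) - 1  # skip the last (id) column
for (i in seq_len(n)) {
  lines(density(df[, i]))
}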
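
One caveat with the loop above is that the axis limits are taken from the first column only, so curves from other columns may be clipped. A minimal sketch of one way around this, assuming the last column is the only non-numeric one, is to compute all densities first and size the plot to their combined range. The data frame df, its columns a, b, c and id, and the colours are made up for illustration.

```r
set.seed(1)

# Hypothetical data: three numeric columns and an id column in last place.
df <- data.frame(
  a  = rnorm(100),
  b  = rnorm(100, mean = 2, sd = 0.5),
  c  = rexp(100),
  id = sprintf("row%03d", 1:100)
)

# Estimate the density of every column except the last one.
dens <- lapply(df[, -ncol(df)], density)

# Axis limits that cover all curves, not just the first.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- range(sapply(dens, function(d) range(d$y)))

# Empty canvas, then one line per column.
plot(NA, xlim = xlim, ylim = ylim,
     xlab = "value", ylab = "Density", main = "Density per column")
for (i in seq_along(dens)) {
  lines(dens[[i]], col = i)
}
legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)
```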


Author by

Hanfei Sun

Just another stackoverflow user cs.cmu.edu/~hanfeis

Updated on June 05, 2022

Comments

Hanfei Sun about 2 years

I have a data frame w like this:

>head(w,3)
         V1        V2         V3        V4 V5        V6         V7        V8        V9       V10 V11        V12        V13        V14
1 0.2446884 0.3173719 0.74258410 0.0000000  0 0.0000000 0.01962759 0.0000000 0.0000000 0.5995647   0 0.30201691 0.03109935 0.16897571
2 0.0000000 0.0000000 0.08592243 0.2254971  0 0.7381867 0.11936323 0.2076167 0.0000000 1.0587742   0 0.50226734 0.51295661 0.01298853
3 8.4293893 4.9985040 2.22526463 0.0000000  0 3.6600283 0.00000000 0.0000000 0.2573714 0.8069288   0 0.05074886 0.00000000 0.59403855
         V15       V16      V17       V18      V19       V20       V21      V22         V23        V24       V25       V26       V27
1 0.00000000 0.0000000 0.000000 0.1250837 0.000000 0.5468143 0.3503245 0.000000 0.183144204 0.23026538 6.9868429 1.5774150 0.0000000
2 0.01732732 0.8064441 0.000000 0.0000000 0.000000 0.0000000 0.0000000 0.000000 0.015123385 0.07580794 0.6160713 0.7452335 0.0740328
3 2.66846151 0.0000000 1.453987 0.0000000 1.875298 0.0000000 0.0000000 0.893363 0.004249061 0.00000000 1.6185897 0.0000000 0.7792773
        V28 V29     V30       V31        V32        V33       V34       V35 V36        V37        V38       V39        V40    refseq
1 0.5543028   0 0.00000 0.0000000 0.08293075 0.18261450 0.3211127 0.2765295   0 0.04230929 0.05017316 0.3340662 0.00000000 NM_000014
2 0.0000000   0 0.00000 0.0000000 0.00000000 0.03531411 0.0000000 0.4143325   0 0.14894716 0.58056304 0.3310173 0.09162460 NM_000015
3 0.8047882   0 0.88308 0.7207709 0.01574767 0.00000000 0.0000000 0.1183736   0 0.00000000 0.00000000 1.3529881 0.03720155 NM_000016

dim(w)
[1] 37126    41

I tried to plot the density curve of each column (except the last column) on one page. It seems that ggplot2 can do this.

I tried this according to this post:

ggplot(data=w[,-41], aes_string(x=colnames)) + geom_density()

But it fails with this error:

Error in as.character(x) : 
  cannot coerce type 'closure' to vector of type 'character'

I'm not sure how to convert this data frame to the format ggplot2 accepts. Or is there another way to do this in R?

Chase about 11 years

You need to melt() your data into long format, the question here shows you how to do this: stackoverflow.com/questions/5479822/…

kira over 8 years

What exactly is the "refseq" that you are using as the id here?

janattack about 8 years

@kira It's the 41st column in OP's data set, the only one that's not numeric (it looks to be gene accession numbers from the NCBI Reference Sequence Database, ncbi.nlm.nih.gov/refseq, but it would be a pretty good guess as an id column anyway).

Sander W. van der Laan about 6 years

I have a similar issue. In my case I have 1000 columns and no header names (so in R they're called 'X1', 'X2', etc.). If I do melt(df) I get an error: Using X404, X755, X974 as id variables Error in match.names(clabs, names(xi)) : names do not match previous names. How should I interpret that?

Sander W. van der Laan about 6 years

Ah! I use read_table2 to load the data, which is very fast, but it turns the data into a tibble rather than a data.frame. If I convert it to a data.frame, it works!