How to plot density curves for each column in R?
Solution 1
Use "melt" from the "reshape" package (you could also use the base reshape function, but it's a more complicated call).
library(reshape)
library(ggplot2)
# Reshape to long format: one row per (refseq, column) pair
long <- melt(w, id.vars = "refseq")
# All curves on one plot, colored by column
ggplot(long, aes(value)) +
  geom_density(aes(color = variable))
# or maybe you wanted separate plots on the same page?
ggplot(long, aes(value)) +
  geom_density() +
  facet_wrap(~variable)
There are lots of other ways to plot this in ggplot: see http://docs.ggplot2.org/0.9.3.1/geom_histogram.html for examples.
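For comparison, the base reshape call mentioned above could look roughly like the sketch below. It uses a three-column stand-in for w built from the first rows shown in the question, and the output column names variable and value are chosen here only to mirror what melt produces; treat it as one possible spelling rather than the canonical one.

```r
# Small stand-in for the real data: three numeric columns plus the id column.
w <- data.frame(
  V1 = c(0.2446884, 0.0000000, 8.4293893),
  V2 = c(0.3173719, 0.0000000, 4.9985040),
  V3 = c(0.74258410, 0.08592243, 2.22526463),
  refseq = c("NM_000014", "NM_000015", "NM_000016")
)

# Every column except the id column gets stacked.
value_cols <- setdiff(names(w), "refseq")

long <- reshape(
  w,
  direction = "long",
  varying   = value_cols,   # wide columns to stack
  v.names   = "value",      # name of the stacked value column
  timevar   = "variable",   # column recording which wide column a value came from
  times     = value_cols,   # labels written into that column
  idvar     = "refseq"
)
rownames(long) <- NULL      # drop the generated row names such as "NM_000014.V1"
```

The resulting long data frame can be passed to the ggplot calls above in place of the melt output.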
Solution 2
ggplot needs your data in a long format, like so:
  variable      value
1       V1 0.24468840
2       V1 0.00000000
3       V1 8.42938930
4       V2 0.31737190
Once it's melted into a long data frame, you can group all the density plots by variable. In the snippet below, ggplot uses the w.plot data frame for plotting. There is no need to drop the final refseq column first, because melt treats it as an id variable. You can modify the plot to use facets, different colors, fills, etc.
w <- as.data.frame(cbind(
  c(0.2446884, 0.0000000, 8.4293893),
  c(0.3173719, 0.0000000, 4.9985040),
  c(0.74258410, 0.08592243, 2.22526463)))
w$refseq <- c("NM_000014", "NM_000015", "NM_000016")
library(ggplot2)
library(reshape2)
# refseq is the id column; V1..V3 are stacked into variable/value
w.plot <- melt(w, id.vars = "refseq")
p <- ggplot(w.plot, aes(x = value, colour = variable))
p + geom_density()
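The facet and fill variations mentioned above might look like the following. This is a sketch that assumes ggplot2 and reshape2 are installed and rebuilds the same toy w; the alpha value and the free y-scales are illustrative choices, not part of the original answer.

```r
library(ggplot2)
library(reshape2)

# Same toy data as above, built directly as a data frame.
w <- data.frame(
  V1 = c(0.2446884, 0.0000000, 8.4293893),
  V2 = c(0.3173719, 0.0000000, 4.9985040),
  V3 = c(0.74258410, 0.08592243, 2.22526463),
  refseq = c("NM_000014", "NM_000015", "NM_000016")
)

w.plot <- melt(w, id.vars = "refseq")

# Overlaid, semi-transparent filled curves, one fill colour per column.
p_fill <- ggplot(w.plot, aes(x = value, fill = variable)) +
  geom_density(alpha = 0.4)

# One panel per column; free y-scales keep flat curves readable.
p_facet <- ggplot(w.plot, aes(x = value)) +
  geom_density() +
  facet_wrap(~ variable, scales = "free_y")

print(p_fill)
print(p_facet)
```

With 40 columns, as in the question, the faceted version is likely to be easier to read than 40 overlaid curves.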
Solution 3
Here's a solution using the plot function and a little loop.
First set up an empty plot (df is your data frame):
plot(density(df[, 1]), type = "n")
Then run this to add the lines:
n <- ncol(df) - 1  # skip the last, non-numeric column
for (i in 1:n) {
  lines(density(df[, i]))
}
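One caveat with the loop above: the axis limits are taken from the first column only, so taller or wider curves from other columns can be clipped. A possible variation, sketched here on a toy df with hypothetical values taken from the question's first rows, computes all densities first and sizes the axes to fit every curve. The colours and legend are additions for readability.

```r
# Toy stand-in for df: numeric columns first, id column last (as in the question).
df <- data.frame(
  V1 = c(0.2446884, 0.0000000, 8.4293893),
  V2 = c(0.3173719, 0.0000000, 4.9985040),
  V3 = c(0.74258410, 0.08592243, 2.22526463),
  refseq = c("NM_000014", "NM_000015", "NM_000016")
)

num_cols <- seq_len(ncol(df) - 1)        # every column except the last
dens <- lapply(df[num_cols], density)    # one density estimate per column

# Axis limits that cover every curve, not just the first one.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))

cols <- seq_along(dens)                  # palette colours 1, 2, 3, ...
plot(NA, xlim = xlim, ylim = ylim,
     xlab = "value", ylab = "density", main = "Density of each column")
for (i in seq_along(dens)) {
  lines(dens[[i]], col = cols[i])
}
legend("topright", legend = names(dens), col = cols, lty = 1)
```

For a data frame with dozens of columns the legend becomes crowded and could simply be dropped.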
Comments
-
Hanfei Sun about 2 years
I have a data frame w like this:
> head(w,3)
         V1        V2         V3        V4 V5        V6         V7        V8        V9       V10 V11        V12        V13        V14
1 0.2446884 0.3173719 0.74258410 0.0000000  0 0.0000000 0.01962759 0.0000000 0.0000000 0.5995647   0 0.30201691 0.03109935 0.16897571
2 0.0000000 0.0000000 0.08592243 0.2254971  0 0.7381867 0.11936323 0.2076167 0.0000000 1.0587742   0 0.50226734 0.51295661 0.01298853
3 8.4293893 4.9985040 2.22526463 0.0000000  0 3.6600283 0.00000000 0.0000000 0.2573714 0.8069288   0 0.05074886 0.00000000 0.59403855
         V15       V16      V17       V18      V19       V20       V21      V22         V23        V24       V25       V26       V27
1 0.00000000 0.0000000 0.000000 0.1250837 0.000000 0.5468143 0.3503245 0.000000 0.183144204 0.23026538 6.9868429 1.5774150 0.0000000
2 0.01732732 0.8064441 0.000000 0.0000000 0.000000 0.0000000 0.0000000 0.000000 0.015123385 0.07580794 0.6160713 0.7452335 0.0740328
3 2.66846151 0.0000000 1.453987 0.0000000 1.875298 0.0000000 0.0000000 0.893363 0.004249061 0.00000000 1.6185897 0.0000000 0.7792773
        V28 V29     V30       V31        V32        V33       V34       V35 V36        V37        V38       V39        V40    refseq
1 0.5543028   0 0.00000 0.0000000 0.08293075 0.18261450 0.3211127 0.2765295   0 0.04230929 0.05017316 0.3340662 0.00000000 NM_000014
2 0.0000000   0 0.00000 0.0000000 0.00000000 0.03531411 0.0000000 0.4143325   0 0.14894716 0.58056304 0.3310173 0.09162460 NM_000015
3 0.8047882   0 0.88308 0.7207709 0.01574767 0.00000000 0.0000000 0.1183736   0 0.00000000 0.00000000 1.3529881 0.03720155 NM_000016
> dim(w)
[1] 37126    41
I tried to plot the density curve of each column (except the last one) on a single page. It seems that ggplot2 can do this.
I tried this according to this post:
ggplot(data=w[,-41], aes_string(x=colnames)) + geom_density()
But it fails with this error:
Error in as.character(x) : cannot coerce type 'closure' to vector of type 'character'
I'm not sure how to convert this data frame to a format that ggplot2 accepts. Or is there another way to do this in R?
-
Chase about 11 years
You need to melt() your data into long format; the question here shows you how to do this: stackoverflow.com/questions/5479822/…
-
kira over 8 years
What exactly is the "refseq" that you are using as id here?
-
janattack about 8 years
@kira It's the 41st column in OP's data set, the only one that isn't numeric. It looks like gene accession numbers from the NCBI Reference Sequence Database (ncbi.nlm.nih.gov/refseq), but it would be a pretty good guess as an id column anyway.
-
Sander W. van der Laan about 6 years
I have a similar issue. In my case I have 1000 columns and no header names (so in R they're called 'X1', 'X2', etc). If I do melt(df) I get an error:
Using X404, X755, X974 as id variables
Error in match.names(clabs, names(xi)) : names do not match previous names
How should I interpret that?
-
Sander W. van der Laan about 6 years
Ah! I use read_table2 to load the data, which is very fast. But it turns the data into a tibble rather than a data.frame. So if I convert it to a data.frame, it works!