ggplot: boxplot number of observations as x-axis labels

Look at this answer. It does not put the counts on the axis labels, but it works; I have used it:

Modify x-axis labels in each facet

You can also do the following, which I have also used:

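    library(ggplot2)
    # make group a factor explicitly; otherwise levels(df$group) is NULL in R >= 4.0
    df <- data.frame(group=factor(sample(c("a","b","c"),100,replace=TRUE)),x=rnorm(100),y=rnorm(100)*rnorm(100))
    # one label per level: the level name, then its count on a second line
    xlabs <- paste(levels(df$group),"\n(N=",table(df$group),")",sep="")
    ggplot(df,aes(x=group,y=x,color=group))+geom_boxplot()+scale_x_discrete(labels=xlabs)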
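The positional labels above depend on `levels()` and `table()` listing the groups in the same order. A minimal sketch of a variant that looks each count up by name instead, by passing a function to `labels`. Here `count_labels` is a hypothetical helper name, not part of ggplot2, and the plot is only drawn if ggplot2 is installed:

```r
# Hypothetical helper: builds a labelling function from a grouping vector.
count_labels <- function(groups) {
  counts <- table(groups)  # counts, named by group value
  function(breaks) {
    # look up each break's count by name, not by position
    paste0(breaks, "\n(N=", as.vector(counts[breaks]), ")")
  }
}

set.seed(1)
df <- data.frame(group = sample(c("a", "b", "c"), 100, replace = TRUE),
                 x = rnorm(100))
labeller <- count_labels(df$group)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = group, y = x, color = group)) +
    geom_boxplot() +
    scale_x_discrete(labels = labeller)
  print(p)
}
```

Because the function receives whatever breaks the scale ends up with, the labels should stay correct if the groups are reordered or filtered.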

This also works

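    library(ggplot2)
    library(reshape2)
    library(plyr)  # provides ddply() and .()

    df <- data.frame(group=sample(c("a","b","c"),100,replace=TRUE),x=rnorm(100),y=rnorm(100)*rnorm(100))
    # long format: one row per observation of each variable
    df1 <- melt(df, id.vars="group")
    # N = number of observations in each group/variable cell
    df2 <- ddply(df1,.(group,variable),transform,N=length(group))
    df2$label <- paste0(df2$group,"\n","(n=",df2$N,")")
    ggplot(df2,aes(x=label,y=value,color=group))+geom_boxplot()+facet_grid(.~variable)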

Author: sina

Updated on June 25, 2022

Comments

  • sina

    I have successfully created a very nice boxplot (for my purposes) categorized by a factor and binned, according to the answer in my previous post here: ggplot: arranging boxplots of multiple y-variables for each group of a continuous x

    Now, I would like to customize the x-axis labels according to the number of observations in each boxplot.

    require (ggplot2)
    require (plyr)
    library(reshape2)
    
    set.seed(1234)
    x<- rnorm(100)
    y.1<-rnorm(100)
    y.2<-rnorm(100)
    y.3<-rnorm(100)
    y.4<-rnorm(100)
    
    df<- (as.data.frame(cbind(x,y.1,y.2,y.3,y.4)))
    dfmelt<-melt(df, measure.vars = 2:5)
    
    dfmelt$bin <- factor(round_any(dfmelt$x,0.5))
    
    dfmelt.sum<-summary(dfmelt$bin)    
    
    ggplot(dfmelt, aes(x=bin, y=value, fill=variable))+
    geom_boxplot()+
    facet_grid(.~bin, scales="free")+
    labs(x="number of observations")+
    scale_x_discrete(labels= dfmelt.sum)
    

    dfmelt.sum only gives me the total number of observations for each bin, not for each boxplot. The boxplot statistics do give me the number of observations for each boxplot:

    dfmelt.stat<-boxplot(value~variable+bin, data=dfmelt)
    dfmelt.n<-dfmelt.stat$n
    

    But how do I add tick marks and labels for each boxplot?

    Thanks, Sina

    UPDATE

    I have continued working on this. The biggest problem is that in the code above, only one tick mark is provided per facet. Since I also wanted to plot the means for each boxplot, I have used interaction to plot each boxplot individually, which also adds tick marks on the x-axis for each boxplot:

    require (ggplot2)
    require (plyr)
    library(reshape2)
    
    set.seed(1234)
    x<- rnorm(100)
    y.1<-rnorm(100)
    y.2<-rnorm(100)
    y.3<-rnorm(100)
    y.4<-rnorm(100)
    
    df<- (as.data.frame(cbind(x,y.1,y.2,y.3,y.4)))
    dfmelt<-melt(df, measure.vars = 2:5)
    
    dfmelt$bin <- factor(round_any(dfmelt$x,0.5))
    
    dfmelt$f2f1<-interaction(dfmelt$variable,dfmelt$bin)
    
    dfmelt_mean<-aggregate(value~variable*bin, data=dfmelt, FUN=mean)
    dfmelt_mean$f2f1<-interaction(dfmelt_mean$variable, dfmelt_mean$bin)
    
    dfmelt_length<-aggregate(value~variable*bin, data=dfmelt, FUN=length)
    dfmelt_length$f2f1<-interaction(dfmelt_length$variable, dfmelt_length$bin)
    

    As an aside: there may be a more elegant way to combine all those interactions. I'd be happy to improve this.

    ggplot(aes(y = value, x = f2f1, fill=variable), data = dfmelt)+
    geom_boxplot()+
    geom_point(aes(x=f2f1, y=value),data=dfmelt_mean, color="red", shape=3)+
    facet_grid(.~bin, scales="free")+
    labs(x="number of observations")+
    scale_x_discrete(labels=dfmelt_length$value)
    

    This gives me a tick mark for each boxplot, which can potentially be labeled. However, using labels in scale_x_discrete only repeats the first four values of dfmelt_length$value in each facet.

    How can that be circumvented? Thanks, Sina
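One possible way around the repeated labels is to name each count after its `f2f1` level, so that the labels can be matched to tick marks by name rather than by position. This is a sketch, not a verified fix: it assumes `scale_x_discrete()` matches a named `labels` vector to the breaks of each facet. To keep it self-contained, the data are rebuilt with base R only (`stack()` in place of `melt()`, `round(x / 0.5) * 0.5` in place of `round_any(x, 0.5)`); `n_per_box` and `box_labels` are names introduced here for illustration.

```r
set.seed(1234)
x <- rnorm(100)
ys <- data.frame(y.1 = rnorm(100), y.2 = rnorm(100),
                 y.3 = rnorm(100), y.4 = rnorm(100))

# Long format with base R: stack() plays the role of melt() here.
dfmelt <- data.frame(x = rep(x, times = ncol(ys)), stack(ys))
names(dfmelt) <- c("x", "value", "variable")
dfmelt$bin <- factor(round(dfmelt$x / 0.5) * 0.5)
dfmelt$f2f1 <- interaction(dfmelt$variable, dfmelt$bin)

# Count observations per boxplot and name each count after its f2f1 level.
n_per_box <- table(droplevels(dfmelt$f2f1))
box_labels <- setNames(as.character(n_per_box), names(n_per_box))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dfmelt, aes(x = f2f1, y = value, fill = variable)) +
    geom_boxplot() +
    facet_grid(. ~ bin, scales = "free") +
    labs(x = "number of observations") +
    scale_x_discrete(labels = box_labels)  # named vector: matched by level name
  print(p)
}
```

The mean markers from the question can be added back with the same `geom_point()` layer; they are left out here only to keep the sketch short.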