Creating a histogram with multiple data series using multhist in R

18,216

Solution 1

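multhist passes extra arguments such as space on to barplot, so the barplot documentation (?barplot) explains how to specify zero space. With beside=TRUE, space takes two numbers: the gap between bars within a bin and the gap between bins. space=NULL has no effect because it simply selects barplot's default for grouped bars, which is c(0,1):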

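library(plotrix)
# 'year' is the list of per-year counts built in the question's code
multhist(year, xlab="Count", ylab="Frequency", main="", 
         cex.axis=1, col=c("dark gray", "light gray"), 
         breaks=seq(0,1600, by=200),
         space=c(0,0), beside=TRUE)  # no gap within or between bin groups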
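A quick way to see why space=NULL changed nothing is to ask barplot() for its bar midpoints without drawing anything (plot = FALSE). In this sketch the 2 x 3 matrix of ones is a made-up stand-in for two series binned into three bins: with the default spacing the groups are separated by one bar width, and space = c(0, 0) closes that gap.

```r
# Made-up stand-in: two series (rows) binned into three bins (columns)
h <- matrix(1, nrow = 2, ncol = 3)

# space = NULL (the default) becomes c(0, 1) for grouped bars:
# no gap within a group, one bar width between groups
default_mids <- barplot(h, beside = TRUE, plot = FALSE)

# space = c(0, 0) removes the gap between groups as well
tight_mids <- barplot(h, beside = TRUE, space = c(0, 0), plot = FALSE)

print(as.vector(default_mids))  # 1.5 2.5 4.5 5.5 7.5 8.5
print(as.vector(tight_mids))    # 0.5 1.5 2.5 3.5 4.5 5.5
```

Only the spacing between groups changes; bars within a group already touch by default.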


Here is an example with ggplot2 and theme_bw:

library(ggplot2)

ggplot(df, aes(x=count,group=year,fill=as.factor(year))) + 
  geom_histogram(position="identity", alpha=0.5, breaks=seq(0,1600, by=200), closed="right") +
  scale_fill_discrete(name="Year") +
  theme_bw(base_size=20) +
  xlab("values")
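If you want the overlaid look of this ggplot2 version but in base graphics (the hist() style the question asks for), one option is to bin both years with hist(plot = FALSE) and draw the second year on top with add = TRUE. This is a minimal sketch: the two count vectors are copied from the question's df, and using adjustcolor() for semi-transparent fills is my own choice, not part of the original answers.

```r
# Counts from the question's df, one vector per year
counts2011 <- c(187, 199, 560, 1000, 850, 400, 534, 911, 390, 1008, 1173,
                1222, 810, 950, 752, 1125, 468, 710, 290, 670, 855, 614, 1300)
counts2012 <- c(950, 670, 888, 490, 557, 741, 700, 954, 378, 512, 780,
                951, 398, 1544, 903, 769, 1399, 1021, 1235, 1009, 1222, 255)
brks <- seq(0, 1600, by = 200)

# Bin both years with the same breaks without drawing yet
h2011 <- hist(counts2011, breaks = brks, plot = FALSE)
h2012 <- hist(counts2012, breaks = brks, plot = FALSE)
ymax <- max(h2011$counts, h2012$counts)  # shared y limit so nothing is clipped

# Draw 2011 first, then overlay 2012; alpha.f = 0.5 keeps overlaps visible
cols <- adjustcolor(c("dark gray", "light gray"), alpha.f = 0.5)
plot(h2011, col = cols[1], ylim = c(0, ymax),
     main = "", xlab = "Count", ylab = "Frequency")
plot(h2012, col = cols[2], add = TRUE)
legend("topright", legend = c("2011", "2012"), fill = cols, bty = "n")
```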


Or, if you really want it to look like the multhist plot (which is harder to interpret):

ggplot(df, aes(x=count,group=year,fill=as.factor(year))) + 
  geom_histogram(position="dodge", breaks=seq(0,1600, by=200), closed="right") +
  scale_fill_discrete(name="Year") +
  theme_bw(base_size=20) +
  xlab("values") +
  scale_x_continuous(breaks=seq(100,1500, by=200))  # labels at the bin midpoints, as in multhist
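For point 2 of the question (tick marks between the bars, labels aligned with the ticks), it may help to draw the grouped bars with barplot() directly, because then the bar positions are known: with space = c(0, 0) and the default width of 1, bar i covers [i - 1, i] on the x axis, so the bin edges fall at multiples of the number of series. A sketch under those assumptions, using the question's data:

```r
# Counts from the question's df, one vector per year
counts2011 <- c(187, 199, 560, 1000, 850, 400, 534, 911, 390, 1008, 1173,
                1222, 810, 950, 752, 1125, 468, 710, 290, 670, 855, 614, 1300)
counts2012 <- c(950, 670, 888, 490, 557, 741, 700, 954, 378, 512, 780,
                951, 398, 1544, 903, 769, 1399, 1021, 1235, 1009, 1222, 255)
brks <- seq(0, 1600, by = 200)
cols <- c("dark gray", "light gray")

# Bin each year with identical breaks (hist() bins are right-closed by default)
m <- rbind(`2011` = hist(counts2011, breaks = brks, plot = FALSE)$counts,
           `2012` = hist(counts2012, breaks = brks, plot = FALSE)$counts)

# Grouped bars with no gaps; m has no column names, so barplot()
# prints no labels of its own under the bars
mids <- barplot(m, beside = TRUE, space = c(0, 0), col = cols,
                xlab = "Count", ylab = "Frequency")

# Each bin spans nrow(m) bar widths, so its edges sit at multiples of nrow(m)
edges <- (seq_along(brks) - 1) * nrow(m)
axis(side = 1, at = edges, labels = brks)
box(bty = "l")
legend("topright", legend = rownames(m), fill = cols, bty = "n")
```

multhist() also draws through barplot(), so the same axis() call should line up after the multhist() call at the top of this answer. You would probably need to suppress multhist's own bar labels first, for example with barplot's axisnames = FALSE, assuming multhist() forwards that argument the way it forwards space.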


Solution 2

For superimposed histograms I prefer to use density plots. They're easier on the eyes, especially if you have narrower bins and more cases. With your data, you get this:

ggplot(df, aes(x=count,group=year,fill=as.factor(year))) + 
  geom_density(position="identity", alpha=0.5) +
  scale_fill_discrete(name="Year") +
  theme_bw() +
  xlab("values")
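A base-graphics counterpart of this density plot can be sketched with density(), which by default uses a Gaussian kernel and R's default bandwidth rule. The semi-transparent filled polygons are an assumption meant to mimic alpha=0.5 in the ggplot2 version; the count vectors are copied from the question's df.

```r
# Counts from the question's df, one vector per year
counts2011 <- c(187, 199, 560, 1000, 850, 400, 534, 911, 390, 1008, 1173,
                1222, 810, 950, 752, 1125, 468, 710, 290, 670, 855, 614, 1300)
counts2012 <- c(950, 670, 888, 490, 557, 741, 700, 954, 378, 512, 780,
                951, 398, 1544, 903, 769, 1399, 1021, 1235, 1009, 1222, 255)

# Kernel density estimates (Gaussian kernel, default bandwidth)
d2011 <- density(counts2011)
d2012 <- density(counts2012)

# Common axis limits so neither curve is clipped
plot(d2011, xlim = range(d2011$x, d2012$x),
     ylim = c(0, max(d2011$y, d2012$y)),
     main = "", xlab = "Count", ylab = "Density")

# Fill under each curve with a semi-transparent colour
cols <- adjustcolor(c("dark gray", "light gray"), alpha.f = 0.5)
polygon(d2011, col = cols[1])
polygon(d2012, col = cols[2])
legend("topright", legend = c("2011", "2012"), fill = cols, bty = "n")
```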

(Figure: density plot)

Author: Emily

Updated on June 05, 2022

Question (asked by Emily)

    I want to create a histogram with multiple data series on the same plot. The best method that I can find to do this is multhist(). I would like a plot in a style similar to hist(), and while ggplot() can also be used to perform this task, the graphics style is not what I want.

    Here is some example data:

    df <- structure(list(year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 
    2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 
    2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2012L, 
    2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 
    2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 
    2012L, 2012L, 2012L), count = c(187L, 199L, 560L, 1000L, 850L, 
    400L, 534L, 911L, 390L, 1008L, 1173L, 1222L, 810L, 950L, 752L, 
    1125L, 468L, 710L, 290L, 670L, 855L, 614L, 1300L, 950L, 670L, 
    888L, 490L, 557L, 741L, 700L, 954L, 378L, 512L, 780L, 951L, 398L, 
    1544L, 903L, 769L, 1399L, 1021L, 1235L, 1009L, 1222L, 255L)), .Names = c("year", 
    "count"), class = "data.frame", row.names = c(NA, -45L))
    

    And here is the code that I have used so far:

    library(plotrix)
    d2011 <- df$count[df$year == 2011]
    d2012 <- df$count[df$year == 2012]
    year <- list(d2011, d2012)
    mh <- multhist(year, xlab="Count", ylab="Frequency", main="", cex.axis=1, col=c("dark gray", "light gray"), breaks=seq(0,1600, by=200))
    box(bty="l", col="black")
    legend.text <- c("2011", "2012")
    legend(locator(1), legend=legend.text, col=c("dark gray", "light gray"), pch=15, bty="n", cex=0.8)
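As an aside, the list passed to multhist() can be built in one step with split(), which also names the elements by year (handy for the legend text). A minimal sketch; the data.frame() call is only a compact rebuild of the question's df, with the first 23 rows from 2011 and the remaining 22 from 2012.

```r
# Compact rebuild of the question's df
df <- data.frame(
  year  = rep(c(2011L, 2012L), times = c(23, 22)),
  count = c(187, 199, 560, 1000, 850, 400, 534, 911, 390, 1008, 1173,
            1222, 810, 950, 752, 1125, 468, 710, 290, 670, 855, 614, 1300,
            950, 670, 888, 490, 557, 741, 700, 954, 378, 512, 780,
            951, 398, 1544, 903, 769, 1399, 1021, 1235, 1009, 1222, 255)
)

# split() groups count by year; the list elements are named "2011" and "2012"
year <- split(df$count, df$year)

# The names can then serve directly as legend text
legend.text <- names(year)
```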
    

    This gives me a 'barplot style' multiple histogram, but I am having trouble changing two graph parameters:

    1. I would like the plot to look more like a histogram and less like a barplot, so first I want to remove (or reduce) the space between the columns. I have tried using space = NULL, but this does not appear to work with multhist.

    2. I would like to change the x-axis so that tick marks sit between the bars and the axis labels are aligned with the tick marks rather than placed at the bar midpoints. I have tried using axis(side=1, …), but because multhist takes a list of data vectors, these commands don’t appear to work.

    Any suggestions would be greatly appreciated. Suggestions for other useful graphics packages that can plot histograms with multiple datasets would also be welcomed.