How to add boxplots to scatterplot with jitter


Solution 1

Here's one way using base graphics.

boxplot(NUMS ~ GRP, data = ddf, lwd = 2, ylab = 'NUMS')
stripchart(NUMS ~ GRP, vertical = TRUE, data = ddf, 
    method = "jitter", add = TRUE, pch = 20, col = 'blue')
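
One side effect of the snippet above is that boxplot() draws its own outlier symbols and stripchart() then draws the same observations again. A minimal sketch of one way around that, assuming the same simulated data as in the question (the seed and the jitter amount of 0.2 are arbitrary choices, not part of the original answer):

```r
# Assumed example data, mirroring the data frame from the question
set.seed(1)  # arbitrary seed so the sample and the jitter are reproducible
ddf <- data.frame(NUMS = rnorm(500),
                  GRP  = factor(sample(LETTERS[1:5], 500, replace = TRUE)))

# outline = FALSE hides the boxplot's own outlier symbols; ylim is set to the
# full data range so that no point added later falls outside the plot region
bp <- boxplot(NUMS ~ GRP, data = ddf, lwd = 2, ylab = "NUMS",
              outline = FALSE, ylim = range(ddf$NUMS))

# add = TRUE draws the jittered points on top of the existing boxplot
stripchart(NUMS ~ GRP, data = ddf, vertical = TRUE, method = "jitter",
           jitter = 0.2, add = TRUE, pch = 20, col = "blue")
```

Setting ylim explicitly matters here because, with outline = FALSE, the default axis range is computed from the whiskers only.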


Solution 2

To do this in ggplot2, try:

library(ggplot2)

ggplot(ddf, aes(x=GRP, y=NUMS)) + 
  geom_boxplot(outlier.shape=NA) + #avoid plotting outliers twice
  geom_jitter(position=position_jitter(width=.1, height=0))

[Figure: ggplot2 version of boxplot + jitter]

You can adjust the width and height arguments of position_jitter() to your liking, although I'd recommend height=0: vertical jitter moves the points away from their true NUMS values, so the plot no longer shows the actual data.
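
If the plot needs to look the same on every run, the jitter can be made reproducible. The following is a rough sketch, assuming the simulated data from the question; the seed values, alpha and size are arbitrary illustrative choices:

```r
library(ggplot2)

# Assumed example data, mirroring the data frame from the question
set.seed(1)
ddf <- data.frame(NUMS = rnorm(500),
                  GRP  = factor(sample(LETTERS[1:5], 500, replace = TRUE)))

p <- ggplot(ddf, aes(x = GRP, y = NUMS)) +
  geom_boxplot(outlier.shape = NA) +          # outliers come from the jitter layer
  geom_jitter(position = position_jitter(width = 0.1, height = 0, seed = 42),
              alpha = 0.4, size = 1)          # semi-transparent points reduce overplotting
print(p)
```

With height = 0 the y values of the jitter layer stay identical to the data, and the seed argument of position_jitter() fixes the horizontal offsets.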

Solution 3

I've written an R function called spreadPoints() within a package called basicPlotteR. The package can be installed directly from GitHub into your R library using the following code:

install.packages("devtools")
library("devtools")
install_github("JosephCrispell/basicPlotteR")

For the example provided, I used the following code to generate the example figure below.

library(basicPlotteR)

ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5],500,replace=TRUE))

boxplot(NUMS ~ GRP, data = ddf, lwd = 2, ylab = 'NUMS')

spreadPointsMultiple(data=ddf, responseColumn="NUMS", categoriesColumn="GRP",
                     col="blue", plotOutliers=TRUE)


It is a work in progress (the lack of a formula as input is clunky!) but it provides a non-random method to spread points on the X axis that doubles as a violin-like summary of the data. Take a look at the source code if you're interested.

Solution 4

For a lattice solution:

library(lattice)
ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5], 500, replace = TRUE))
bwplot(NUMS ~ GRP, ddf, panel = function(...) {
  panel.bwplot(..., pch = "|")
  panel.xyplot(..., jitter.x = TRUE)})

The default median dot symbol was changed to a line with pch = "|". Other properties of the box and whiskers can be adjusted with the box.rectangle and box.umbrella settings through the trellis.par.set() function. The amount of jitter can be adjusted through the factor argument of panel.xyplot(), which is passed on to jitter(); larger values spread the points further.
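
As a hedged illustration of those adjustments, the sketch below sets the box and whisker styling through the par.settings argument (a per-plot alternative to calling trellis.par.set() globally) and passes factor to panel.xyplot(). The colours, line type and factor = 1.5 are arbitrary choices for the example:

```r
library(lattice)

# Assumed example data, mirroring the data frame from the question
set.seed(1)
ddf <- data.frame(NUMS = rnorm(500),
                  GRP  = factor(sample(LETTERS[1:5], 500, replace = TRUE)))

p <- bwplot(NUMS ~ GRP, data = ddf,
            # styling applied to this plot only
            par.settings = list(box.rectangle = list(col = "black"),
                                box.umbrella  = list(col = "black", lty = 1)),
            panel = function(...) {
              panel.bwplot(..., pch = "|")                       # median drawn as a line
              panel.xyplot(..., jitter.x = TRUE, factor = 1.5)   # jittered raw points
            })
print(p)  # lattice objects must be printed to be drawn inside scripts or functions
```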

[Figure: lattice solution to boxplot with scatter]

Author by

rnso


Updated on August 28, 2020

Comments

  • rnso
    rnso over 3 years

    I am using the following commands to produce a scatterplot with jitter:

    ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5],500,replace=T))
    library(lattice)
    stripplot(NUMS~GRP,data=ddf, jitter.data=T)
    

    I want to add boxplots over these points (one for every group). I tried searching, but I could not find code that plots all points (not just outliers) with jitter. How can I do this? Thanks for your help.

    • lukeA
      lukeA almost 10 years
      Does it have to be lattice? Otherwise try something like with(ddf, { boxplot(NUMS~GRP); points(jitter(as.numeric(factor(GRP))), NUMS, col=rgb(0,0,0,.2), cex=.5, pch=19) }).
    • rnso
      rnso almost 10 years
      Using base graphics is preferred. Your solution works very well. Thanks.
    • rnso
      rnso almost 10 years
      Can this be done with ggplot2? I tried {ggplot(ddf,aes(x=GRP, y=NUMS))+geom_boxplot()+geom_jitter()} but it produces too much scatter. Could the jitter be reduced?
    • Brian D
      Brian D almost 8 years
      See this related question as well for points jittered by group: stackoverflow.com/questions/21468380/…
  • rnso
    rnso almost 10 years
    Yes, it works very well. Thanks. I was trying stripplot followed by boxplot and it was not working.
  • Rich Scriven
    Rich Scriven almost 10 years
    The add = TRUE argument is key. :)
  • rnso
    rnso almost 10 years
    add=T alone may not be enough since {stripplot(NUMS~GRP,data=ddf, jitter=T) ; boxplot(NUMS~GRP,data=ddf, add=T)} does not work; apparently one needs to put a 'plot' first followed by points or chart.
  • Rich Scriven
    Rich Scriven almost 10 years
    stripplot is in lattice. stripchart is a base graphics function.
  • rnso
    rnso about 5 years
    Looks good. Is it possible to plot all groups with just one line of code rather than repeating code for each group: spreadPoints(ddf[ddf$GRP=="A", "NUMS"], position=1, col="blue", plotOutliers=TRUE) ?
  • Joseph Crispell
    Joseph Crispell about 5 years
    @rnso I've created an additional function spreadPointsMultiple() that can spread the points for multiple boxplots with a single command (see edit above). I'm currently working on allowing spreadPoints() to have a formula as its first argument. Thanks for pointing this out :-)