How to avoid overplotting (for points) using base-graph?


Solution 1

The standard approach is to add some noise to the data before plotting. R has a function, jitter(), that does exactly that. You could use it to add the necessary noise to the coordinates in your plot, e.g.:

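X <- rep(1:10, 10)
Z <- as.factor(sample(letters[1:10], 100, replace = TRUE))

# Jitter the factor codes horizontally and suppress the default x axis
plot(jitter(as.numeric(Z), factor = 0.2), X, xaxt = "n")
# Label the x axis with the factor levels instead of their numeric codes
axis(1, at = seq_along(levels(Z)), labels = levels(Z))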

Solution 2

Besides jittering, another good approach is alpha blending, which you can obtain (on graphics devices that support it) by supplying an alpha value as the fourth component of the color specification. I provided an example for 'overplotting' of two histograms in this SO question.
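A minimal sketch of alpha blending on the same kind of data as in Solution 1. The data, the seed, and the name col_alpha are made up for illustration, and the alpha of 0.2 is an arbitrary choice; the fourth argument of rgb() is the alpha value.

```r
set.seed(1)  # illustrative seed, only for reproducibility
X <- rep(1:10, 10)
Z <- as.factor(sample(letters[1:10], 100, replace = TRUE))

# Semi-transparent blue: alpha = 0.2, so coincident points stack up darker
col_alpha <- rgb(0, 0, 1, 0.2)

# Filled points (pch = 16) make the accumulated opacity visible
plot(as.numeric(Z), X, pch = 16, col = col_alpha, xaxt = "n")
axis(1, at = seq_along(levels(Z)), labels = levels(Z))
```

With a low alpha, a location holding five points looks clearly darker than one holding a single point, which is the "points get darker" effect asked for. The device has to support transparency (pdf and png typically do).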

Solution 3

One additional idea for the general problem of showing the number of points is a rug plot (the rug function). It places small tick marks along the margin that show how many points contribute (still use jittering or alpha blending for ties). The actual points can then show their true rather than jittered values, while the rug indicates which parts of the plot have more values.

For the example plot, direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
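A rough sketch of the rug idea, using made-up data (x and y are hypothetical, rounded so that ties occur). The points are plotted at their true values and only the rug ticks are jittered.

```r
set.seed(1)  # illustrative seed
x <- round(rnorm(200), 1)  # rounding to one decimal creates ties
y <- round(rnorm(200), 1)

# Points keep their true coordinates
plot(x, y, pch = 16)

# Jittered ticks along the bottom (side = 1) and left (side = 2) margins
# show where the values are concentrated
rug(jitter(x), side = 1)
rug(jitter(y), side = 2)
```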

Solution 4

You may also use sunflowerplot, although it would be hard to apply here. I would use alpha blending, as Dirk suggested.
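For reference, a small sketch of what sunflowerplot does on made-up integer data (x, y, and counts are hypothetical names). Each extra point at the same location adds a "petal". The second plot is a simple alternative, not something from the answer above: it uses xyTable() to count coincident points and scales the symbol size by that count.

```r
set.seed(1)  # illustrative seed
x <- sample(1:5, 100, replace = TRUE)
y <- sample(1:5, 100, replace = TRUE)

# One petal per additional point at the same (x, y) location
sunflowerplot(x, y)

# Count how many points fall on each distinct coordinate pair
counts <- xyTable(x, y)

# Alternative: one symbol per location, sized by the number of points there
plot(counts$x, counts$y, pch = 16, cex = sqrt(counts$number))
```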

Author by

Henrik

Assistant professor of psychology at the University of Warwick, UK. My primary programming language is R. I am maintainer of R packages afex, MPTinR, and acss. Besides R, I use Python (using PsychoPy) and JavaScript for running experiments and occasionally other languages. A list of my publications can be found on my homepage (usually with the possibility to download the papers, data, and analysis scripts).

Updated on May 31, 2020

Comments

  • Henrik
    Henrik almost 4 years

    I am in the process of finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to convey as much information as possible, to create the following graph, which presents the means in the foreground and the raw data in the background: [figure]

    However, one problem remains: overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points exist with the same value at that place.
    Therefore, I would like to know if there is a way to deal with overplotting in base graphics using points as the plotting function.
    It would be ideal if, e.g., the respective points got darker or thicker.

    Doing it manually is not an option (too many graphs and points like this). Furthermore, I do not want to learn ggplot2 just to deal with this single problem (one reason is that I tend to like dual axes, which are not supported in ggplot2).


    Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!

    This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).