Creating a Pareto Chart with ggplot2 and R

Solution 1

The bars in ggplot2 are ordered by the ordering of the levels in the factor.

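# unique() is needed because each state appears twice (actual and estimate),
# and factor() rejects duplicated levels
val$State <- factor(val$State, levels = unique(val$State[order(-val$Value)]))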
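
To see the effect without downloading the question's file, here is a minimal sketch on made-up data. The values and the helper names `actual` and `state_order` are assumptions for illustration; only the two-rows-per-state shape mirrors the original `val`. The plotting step runs only if ggplot2 is installed.

```r
# Hypothetical data: two rows per state, shaped like the question's val data frame
val <- data.frame(
  State    = rep(c("AL", "CA", "NY", "TX"), each = 2),
  variable = rep(c("actual", "estimate"), times = 4),
  Value    = c(0.20, 0.25, 0.90, 0.85, 0.60, 0.55, 0.40, 0.45),
  stringsAsFactors = FALSE
)

# Order the states by their "actual" value, largest first
actual <- val[val$variable == "actual", ]
state_order <- actual$State[order(-actual$Value)]

# Bars follow the factor levels, not the row order of the data frame
val$State <- factor(val$State, levels = state_order)
print(levels(val$State))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(val, aes(State, Value, fill = variable)) +
    geom_bar(stat = "identity", position = "dodge")
  print(p)
}
```

Ordering by the "actual" rows is just one choice; ordering by the estimate or by the per-state sum works the same way.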

Solution 2

Subset and sort your data:

valact <- subset(val, variable=='actual')
valsort <- valact[ order(-valact[,"Value"]),]

From there it's just a standard barplot() with a manually computed cumulative line on top:

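op <- par(mar = c(3, 3, 3, 3))
# barplot() returns the bar midpoints, which serve as x positions for the line
bp <- barplot(valsort[, "Value"], ylab = "", xlab = "", ylim = c(0, 1),
              names.arg = as.character(valsort[, "State"]), main = "How's that?")
# Cumulative share of the total, drawn on top of the bars
lines(bp, cumsum(valsort[, "Value"]) / sum(valsort[, "Value"]), col = "red")
axis(4)
box()
par(op)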

which should look like this:

[Image: resulting chart; source: eddelbuettel.com]

It doesn't even need the overplotting trick, as lines() happily annotates the initial plot.

Solution 3

A traditional Pareto chart in ggplot2:

Developed after reading Cano, E. L., Moguerza, J. M., & Redchuk, A. (2012). Six Sigma with R. (G. Robert, K. Hornik, & G. Parmigiani, Eds.) Springer.

library(ggplot2)
library(grid)

# Example defect counts, sorted from most to least frequent
counts  <- c(80, 27, 66, 94, 33)
defects <- c("price code", "schedule date", "supplier code", "contact num.", "part num.")
dat <- data.frame(count = counts, defect = defects, stringsAsFactors = FALSE)
dat <- dat[order(dat$count, decreasing = TRUE), ]
# Fix the factor levels in sorted order so ggplot2 keeps that order on the x axis
dat$defect <- factor(dat$defect, levels = dat$defect)
dat$cum <- cumsum(dat$count)
count.sum <- sum(dat$count)
dat$cum_perc <- 100 * dat$cum / count.sum

# Top panel: cumulative percentage
p1 <- ggplot(dat, aes(x = defect, y = cum_perc, group = 1))
p1 <- p1 + geom_point(aes(colour = defect), size = 4) + geom_path()
p1 <- p1 + ggtitle("Pareto Chart") +
  theme(axis.ticks.x = element_blank(), axis.title.x = element_blank(), axis.text.x = element_blank())
p1 <- p1 + theme(legend.position = "none")

# Bottom panel: counts; stat = "identity" is required because y is supplied
p2 <- ggplot(dat, aes(x = defect, y = count, colour = defect, fill = defect))
p2 <- p2 + geom_bar(stat = "identity")
p2 <- p2 + theme(legend.position = "none")

# Stack the two panels in a 2 x 1 grid layout
grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 1)))
print(p1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p2, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
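
If you would rather have bars and cumulative line in a single panel, one possible variation (not what the answer above does) is to draw both on the count scale and relabel the right-hand axis as a percentage with `sec_axis()`. This is only a sketch under that assumption; the name `total` and the colours are my own choices, and the plot is drawn only if ggplot2 is installed.

```r
# Same example counts as in the answer above
dat <- data.frame(
  defect = c("price code", "schedule date", "supplier code", "contact num.", "part num."),
  count  = c(80, 27, 66, 94, 33),
  stringsAsFactors = FALSE
)
dat <- dat[order(dat$count, decreasing = TRUE), ]
dat$defect   <- factor(dat$defect, levels = dat$defect)
dat$cum      <- cumsum(dat$count)
total        <- sum(dat$count)
dat$cum_perc <- 100 * dat$cum / total
print(dat)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Bars and cumulative line share the count scale; the right-hand axis
  # relabels that same scale as a percentage of the total
  p <- ggplot(dat, aes(x = defect)) +
    geom_bar(aes(y = count), stat = "identity", fill = "grey70") +
    geom_line(aes(y = cum, group = 1), colour = "red") +
    geom_point(aes(y = cum), colour = "red") +
    scale_y_continuous(sec.axis = sec_axis(~ . * 100 / total, name = "Cumulative %")) +
    ggtitle("Pareto Chart")
  print(p)
}
```

The secondary axis is a pure relabelling of the primary one, so the line has to be plotted in count units (`cum`), not in percent.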

Solution 4

With a simple example:

 > data
    PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9    PC10 
0.29056 0.23833 0.11003 0.05549 0.04678 0.03788 0.02770 0.02323 0.02211 0.01925 

barplot(data) plots the bars in the order given, which is what we want.

the ggplot equivalent "should be": qplot(x=names(data), y=data, geom='bar')

But that incorrectly reorders/sorts the bars alphabetically... because that's how levels(factor(names(data))) would be ordered.

Solution: qplot(x=factor(names(data), levels=names(data)), y=data, geom='bar')

Phew!
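
The reordering can be checked without plotting anything. A small sketch using the values from the example above; the names `default_levels`, `fixed` and the data frame columns are mine, and the ggplot2 part only runs if the package is installed.

```r
# Values from the example above
data <- c(PC1 = 0.29056, PC2 = 0.23833, PC3 = 0.11003, PC4 = 0.05549, PC5 = 0.04678,
          PC6 = 0.03788, PC7 = 0.02770, PC8 = 0.02323, PC9 = 0.02211, PC10 = 0.01925)

# Default factor levels are sorted as strings, so "PC10" lands right after "PC1"
default_levels <- levels(factor(names(data)))
print(default_levels)

# Passing the levels explicitly keeps the original order
fixed <- factor(names(data), levels = names(data))
print(levels(fixed))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  df <- data.frame(component = fixed, value = as.numeric(data))
  print(ggplot(df, aes(component, value)) + geom_bar(stat = "identity"))
}
```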

Solution 5

Also, see the package qcc which has a function pareto.chart(). Looks like it uses base graphics too, so start your bounty for a ggplot2-solution :-)
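
A minimal sketch of that call, assuming qcc is installed. The defect counts are borrowed from the ggplot2 answer above, and the variable name `defect_counts` is my own.

```r
# Named vector of counts; the names become the category labels
defect_counts <- c("price code" = 80, "schedule date" = 27, "supplier code" = 66,
                   "contact num." = 94, "part num." = 33)

if (requireNamespace("qcc", quietly = TRUE)) {
  # pareto.chart() sorts the categories itself and adds the cumulative line
  qcc::pareto.chart(defect_counts, ylab = "Defect count")
} else {
  message("Install qcc with install.packages(\"qcc\") to run this example")
}
```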

Author: JD Long

Updated on July 09, 2022

Comments

  • JD Long
    JD Long almost 2 years

    I have been struggling with how to make a Pareto Chart in R using the ggplot2 package. In many cases when making a bar chart or histogram we want items sorted by the X axis. In a Pareto Chart we want the items ordered descending by the value in the Y axis. Is there a way to get ggplot to plot items ordered by the value in the Y axis? I tried sorting the data frame first but it seems ggplot reorders them.

    Example:

    val <- read.csv("http://www.cerebralmastication.com/wp-content/uploads/2009/11/val.txt")
    val<-with(val, val[order(-Value), ])
    p <- ggplot(val)
    p + geom_bar(aes(State, Value, fill=variable), stat = "identity", position="dodge") + scale_fill_brewer(palette = "Set1")
    

    The data frame val is sorted, but the plot does not follow that order:

    [Image: resulting plot; source: cerebralmastication.com]

    Hadley correctly pointed out that this produces a much better graphic for showing actuals vs. predicted:

    ggplot(val, aes(State, Value)) + geom_bar(data = subset(val, variable == "estimate"), stat = "identity", fill = "grey70") + geom_crossbar(data = subset(val, variable == "actual"), aes(ymin = Value, ymax = Value))
    

    which returns:

    [Image: resulting plot; source: cerebralmastication.com]

    But it's still not a Pareto Chart. Any tips?

  • JD Long
    JD Long over 14 years
    That is awesome! That's exactly what I could not figure out how to do. Thank you!
  • JD Long
    JD Long over 14 years
    I accepted Chang's answer because I really wanted to do this with ggplot. But I still owe you a beer for giving such a kick ass answer.
  • hadley
    hadley over 14 years
    Or a little more succinctly, change your first aes call to: `aes(reorder(State, Value), Value)`
  • Andreas
    Andreas over 14 years
    I think you need aes(reorder(State, Value, mean), Value) - since there are two values for each state?
  • JD Long
    JD Long over 14 years
    You gave a far more thorough answer to the Pareto part than I was expecting! My question was grossly stylized and I had coded myself into a corner where using ggplot2 was the easiest way out. What you did with base graphics was really cool. Thanks again.
  • d_a_c321
    d_a_c321 over 10 years
    @DirkEddelbuettel -- as a crazy followup, I was wondering if you could modify your answer so that it accepts a facet_wrap?