R: ggplot stacked bar chart with counts on y axis but percentage as label
Solution 1
As @Gregor mentioned, summarize the data separately and then feed the data summary to ggplot. In the code below, we use dplyr to create the summary on the fly:
library(dplyr)
library(ggplot2)

ggplot(df %>%
         count(region, species) %>%          # Count rows in each region/species combination
         mutate(pct = n / sum(n),            # Percent within each region (assumes count() returns data grouped by region)
                ypos = cumsum(n) - 0.5 * n), # Calculate label positions
       aes(region, n, fill = species)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(sprintf("%1.1f", pct * 100), "%"), y = ypos))
Update: With ggplot2 2.2.0 and later, you no longer need to provide a y-value to center the text within each bar. Instead, you can use position_stack(vjust=0.5):
ggplot(df %>%
         count(region, species) %>%   # Count rows in each region/species combination
         mutate(pct = n / sum(n)),    # Calculate percent within each region
       aes(region, n, fill = species)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(sprintf("%1.1f", pct * 100), "%")),
            position = position_stack(vjust = 0.5))
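Whether count() returns data that is still grouped by region appears to depend on the installed dplyr version (see the comments below). The following is a minimal sketch, not part of the original answer, that builds the summary as a separate data frame and groups by region explicitly, so the percentages should be computed per region either way. The name df_summary is made up for illustration, and geom_col() is assumed to be available (ggplot2 2.2.0 or later).

```r
library(dplyr)
library(ggplot2)

# Example data from the question
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

# Summarize first: one row per region/species combination
df_summary <- df %>%
  count(region, species) %>%
  group_by(region) %>%           # make the per-region grouping explicit
  mutate(pct = n / sum(n)) %>%   # share of each species within its region
  ungroup()

# Sanity check: the shares should add up to 1 within every region
df_summary %>%
  group_by(region) %>%
  summarise(total = sum(pct))

# Counts on the y-axis, per-region percentages as centered labels
p <- ggplot(df_summary, aes(region, n, fill = species)) +
  geom_col() +
  geom_text(aes(label = sprintf("%1.1f%%", pct * 100)),
            position = position_stack(vjust = 0.5))
p
```

Keeping the summary in its own object makes it easy to inspect the numbers before plotting them.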
Solution 2
I agree with Johanna. You could try:
d <- aggregate(.~region+species, df, length)
d$percent <- paste(round(ID/sum(ID)*100),'%',sep='')
ggplot(d, aes(region, ID, fill=species)) + geom_bar(stat='identity') +
geom_text(position='stack', aes(label=paste(round(ID/sum(ID)*100),'%',sep='')), vjust=5)
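For comparison, here is a hedged base R sketch of the same idea in which the percentages are computed within each region, so each stack sums to 100%. It is not the answerer's code: the use of ave() for the per-region totals and of position_stack(vjust = 0.5) for label placement (which requires ggplot2 2.2.0 or later) are assumptions added here.

```r
library(ggplot2)

# Example data from the question
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

# Count rows per region/species; the ID column ends up holding the counts
d <- aggregate(ID ~ region + species, data = df, FUN = length)

# ave() applies sum() within each region and returns a vector aligned with d,
# so each count is divided by the total of its own region
d$percent <- paste0(round(d$ID / ave(d$ID, d$region, FUN = sum) * 100), "%")

p <- ggplot(d, aes(region, ID, fill = species)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = percent), position = position_stack(vjust = 0.5))
p
```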
Johanna
Updated on July 06, 2022
Comments
-
Johanna almost 2 years
I'm looking for a way to label a stacked bar chart with percentages while the y-axis shows the original count (using ggplot). Here is a MWE for the plot without labels:
library(ggplot2)
df <- as.data.frame(matrix(nrow = 7, ncol = 3,
                           data = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7",
                                    "north", "north", "north", "north", "south", "south", "south",
                                    "A", "B", "B", "C", "A", "A", "C"),
                           byrow = FALSE))
colnames(df) <- c("ID", "region", "species")
p <- ggplot(df, aes(x = region, fill = species))
p + geom_bar()
I have a much larger table, and R counts the different species for every region quite nicely. Now I would like to show both the original count value (preferably on the y-axis) and the percentage (as a label) to compare proportions of species between regions.
I tried out many things using geom_text(), but I think the main difference from other questions (e.g. this one) is that
- I do not have a separate column for y values (they are just the counts of different species per region) and
- I need the labels per region to sum up to 100% (since they are considered to represent separate populations), not all labels of the entire plot.
Any help is much appreciated!!
-
Johanna almost 8 years
Thanks a lot, this is exactly what I was looking for!
-
Johanna almost 8 years
Thanks for your input, but in your solution the percentages per stack do not sum up to 100%. BTW: I guess it should be d$percent <- paste(round(d$ID/sum(d$ID)*100),'%',sep='').
-
J_F over 6 years
Note that the code presented above will NOT produce the barplot shown! You have to use a group_by command in addition to that: df %>% group_by(region) %>% count(region, species) %>% mutate(pct=n/sum(n))
-
eipi10 over 6 years
group_by is unnecessary. count(x,y) is the equivalent of group_by(x,y) %>% tally.
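One way to settle this exchange for a given installation is to check it directly. The sketch below is an addition, not from the thread: it compares the counts from count() with those from group_by() followed by tally(), and uses is_grouped_df() to report whether each result is still grouped. The grouping status may differ between dplyr versions, so no particular output is assumed here. The object names a, b, and pct_by_region are made up for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

a <- df %>% count(region, species)
b <- df %>% group_by(region, species) %>% tally()

# The counts themselves should agree
identical(a$n, b$n)

# Whether the result is still grouped can depend on the dplyr version
is_grouped_df(a)
is_grouped_df(b)

# Grouping explicitly before mutate() gives per-region percentages either way
pct_by_region <- a %>%
  group_by(region) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()
pct_by_region
```

If is_grouped_df(a) is FALSE on your installation, sum(n) in an ungrouped mutate() is the total over all regions, which matches the behavior J_F describes.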