Grouped bar plot in ggplot

194,902

EDIT: Many years later

For a pure ggplot2 + utils::stack() solution, see the answer by @markus!
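As a rough illustration of what a ggplot2 + utils::stack() approach can look like (a sketch only, not necessarily @markus's exact code), the six sample rows from the question are typed in by hand here so that nothing has to be downloaded. The object names raw, long and p are made up for this example:

```r
library(ggplot2)

# The six sample rows from the question, typed in by hand (assumption:
# the duplicated "People" header ends up as "People.1", as with read.csv).
raw <- data.frame(
  People   = paste0("P", 1:6),
  Food     = c("Very Bad", "Good", "Good", "Good", "Bad", "Bad"),
  Music    = c("Bad", "Good", "Bad", "Very Bad", "Good", "Good"),
  People.1 = c("Good", "Very Bad", "Good", "Very Good", "Very Good", "Very Good"),
  stringsAsFactors = FALSE
)

# stack() turns the three answer columns into two columns:
# `values` (the answer) and `ind` (the name of the column it came from).
long <- utils::stack(raw[, c("Food", "Music", "People.1")])
long$values <- factor(long$values,
                      levels = c("Very Bad", "Bad", "Good", "Very Good"))

# geom_bar() counts the rows itself, so no separate summarising step is needed.
p <- ggplot(long, aes(x = ind, fill = values)) +
  geom_bar(position = "dodge")
p
```

Note that stack() only picks up plain vector columns, so the answers should still be character (not factor) when it is called.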


A somewhat verbose tidyverse solution, with all non-base packages explicitly stated so that you know where each function comes from:

library(magrittr) # needed for %>% if dplyr is not attached

"http://pastebin.com/raw.php?i=L8cEKcxS" %>%
  utils::read.csv(sep = ",") %>%
  tidyr::pivot_longer(cols = c(Food, Music, People.1),
                      names_to = "variable",
                      values_to = "value") %>%
  dplyr::group_by(variable, value) %>%
  dplyr::summarise(n = dplyr::n()) %>%
  dplyr::mutate(value = factor(
    value,
    levels = c("Very Bad", "Bad", "Good", "Very Good"))
  ) %>%
  ggplot2::ggplot(ggplot2::aes(variable, n)) +
  ggplot2::geom_bar(ggplot2::aes(fill = value),
                    position = "dodge",
                    stat = "identity")

The original answer:

First you need to get the counts for each category, i.e. how many "Bad", "Good" and so on there are for each group (Food, Music, People). This can be done like so:

raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)

raw=raw[,c(2,3,4)] # getting rid of the "people" variable as I see no use for it

freq=table(col(raw), as.matrix(raw)) # get the counts of each factor level
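
To see what the table(col(raw), as.matrix(raw)) idiom does, here is a small self-contained sketch that uses the six sample rows from the question instead of the pastebin file (so the counts differ from the full data set). col(raw) gives the column index of every cell and as.matrix(raw) gives the cell's text, so tabulating one against the other counts each answer per column:

```r
# Six sample rows from the question (answer columns only), typed in by hand.
raw <- data.frame(
  Food   = c("Very Bad", "Good", "Good", "Good", "Bad", "Bad"),
  Music  = c("Bad", "Good", "Bad", "Very Bad", "Good", "Good"),
  People = c("Good", "Very Bad", "Good", "Very Good", "Very Good", "Very Good"),
  stringsAsFactors = FALSE
)

# Rows of the result: column numbers 1-3 (Food, Music, People).
# Columns of the result: the distinct answers found in the data.
freq <- table(col(raw), as.matrix(raw))
freq
```

The answer columns come out in alphabetical order ("Bad", "Good", "Very Bad", "Very Good") because as.matrix() yields plain text and the factor levels are not carried over, which is presumably why the columns get reordered in the next step.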

Then you need to create a data frame out of it, melt it and plot it:

Names=c("Food","Music","People")     # character vector of group names
data=data.frame(cbind(freq),Names)   # combine them into a data frame
data=data[,c(5,3,1,2,4)]             # sort columns: Names first, then answers from worst to best

# melt the data frame for plotting (melt() comes from reshape2)
library(reshape2)
data.m <- melt(data, id.vars='Names')

# plot everything
library(ggplot2)
ggplot(data.m, aes(Names, value)) +
  geom_bar(aes(fill = variable), position = "dodge", stat="identity")

Is this what you're after?

[Image: the resulting grouped bar plot]

To clarify a little: in your earlier question, "ggplot multiple grouping bar", you had a data frame that looked like this:

> head(df)
  ID Type Annee X1PCE X2PCE X3PCE X4PCE X5PCE X6PCE
1  1    A  1980   450   338   154    36    13     9
2  2    A  2000   288   407   212    54    16    23
3  3    A  2020   196   434   246    68    19    36
4  4    B  1980   111   326   441    90    21    11
5  5    B  2000    63   298   443   133    42    21
6  6    B  2020    36   257   462   162    55    30

Since you have numerical values in columns 4-9, which would later be plotted on the y axis, this can be easily transformed with reshape and plotted.

For our current data set, we needed something similar, so we used freq=table(col(raw), as.matrix(raw)) to get this:

> data
   Names Very.Bad Bad Good Very.Good
1   Food        7   6    5         2
2  Music        5   5    7         3
3 People        6   3    7         4

Just imagine you have Very.Bad, Bad, Good and so on instead of X1PCE, X2PCE, X3PCE. See the similarity? But we needed to create such a structure first, hence the freq=table(col(raw), as.matrix(raw)).
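
As an aside, and only as a sketch: the melt step can also be skipped, because as.data.frame() on a table already returns one row per combination of group and answer. The example below again uses the six sample rows from the question, typed in by hand; the names lv, Answer and counts are made up for illustration:

```r
library(ggplot2)

# Six sample rows from the question (answer columns only), typed in by hand.
raw <- data.frame(
  Food   = c("Very Bad", "Good", "Good", "Good", "Bad", "Bad"),
  Music  = c("Bad", "Good", "Bad", "Very Bad", "Good", "Good"),
  People = c("Good", "Very Bad", "Good", "Very Good", "Very Good", "Very Good"),
  stringsAsFactors = FALSE
)
lv <- c("Very Bad", "Bad", "Good", "Very Good")

# Tabulate column name against answer; fixing the factor levels keeps
# the answers in their natural order instead of alphabetical order.
freq <- table(Names  = names(raw)[as.vector(col(raw))],
              Answer = factor(as.matrix(raw), levels = lv))

# One row per (Names, Answer) pair, with the count in `Freq`.
counts <- as.data.frame(freq)

ggplot(counts, aes(Names, Freq, fill = Answer)) +
  geom_col(position = "dodge")
```

geom_col() is used here as the shorthand for geom_bar(stat = "identity").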

Author by

S12000

Updated on July 21, 2022

Comments

  • S12000
    S12000 almost 2 years

    I have a survey file in which rows are observations and columns are questions.

    Here is some fake data showing what it looks like:

    People,Food,Music,People
    P1,Very Bad,Bad,Good
    P2,Good,Good,Very Bad
    P3,Good,Bad,Good
    P4,Good,Very Bad,Very Good
    P5,Bad,Good,Very Good
    P6,Bad,Good,Very Good
    

    My aim is to create this kind of plot with ggplot2.

    • I absolutely don't care about the colors, design, etc.
    • The plot doesn't correspond to the fake data.

    [Image: example of the desired grouped bar plot]

    Here are my fake data:

    raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
    raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
    raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
    raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
    

    But if I choose Y as the count, then I run into the problem of choosing the X and Group values... I don't know if I can succeed without using reshape2... I've also tried to use reshape with the melt function, but I don't understand how to use it...

  • S12000
    S12000 over 10 years
    Hello, thank you, this is exactly what I want. I just have a question: is it also possible to avoid raw=raw[,c(2,3,4)] and freq=table(col(raw), as.matrix(raw)) and do everything with reshape? I had the same kind of issue in stackoverflow.com/questions/17303573/… and in that post I only used reshape. I'm confused about it...
  • jakub
    jakub over 10 years
    Well, I'm not sure. The raw=raw[,c(2,3,4)] is there only because it makes no sense to include the observation indicator (you do not plot individual observations in the subsequent plot). Therefore, the counts are the only thing that matters. Whether you can do it all with reshape, I don't know. My guess is that you can't.
  • jakub
    jakub over 10 years
    Well, actually, the data in this current post is different in that it does not contain the numerical counts. Have a look at columns 4-9 in the data frame from the post you are linking to: they contain numerical values, which Didzis subsequently melted to create the value variable in the melted data frame. We did not have any values, so we needed to create them first. Hence freq=table(col(raw), as.matrix(raw)). (I added a more extensive explanation at the end of my answer.)
  • S12000
    S12000 over 10 years
    Ah, true. I got it, thanks. Basically, with categorical data like in this post there is one more step... Thanks for your very good explanation.
  • S12000
    S12000 over 10 years
    Sorry to disturb you again, I have another question: do you know if it is possible to display the frequency (or percentage) on each bar?
  • jakub
    jakub over 10 years
    Thanks ;-) Maybe you can find your answer here
  • jakub
    jakub over 10 years
  • S12000
    S12000 over 10 years
    Sorry, I succeeded for a "simple" barplot, but I can't figure out how to do it with a multiple barplot when "melt" is used. Thanks.
  • jakub
    jakub over 10 years
    What would the percentage be equal to? Would it be for instance equal to (number of "Very good" in "Food") / (total number of answers in "Food") ?
  • jakub
    jakub over 10 years
    Anyway, the most straightforward way I can think of is to count the percentage beforehand on your melted data frame (especially in this case, where you already have frequencies of answers). For instance: ddply(data.m, .(Names), summarize, ratio=value/sum(value)) will calculate the percentage I mentioned in my previous comment. Then you can use something like geom_text(aes(label = sprintf("%1.2f%%", 100*ratio),x = variable,y = value),position = position_dodge(width = 0.8), vjust=-.6) to display those in the plot.
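
One way to put the idea from the last comment together, as a sketch rather than a definitive recipe: the counts below are copied from the data table shown in the answer and written directly in long form. Here ave() from base R stands in for the ddply() call, the label layer keeps Names on the x axis and uses variable only for grouping, and lv and dodge are helper names made up for this example:

```r
library(ggplot2)

# Counts copied from the `data` table in the answer, already in long form.
lv <- c("Very.Bad", "Bad", "Good", "Very.Good")
data.m <- data.frame(
  Names    = rep(c("Food", "Music", "People"), times = 4),
  variable = factor(rep(lv, each = 3), levels = lv),
  value    = c(7, 5, 6,   6, 5, 3,   5, 7, 7,   2, 3, 4)
)

# Share of each answer within its group (Food, Music, People).
data.m$ratio <- data.m$value / ave(data.m$value, data.m$Names, FUN = sum)

# Use the same dodge for bars and labels so that they line up.
dodge <- position_dodge(width = 0.9)
ggplot(data.m, aes(Names, value, fill = variable, group = variable)) +
  geom_col(position = dodge) +
  geom_text(aes(label = sprintf("%1.0f%%", 100 * ratio)),
            position = dodge, vjust = -0.6)
```

Each ratio is the count for one answer divided by the total number of answers in that group, which is the definition proposed in the comment above.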