Finding percentage in a sub-group using group_by and summarise

64,826

Solution 1

Try

library(dplyr)
data %>%
    group_by(month) %>%
    mutate(countT= sum(count)) %>%
    group_by(type, add=TRUE) %>%
    mutate(per=paste0(round(100*count/countT,2),'%'))

Or make it more simpler without creating additional columns

data %>%
    group_by(month) %>%
    mutate(per =  100 *count/sum(count)) %>% 
    ungroup

We could also use left_join after summarising the sum(count) by 'month'

Or an option using data.table.

 library(data.table)
 setkey(setDT(data), month)[data[, list(count=sum(count)), month], 
               per:= paste0(round(100*count/i.count,2), '%')][]

Solution 2

And with a bit less code:

df <- data.frame(month=c("Feb-14", "Feb-14", "Feb-14", "Mar-14", "Mar-14", "Mar-14", "Apr-14", "Apr-14", "Apr-14", "May-14"),
             type=c("bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb", "ccc", "aaa", "bbb"),
             count=c(341, 527, 2674, 811, 1045, 4417, 1178, 1192, 4793, 916))


library(dplyr)

df %>% group_by(month) %>% 
       mutate(per=paste0(round(count/sum(count)*100, 2), "%")) %>% 
       ungroup

Since you want to "leave" your data frame untouched you shouldn't use summarise, mutate will suffice.

Solution 3

We can use prop.table to get the proportions within each group.

This can be written in dplyr :

library(dplyr)
df %>% group_by(month) %>% mutate(per= prop.table(count) * 100)

#  month  type  count    per
#   <chr>  <chr> <dbl>  <dbl>
# 1 Feb-14 bbb     341   9.63
# 2 Feb-14 ccc     527  14.9 
# 3 Feb-14 aaa    2674  75.5 
# 4 Mar-14 bbb     811  12.9 
# 5 Mar-14 ccc    1045  16.7 
# 6 Mar-14 aaa    4417  70.4 
# 7 Apr-14 bbb    1178  16.4 
# 8 Apr-14 ccc    1192  16.6 
# 9 Apr-14 aaa    4793  66.9 
#10 May-14 bbb     916 100   

Base R :

df$per <- with(df, ave(count, month, FUN = prop.table) * 100)

and data.table :

library(data.table)
setDT(df)[, per := prop.table(count) * 100, month]
Share:
64,826
KC.
Author by

KC.

Updated on July 18, 2021

Comments

  • KC.
    KC. almost 3 years

    I am new to dplyr and trying to do the following transformation without any luck. I've searched across the internet and I have found examples to do the same in ddply but I'd like to use dplyr.

    I have the following data:

       month   type  count
    1  Feb-14  bbb   341
    2  Feb-14  ccc   527
    3  Feb-14  aaa  2674
    4  Mar-14  bbb   811
    5  Mar-14  ccc  1045
    6  Mar-14  aaa  4417
    7  Apr-14  bbb  1178
    8  Apr-14  ccc  1192
    9  Apr-14  aaa  4793
    10 May-14  bbb   916
    ..    ...  ...   ...
    

    I want to use dplyr to calculate the percentage of each type (aaa, bbb, ccc) at a month level i.e.

       month   type  count  per
    1  Feb-14  bbb   341    9.6%
    2  Feb-14  ccc   527    14.87%
    3  Feb-14  aaa  2674    ..
    ..    ...  ...   ...
    

    I've tried

    data %>%
      group_by(month, type) %>%
      summarise(count / sum(count))
    

    This gives a 1 as each value. How do I make the sum(count) sum across all the types in the month?

  • oatmilkyway
    oatmilkyway over 3 years
    Wow, this was extremely helpful! Thank you!
  • Stefan Jelkovich
    Stefan Jelkovich about 3 years
    Yes, nice hint!