Extend contigency table with proportions (percentages)
Solution 1
If it's conciseness you're after, you might like:
prop.table(table(tips$smoker))
and then scale by 100 and round if you like. Or more like your exact output:
tbl <- table(tips$smoker)
cbind(tbl,prop.table(tbl))
If you wanted to do this for multiple columns, there are lots of different directions you could go depending on what your tastes tell you is clean looking output, but here's one option:
tblFun <- function(x){
tbl <- table(x)
res <- cbind(tbl,round(prop.table(tbl)*100,2))
colnames(res) <- c('Count','Percentage')
res
}
do.call(rbind,lapply(tips[3:6],tblFun))
Count Percentage
Female 87 35.66
Male 157 64.34
No 151 61.89
Yes 93 38.11
Fri 19 7.79
Sat 87 35.66
Sun 76 31.15
Thur 62 25.41
Dinner 176 72.13
Lunch 68 27.87
If you don't like stack the different tables on top of each other, you can ditch the do.call
and leave them in a list.
Solution 2
Your code doesn't seem so ugly to me...
however, an alternative (not much better) could be e.g. :
df <- data.frame(table(yn))
colnames(df) <- c('Smoker','Freq')
df$Perc <- df$Freq / sum(df$Freq) * 100
------------------
Smoker Freq Perc
1 No 19 47.5
2 Yes 21 52.5
Solution 3
I am not 100% certain, but I think this does what you want using prop.table. See mostly the last 3 lines. The rest of the code is just creating fake data.
set.seed(1234)
total_bill <- rnorm(50, 25, 3)
tip <- 0.15 * total_bill + rnorm(50, 0, 1)
sex <- rbinom(50, 1, 0.5)
smoker <- rbinom(50, 1, 0.3)
day <- ceiling(runif(50, 0,7))
time <- ceiling(runif(50, 0,3))
size <- 1 + rpois(50, 2)
my.data <- as.data.frame(cbind(total_bill, tip, sex, smoker, day, time, size))
my.data
my.table <- table(my.data$smoker)
my.prop <- prop.table(my.table)
cbind(my.table, my.prop)
Solution 4
Here's a tidyverse version:
library(tidyverse)
data(diamonds)
(as.data.frame(table(diamonds$cut)) %>% rename(Count=1,Freq=2) %>% mutate(Perc=100*Freq/sum(Freq)))
Or if you want a handy function:
getPercentages <- function(df, colName) {
df.cnt <- df %>% select({{colName}}) %>%
table() %>%
as.data.frame() %>%
rename({{colName}} :=1, Freq=2) %>%
mutate(Perc=100*Freq/sum(Freq))
}
Now you can do:
diamonds %>% getPercentages(cut)
or this:
df=diamonds %>% group_by(cut) %>% group_modify(~.x %>% getPercentages(clarity))
ggplot(df,aes(x=clarity,y=Perc))+geom_col()+facet_wrap(~cut)
Comments
-
userJT almost 4 years
I have a contingency table of counts, and I want to extend it with corresponding proportions of each group.
Some sample data (
tips
data set fromggplot2
package):library(ggplot2) head(tips, 3) # total_bill tip sex smoker day time size # 1 17 1.0 Female No Sun Dinner 2 # 2 10 1.7 Male No Sun Dinner 3 # 3 21 3.5 Male No Sun Dinner 3
First, use
table
to count smoker vs non-smoker, andnrow
to count total number of subjects:table(tips$smoker) # No Yes # 151 93 nrow(tips) # [1] 244
Then, I want to calculate percentage of smokers vs. non smokers. Something like this (ugly code):
# percentage of smokers options(digits = 2) transform(as.data.frame(table(tips$smoker)), percentage_column = Freq / nrow(tips) * 100) # Var1 Freq percentage_column # 1 No 151 62 # 2 Yes 93 38
Is there a better way to do this?
(even better it would be to do this on a set of columns (which I enumerate) and have output somewhat nicely formatted) (e.g., smoker, day, and time)
-
userJT over 12 yearshmm.. I did not think about "chaining" the table and prop.table like that. how would one do it for multiple enumerated columns...?
-
joran over 12 years@user56 Edited with a possible example (but there are lots of different ways to approach what you describe).
-
userJT over 8 yearscan dplyr add such function (we all use it all the time)
-
Michal aka Miki over 7 yearsI think this code reads also the first row for the analysis. - - I get
female 22 45.83 Gender 1 2.08 male 25 52.08
. How can you skip it? - - My data's separator is tab\t
. Just dodo.call(rbind,lapply(tips[-(1), 2:3],tblFun))
. -
vasili111 almost 5 yearsI am getting "Error:
data
must be a data frame, or other object coercible byfortify()
, not a list" while running: pastebin.com/ZE58J3Ru Any way to fix it? -
Robert almost 5 yearsIn the new version of dplyr the functionality of group_map has changed now group_modify should be used. I changed the example