R spreading multiple columns with tidyr
97,605
Here's a possible both simple and very efficient solution using data.table
library(data.table) ## v >= 1.9.6
dcast(setDT(df), month ~ student, value.var = c("A", "B"))
# month Amy_A Bob_A Amy_B Bob_B
# 1: 1 9 8 6 5
# 2: 2 7 6 7 6
# 3: 3 6 9 8 7
Or a possible tidyr
solution
df %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
spread(temp, value)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 9 6 8 5
# 2 2 7 7 6 6
# 3 3 6 8 9 7
EDIT 22/10/2019
As mentioned in comments by @gjabel, newer tidyr versions (v1.0.0+)
have now pivot_wider
and pivot_longer
functions (currently in maturing state), hence, a newer approach would be
pivot_wider(data = df,
id_cols = month,
names_from = student,
values_from = c("A", "B"))
# # A tibble: 3 x 5
# month A_Amy A_Bob B_Amy B_Bob
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 9 8 6 5
# 2 2 7 6 7 6
# 3 3 6 9 8 7
Related videos on Youtube
Author by
Ricky
Data lover. R learner. Competitive Scrabble player. SOreadytohelp
Updated on November 15, 2020Comments
-
Ricky over 3 years
Take this sample variable
df <- data.frame(month=rep(1:3,2), student=rep(c("Amy", "Bob"), each=3), A=c(9, 7, 6, 8, 6, 9), B=c(6, 7, 8, 5, 6, 7))
I can use
spread
fromtidyr
to change this to wide format.> df[, -4] %>% spread(student, A) month Amy Bob 1 1 9 8 2 2 7 6 3 3 6 9
But how can I spread two values e.g. both
A
andB
, such that the output is something likemonth Amy.A Bob.A Amy.B Bob.B 1 1 9 8 6 5 2 2 7 6 7 6 3 3 6 9 8 7
-
Polar Bear over 7 yearsI have the same problem but i have some multiple entries students, A, and B for some months. The code gives following error: Error: Duplicate identifiers for rows. Please help.
-
David Arenburg over 7 years@PolarBear How do you want to handle dupes? You want to sum? mean? Try the
data.table
solution and addfun.aggregate = sum
intodcast
-
Polar Bear over 7 yearsI want to take median of the dupes with the help of tidyr
-
David Arenburg over 7 years@PolarBear
spread
andgather
weren't designed to apply functions. You would probably need to usedplyr
for that. Or you could just usedcast
as I've suggested above. Or you could post a new question if you feel strong about it. -
hplieninger about 5 yearsI did a benchmark for these: stackoverflow.com/a/54889598/2563804
-
guyabel over 4 years
pivot_wider(data = df, id_cols = month, names_from = student, values_from = c("A", "B"))
should work in tidyr 1.0.0 or above -
David Arenburg over 4 years@gjabel I've eventually decided to add it as an edit (with a credit to you) as it seem to be very hard to find it in the dupe. Thanks
-
jlp about 4 yearspivot_wider also works without quotation marks for variable names (in this case A and B), i.e. pivot_wider(data = df, id_cols = month, names_from = student, values_from = c(A, B))