R spreading multiple columns with tidyr

r dataframe dplyr tidyr

97,605

Here's a possible both simple and very efficient solution using data.table

library(data.table) ## v >= 1.9.6
dcast(setDT(df), month ~ student, value.var = c("A", "B")) 
#    month Amy_A Bob_A Amy_B Bob_B
# 1:     1     9     8     6     5
# 2:     2     7     6     7     6
# 3:     3     6     9     8     7

Or a possible tidyr solution

df %>% 
  gather(variable, value, -(month:student)) %>%
  unite(temp, student, variable) %>%
  spread(temp, value)

#   month Amy_A Amy_B Bob_A Bob_B
# 1     1     9     6     8     5
# 2     2     7     7     6     6
# 3     3     6     8     9     7

EDIT 22/10/2019

As mentioned in comments by @gjabel, newer tidyr versions (v1.0.0+) have now pivot_wider and pivot_longer functions (currently in maturing state), hence, a newer approach would be

pivot_wider(data = df, 
            id_cols = month, 
            names_from = student, 
            values_from = c("A", "B"))
# # A tibble: 3 x 5
#     month A_Amy A_Bob B_Amy B_Bob
#     <int> <dbl> <dbl> <dbl> <dbl>
#   1     1     9     8     6     5
#   2     2     7     6     7     6
#   3     3     6     9     8     7

97,605

Ricky

Data lover. R learner. Competitive Scrabble player. SOreadytohelp

Updated on November 15, 2020

Comments

Ricky over 3 years

Take this sample variable

df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 7, 6, 8, 6, 9),
                 B=c(6, 7, 8, 5, 6, 7))

I can use spread from tidyr to change this to wide format.

> df[, -4] %>% spread(student, A)
  month Amy Bob
1     1   9   8
2     2   7   6
3     3   6   9

But how can I spread two values e.g. both A and B, such that the output is something like

  month Amy.A Bob.A Amy.B Bob.B
1     1     9     8     6     5
2     2     7     6     7     6
3     3     6     9     8     7

Polar Bear over 7 years

I have the same problem but i have some multiple entries students, A, and B for some months. The code gives following error: Error: Duplicate identifiers for rows. Please help.
David Arenburg over 7 years

@PolarBear How do you want to handle dupes? You want to sum? mean? Try the data.table solution and add fun.aggregate = sum into dcast
Polar Bear over 7 years

I want to take median of the dupes with the help of tidyr
David Arenburg over 7 years

@PolarBear spread and gather weren't designed to apply functions. You would probably need to use dplyr for that. Or you could just use dcast as I've suggested above. Or you could post a new question if you feel strong about it.
hplieninger about 5 years

I did a benchmark for these: stackoverflow.com/a/54889598/2563804
guyabel over 4 years

pivot_wider(data = df, id_cols = month, names_from = student, values_from = c("A", "B")) should work in tidyr 1.0.0 or above
David Arenburg over 4 years

@gjabel I've eventually decided to add it as an edit (with a credit to you) as it seem to be very hard to find it in the dupe. Thanks
jlp about 4 years

pivot_wider also works without quotation marks for variable names (in this case A and B), i.e. pivot_wider(data = df, id_cols = month, names_from = student, values_from = c(A, B))