R spreading multiple columns with tidyr

Here's a simple and very efficient solution using data.table:

library(data.table) ## v >= 1.9.6
dcast(setDT(df), month ~ student, value.var = c("A", "B")) 
#    month Amy_A Bob_A Amy_B Bob_B
# 1:     1     9     8     6     5
# 2:     2     7     6     7     6
# 3:     3     6     9     8     7

Or a possible tidyr solution

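library(tidyr) ## also re-exports the %>% pipe
df %>% 
  gather(variable, value, -(month:student)) %>%  ## stack A and B into long format
  unite(temp, student, variable) %>%             ## build names such as "Amy_A"
  spread(temp, value)                            ## one column per student/variable pair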

#   month Amy_A Amy_B Bob_A Bob_B
# 1     1     9     6     8     5
# 2     2     7     7     6     6
# 3     3     6     8     9     7
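
To make the three steps easier to follow, here is a minimal sketch that runs them one at a time on the sample df from the question. The intermediate names long and united are only illustrative; nothing here goes beyond the gather, unite and spread calls shown above.

```r
library(tidyr)

# Sample data from the question
df <- data.frame(month = rep(1:3, 2),
                 student = rep(c("Amy", "Bob"), each = 3),
                 A = c(9, 7, 6, 8, 6, 9),
                 B = c(6, 7, 8, 5, 6, 7))

# Step 1: stack the A and B columns into key/value pairs (long format)
long <- gather(df, variable, value, -(month:student))

# Step 2: paste student and variable together into a single key column
united <- unite(long, temp, student, variable)

# Step 3: spread the combined key back out into one column per key
wide <- spread(united, temp, value)
wide
```

Inspecting long and united at each step shows why the combined key is needed: spread only accepts a single key column, so student and variable have to be merged first.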

EDIT 22/10/2019

As mentioned in the comments by @gjabel, newer tidyr versions (v1.0.0+) now have the pivot_wider and pivot_longer functions (currently in a maturing state), so a newer approach would be

pivot_wider(data = df, 
            id_cols = month, 
            names_from = student, 
            values_from = c("A", "B"))
# # A tibble: 3 x 5
#     month A_Amy A_Bob B_Amy B_Bob
#     <int> <dbl> <dbl> <dbl> <dbl>
#   1     1     9     8     6     5
#   2     2     7     6     7     6
#   3     3     6     9     8     7
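
Note that pivot_wider puts the value name first (A_Amy), whereas the question asked for names like Amy.A. A possible way to get that layout is the names_glue argument, sketched below. This assumes a tidyr release that supports names_glue, which may be newer than 1.0.0; the object name wide is only illustrative.

```r
library(tidyr)  # assumes a tidyr version that supports names_glue

# Sample data from the question
df <- data.frame(month = rep(1:3, 2),
                 student = rep(c("Amy", "Bob"), each = 3),
                 A = c(9, 7, 6, 8, 6, 9),
                 B = c(6, 7, 8, 5, 6, 7))

# Build column names as "<student>.<value column>" instead of the default
wide <- pivot_wider(data = df,
                    id_cols = month,
                    names_from = student,
                    values_from = c(A, B),
                    names_glue = "{student}.{.value}")
names(wide)
```

Here {.value} is the placeholder tidyr uses for the name of the values_from column, and {student} refers to the names_from column.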

Author: Ricky

Updated on November 15, 2020

Comments

  • Ricky
    Ricky over 3 years

    Take this sample variable

    df <- data.frame(month=rep(1:3,2),
                     student=rep(c("Amy", "Bob"), each=3),
                     A=c(9, 7, 6, 8, 6, 9),
                     B=c(6, 7, 8, 5, 6, 7))
    

    I can use spread from tidyr to change this to wide format.

    > df[, -4] %>% spread(student, A)
      month Amy Bob
    1     1   9   8
    2     2   7   6
    3     3   6   9
    

    But how can I spread two values e.g. both A and B, such that the output is something like

      month Amy.A Bob.A Amy.B Bob.B
    1     1     9     8     6     5
    2     2     7     6     7     6
    3     3     6     9     8     7
    
  • Polar Bear
    Polar Bear over 7 years
    I have the same problem, but I have multiple entries for student, A, and B in some months. The code gives the following error: Error: Duplicate identifiers for rows. Please help.
  • David Arenburg
    David Arenburg over 7 years
    @PolarBear How do you want to handle dupes? You want to sum? mean? Try the data.table solution and add fun.aggregate = sum into dcast
  • Polar Bear
    Polar Bear over 7 years
    I want to take the median of the dupes with the help of tidyr
  • David Arenburg
    David Arenburg over 7 years
    @PolarBear spread and gather weren't designed to apply functions. You would probably need to use dplyr for that. Or you could just use dcast as I've suggested above. Or you could post a new question if you feel strongly about it.
  • hplieninger
    hplieninger about 5 years
    I did a benchmark for these: stackoverflow.com/a/54889598/2563804
  • guyabel
    guyabel over 4 years
    pivot_wider(data = df, id_cols = month, names_from = student, values_from = c("A", "B")) should work in tidyr 1.0.0 or above
  • David Arenburg
    David Arenburg over 4 years
    @gjabel I've eventually decided to add it as an edit (with a credit to you) as it seems to be very hard to find in the dupe. Thanks
  • jlp
    jlp about 4 years
    pivot_wider also works without quotation marks for variable names (in this case A and B), i.e. pivot_wider(data = df, id_cols = month, names_from = student, values_from = c(A, B))
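
For the duplicate-rows case raised in the comments above, here is a minimal sketch of two possible ways to take the median of the dupes. The extra row in df_dup is made up purely for illustration. The tidyr call assumes a version in which values_fn accepts a single function (this may require a release newer than 1.0.0); the data.table call follows the fun.aggregate suggestion from the comments.

```r
library(tidyr)
library(data.table)

# Sample data from the question plus one hypothetical duplicate row
# (a second entry for Amy in month 1), added only to trigger the problem
df_dup <- rbind(
  data.frame(month = rep(1:3, 2),
             student = rep(c("Amy", "Bob"), each = 3),
             A = c(9, 7, 6, 8, 6, 9),
             B = c(6, 7, 8, 5, 6, 7)),
  data.frame(month = 1L, student = "Amy", A = 5, B = 10)
)

# tidyr: collapse duplicate month/student cells with the median
wide_tidyr <- pivot_wider(data = df_dup,
                          id_cols = month,
                          names_from = student,
                          values_from = c(A, B),
                          values_fn = median)

# data.table: same idea via fun.aggregate
# (as.data.table copies, so df_dup itself is left unchanged)
wide_dt <- dcast(as.data.table(df_dup), month ~ student,
                 value.var = c("A", "B"), fun.aggregate = median)
```

Any other summary function (sum, mean, and so on) could be swapped in for median in either call.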