Can the value.var in dcast be a list or have multiple value variables?


Solution 1

From v1.9.6 of data.table, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast and the "Efficient reshaping using data.tables" vignette for more.

Here's how we could use dcast:

dcast(setDT(mydf), x1 ~ x2, value.var=c("salt", "sugar"))
#    x1 salt_1 salt_2 salt_3 sugar_1 sugar_2 sugar_3
# 1:  1      3      4      6       1       2       2
# 2:  2     10      3      9       5       3       6
# 3:  3     10      7      7       4       6       7
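
The multiple-aggregation feature mentioned above can be sketched roughly as follows. This is a minimal illustration, not taken from the original answer: the sample values are typed out literally so the result does not depend on the random number generator, and the `period` column is a hypothetical grouping added only so that each cell has more than one row to aggregate.

```r
library(data.table)

# Same values as mydf in this thread, written out literally.
mydf <- data.frame(x1 = rep(1:3, each = 3),
                   x2 = rep(1:3, 3),
                   salt = c(3, 4, 6, 10, 3, 9, 10, 7, 7),
                   sugar = c(1, 2, 2, 5, 3, 6, 4, 6, 7))

# Hypothetical coarser grouping (assumption for illustration only):
# x2 values 1 and 2 are "early", 3 is "late".
mydf$period <- ifelse(mydf$x2 < 3, "early", "late")

# Cast both value columns at once; every function in the list is applied
# to every value.var column within each x1/period cell.
agg <- dcast(as.data.table(mydf), x1 ~ period,
             value.var = c("salt", "sugar"),
             fun.aggregate = list(sum, mean))
agg
```

The result should have one row per x1 and one column for each combination of value column, aggregation function, and period.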

Solution 2

Update

Apparently, the fix was much easier...


Technically, your statement that "apparently there is no such feature" isn't quite correct. There is such a feature in the recast function (which sort of hides the melting and casting process), but it seems like Hadley forgot to finish the function or something: the function returns a list of the relevant parts of your operation.

Here's a minimal example...

Some sample data:

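# Note: R 3.6.0 changed the default behavior of sample(); on newer versions,
# run RNGkind(sample.kind = "Rounding") first to reproduce the values shown below.
set.seed(1)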
mydf <- data.frame(x1 = rep(1:3, each = 3),
                   x2 = rep(1:3, 3),
                   salt = sample(10, 9, TRUE),
                   sugar = sample(7, 9, TRUE))

mydf
#   x1 x2 salt sugar
# 1  1  1    3     1
# 2  1  2    4     2
# 3  1  3    6     2
# 4  2  1   10     5
# 5  2  2    3     3
# 6  2  3    9     6
# 7  3  1   10     4
# 8  3  2    7     6
# 9  3  3    7     7

The effect you seem to be trying to achieve:

reshape(mydf, idvar='x1', timevar='x2', direction='wide')
#   x1 salt.1 sugar.1 salt.2 sugar.2 salt.3 sugar.3
# 1  1      3       1      4       2      6       2
# 4  2     10       5      3       3      9       6
# 7  3     10       4      7       6      7       7

recast in action. (Note that the values are all what we would expect, in the dimensions we would expect.)

library(reshape2)
out <- recast(mydf, x1 ~ x2 + variable, measure.var = c("salt", "sugar"))
### recast(mydf, x1 ~ x2 + variable, id.var = c("x1", "x2"))
out
# $data
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    3    1    4    2    6    2
# [2,]   10    5    3    3    9    6
# [3,]   10    4    7    6    7    7
# 
# $labels
# $labels[[1]]
#   x1
# 1  1
# 2  2
# 3  3
# 
# $labels[[2]]
#   x2 variable
# 1  1     salt
# 2  1    sugar
# 3  2     salt
# 4  2    sugar
# 5  3     salt
# 6  3    sugar

I'm honestly not sure if this was an incomplete function, or if it is a helper function to another function.

All of the information is there to be able to put the data back together again, making it easy to write a function like this:

recast2 <- function(...) {
  inList <- recast(...)
  setNames(cbind(inList[[2]][[1]], inList[[1]]),
           c(names(inList[[2]][[1]]), 
             do.call(paste, c(rev(inList[[2]][[2]]), sep = "_"))))
}
recast2(mydf, x1 ~ x2 + variable, measure.var = c("salt", "sugar"))
#   x1 salt_1 sugar_1 salt_2 sugar_2 salt_3 sugar_3
# 1  1      3       1      4       2      6       2
# 2  2     10       5      3       3      9       6
# 3  3     10       4      7       6      7       7

A possible advantage of the recast2 approach is the ability to aggregate as well as reshape in the same step.
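
To illustrate aggregating while reshaping, here is a minimal sketch that calls melt and dcast from reshape2 directly, rather than going through recast2. The sample values are typed out literally so the result does not depend on the random number generator, and summing per x1 is just an example choice of aggregation.

```r
library(reshape2)

# Same values as mydf above, written out literally.
mydf <- data.frame(x1 = rep(1:3, each = 3),
                   x2 = rep(1:3, 3),
                   salt = c(3, 4, 6, 10, 3, 9, 10, 7, 7),
                   sugar = c(1, 2, 2, 5, 3, 6, 4, 6, 7))

# Melt to long form: one row per (x1, x2, variable) combination.
molten <- melt(mydf, id.vars = c("x1", "x2"))

# Dropping x2 from the formula leaves three rows per cell,
# which fun.aggregate collapses into a single total.
totals <- dcast(molten, x1 ~ variable, value.var = "value",
                fun.aggregate = sum)
totals
#   x1 salt sugar
# 1  1   13     5
# 2  2   22    14
# 3  3   24    17
```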

Solution 3

Using sample data frame mydf from A5C1D2H2I1M1N2O1R2T1's answer.

Edit December 2016 using tidyr

reshape2 has since been superseded by the tidyr package.

library(tidyr)
mydf  %>% 
    gather(variable, value, -x1, -x2)  %>% 
    unite(x2_variable, x2, variable)  %>% 
    spread(x2_variable, value)

#   x1 1_salt 1_sugar 2_salt 2_sugar 3_salt 3_sugar
# 1  1      3       1      4       2      6       2
# 2  2     10       5      3       3      9       6
# 3  3     10       4      7       6      7       7
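
In more recent versions of tidyr (1.0.0 and later), gather and spread have themselves been superseded by pivot_longer and pivot_wider. A minimal sketch of the same reshape with pivot_wider, which accepts several value columns at once, might look like this. The sample values are typed out literally so the result does not depend on the random number generator.

```r
library(tidyr)

# Same values as mydf in this thread, written out literally.
mydf <- data.frame(x1 = rep(1:3, each = 3),
                   x2 = rep(1:3, 3),
                   salt = c(3, 4, 6, 10, 3, 9, 10, 7, 7),
                   sugar = c(1, 2, 2, 5, 3, 6, 4, 6, 7))

# Spread both value columns across the levels of x2 in one call;
# x1 is kept as the identifying column.
wide <- pivot_wider(mydf, names_from = x2, values_from = c(salt, sugar))
wide
```

With the default naming, the new columns combine the value column name and the x2 level (salt_1, salt_2, and so on), which matches the data.table layout more closely than the gather/unite/spread output above.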

Original answer based on reshape2

@AlexR added to his question:

"Sure, you can 'melt' the 2 value variables into a single column, ..."

For those who come here looking for an answer based on reshape2, here is how to melt the data and then use dcast based on the "variable" column.

library(reshape2)
dt2 <- melt(mydf, id.vars = c("x1", "x2"))

The variable column will now contain the names of the melted columns (here 'salt' and 'sugar'). You can achieve the desired effect with

dt3 <- dcast(dt2, x1 ~ x2 + variable, value.var="value")
dt3
#   x1 1_salt 1_sugar 2_salt 2_sugar 3_salt 3_sugar
# 1  1      3       1      4       2      6       2
# 2  2     10       5      3       3      9       6
# 3  3     10       4      7       6      7       7

value.var is optional in this function call as dcast will automatically guess it.

Author: AlexR


Updated on September 21, 2020

Comments

  • AlexR
    AlexR over 3 years

    In the help files for dcast.data.table, there is a note stating that a new feature has been implemented: "dcast.data.table allows value.var column to be of type list"

    I take this to mean that one can have multiple value variables within a list, i.e. in this format:

    dcast.data.table(dt, x1~x2, value.var=list('var1','var2','var3'))
    

    But we get an error: 'value.var' must be a character vector of length 1.

    Is there such a feature, and if not, what would be other one-liner alternatives?

    EDIT: In reply to the comments below

    There are situations where you have multiple variables that you want to treat as the value.var. Imagine for example that x2 consists of 3 different weeks, and you have 2 value variables such as salt and sugar consumption and you want to cast those variables across the different weeks. Sure, you can 'melt' the 2 value variables into a single column, but why do something using two functions, when you can do it in one function like reshape does?

    (Note: I've also noticed that reshape cannot treat multiple variables as the time variable as dcast does.)

    So my point is that I don't understand why these functions don't allow for the flexibility to include multiple variables within the value.var or the time.var just as we allow for multiple variables for the id.var.

    • Roland
      Roland about 10 years
      You are misunderstanding the documentation. A data.table column can be of type list and such a column can now be the value.var column.
    • A5C1D2H2I1M1N2O1R2T1
      A5C1D2H2I1M1N2O1R2T1 about 10 years
      @Arun, I'm not entirely sure how this would be a great improvement (or maybe I don't understand the question correctly). Doesn't the fact that there are multiple value.vars imply that the data is not fully "molten"? Alex: Can you update your question to move out of the hypothetical realm and give an example of what you might want to do with these multiple value.vars? Maybe you are thinking something like what I did at this answer?
    • AlexR
      AlexR about 10 years
      @Arun I've elaborated on the purpose of this post and my inquiry.
  • AlexR
    AlexR about 10 years
    Thanks for taking the time to go through this. I wasn't aware of recast, which seems to do melt+cast. I'd like to add that recast in the reshape package (but not reshape2) is complete and accomplishes the same as your recast2 function.
  • A5C1D2H2I1M1N2O1R2T1
    A5C1D2H2I1M1N2O1R2T1 about 10 years
    @AlexR, see my update at the top of the post. Apparently, all that was needed was a change from cast to dcast in the recast code.
  • Morgan Ball
    Morgan Ball about 7 years
    The Dec 2016 update is in my opinion the most flexible approach now. +1
  • filups21
    filups21 over 4 years
    And now gather and spread have been superseded by pivot_wider and pivot_longer in tidyr.