Reshaping wide to long with multiple values columns
Solution 1
reshape
does this with the appropriate arguments.
varying
lists the columns which exist in the wide format, but are split into multiple rows in the long format. v.names
is the long format equivalents. Between the two, a mapping is created.
From ?reshape
:
Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.
Given these varying
and v.names
arguments, reshape
is smart enough to see that I've specified that the index is before the dot here (i.e., order 1.x, 1.y, 2.x, 2.y). Note that the original data has the columns in this order, so we can specify varying=2:5
for this example data, but that is not safe in general.
Given the values of times
and v.names
, reshape
splits the varying
columns on a .
character (the default sep
argument) to create the columns in the output.
times
specifies values that are to be used in the created var
column, and v.names
are pasted onto these values to get column names in the wide format for mapping to the result.
Finally, idvar
is specified to be the sbj
column, which identifies individual records in the wide format (thanks @thelatemail).
reshape(dw, direction='long',
varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'),
timevar='var',
times=c('f1', 'f2'),
v.names=c('avg', 'sd'),
idvar='sbj')
## sbj blabla var avg sd
## A.f1 A bA f1 10 6
## B.f1 B bB f1 12 5
## C.f1 C bC f1 20 7
## D.f1 D bD f1 22 8
## A.f2 A bA f2 50 10
## B.f2 B bB f2 70 11
## C.f2 C bC f2 20 8
## D.f2 D bD f2 22 9
Solution 2
Another option using Hadley's new tidyr
package.
library(tidyr)
library(dplyr)
dw <- read.table(header=T, text='
sbj f1.avg f1.sd f2.avg f2.sd blabla
A 10 6 50 10 bA
B 12 5 70 11 bB
C 20 7 20 8 bC
D 22 8 22 9 bD
')
dw %>%
gather(v, value, f1.avg:f2.sd) %>%
separate(v, c("var", "col")) %>%
arrange(sbj) %>%
spread(col, value)
Solution 3
melt
from the >=1.9.6 version of data.table
, does this by specifying the column index in measure.vars
as a list
.
melt(setDT(dw), measure.vars=list(c(2,4), c(3,5)),
variable.name='var', value.name=c('avg', 'sd'))[,
var:= paste0('f',var)][order(sbj)]
# sbj blabla var avg sd
#1: A bA f1 10 6
#2: A bA f2 50 10
#3: B bB f1 12 5
#4: B bB f2 70 11
#5: C bC f1 20 7
#6: C bC f2 20 8
#7: D bD f1 22 8
#8: D bD f2 22 9
Or you could use the new patterns
function:
melt(setDT(dw),
measure = patterns("avg", "sd"),
variable.name = 'var', value.name = c('avg', 'sd'))
# sbj blabla var avg sd
# 1: A bA 1 10 6
# 2: B bB 1 12 5
# 3: C bC 1 20 7
# 4: D bD 1 22 8
# 5: A bA 2 50 10
# 6: B bB 2 70 11
# 7: C bC 2 20 8
# 8: D bD 2 22 9
Solution 4
This seems to do what you want except that the f
is removed from elements in time
.
reshape(dw, idvar = "sbj", varying = list(c(2,4),c(3,5)), v.names = c("ave", "sd"), direction = "long")
sbj blabla time ave sd
A.1 A bA 1 10 6
B.1 B bB 1 12 5
C.1 C bC 1 20 7
D.1 D bD 1 22 8
A.2 A bA 2 50 10
B.2 B bB 2 70 11
C.2 C bC 2 20 8
D.2 D bD 2 22 9
Solution 5
To add to the options available here, you can also consider merged.stack
from my "splitstackshape" package:
library(splitstackshape)
merged.stack(dw, var.stubs = c("avg", "sd"), sep = "var.stubs", atStart = FALSE)
# sbj blabla .time_1 avg sd
# 1: A bA f1. 10 6
# 2: A bA f2. 50 10
# 3: B bB f1. 12 5
# 4: B bB f2. 70 11
# 5: C bC f1. 20 7
# 6: C bC f2. 20 8
# 7: D bD f1. 22 8
# 8: D bD f2. 22 9
You can also do a little more cleanup on the ".time_1"
variable, like this.
merged.stack(dw, var.stubs = c("avg", "sd"),
sep = "var.stubs", atStart = FALSE)[, .time_1 := sub(
".", "", .time_1, fixed = TRUE)][]
# sbj blabla .time_1 avg sd
# 1: A bA f1 10 6
# 2: A bA f2 50 10
# 3: B bB f1 12 5
# 4: B bB f2 70 11
# 5: C bC f1 20 7
# 6: C bC f2 20 8
# 7: D bD f1 22 8
# 8: D bD f2 22 9
You would note the use of the atStart = FALSE
argument. This is because your names are in a little bit of a different order than reshape-related functions seem to like. In general, the "stub" is expected to come first, and then the "times", like this:
dw2 <- dw
setnames(dw2, gsub("(.*)\\.(.*)", "\\2.\\1", names(dw2)))
names(dw2)
# [1] "sbj" "avg.f1" "sd.f1" "avg.f2" "sd.f2" "blabla"
If the names were in that format, then both base R's reshape
and merged.stack
benefit from more direct syntax:
merged.stack(dw2, var.stubs = c("avg", "sd"), sep = ".")
reshape(dw2, idvar = c("sbj", "blabla"), varying = 2:5,
sep = ".", direction = "long")
Vasily A
Updated on July 14, 2022Comments
-
Vasily A almost 2 years
I need to reshape my wide table into long format but keeping multiple fields for each record, for example:
dw <- read.table(header=T, text=' sbj f1.avg f1.sd f2.avg f2.sd blabla A 10 6 50 10 bA B 12 5 70 11 bB C 20 7 20 8 bC D 22 8 22 9 bD ') # Now I want to melt this table, keeping both AVG and SD as separate fields for each measurement, to get something like this: # sbj var avg sd blabla # A f1 10 6 bA # A f2 50 10 bA # B f1 12 5 bB # B f2 70 11 bB # C f1 20 7 bC # C f2 20 8 bC # D f1 22 8 bD # D f2 22 9 bD
I have basic knowledge of using
melt
andreshape
, but it is not obvious for me how to apply such reshaping in my case. -
Vasily A almost 10 yearsaha, you do almost the same call as in previous answer except for using
idvar
instead oftimevar
andtimes
- I will sort out the difference... Thanks! -
Mark Miller almost 10 years@VasilyA The structure of
varying
is also very different between the answers. Indeed, the only things they seem to have in common is they both use the samev.names
and both usedirection="long"
. -
Vasily A almost 10 yearsindeed, now I see... thanks for pointing out!
-
thelatemail almost 10 yearsAdding
idvar='sbj'
will avoid the adding of theid
column to the result too. -
IRTFM almost 10 yearsThe interesting thing in comparing the two correct answers is that this one uses the structure of the
list
argument to 'varying', while the Lundberg answer uses 'times' and 'v.names' to convey some of the structure. I've never gotten those options straightened out in my head and usually use trial and error. -
Brian D over 7 yearssurprisingly (at least to me), the order of the columns matters. If you reorder the columns dw = dw[,c(1,4,3,2,5,6)] and then run this answer, the result is incorrect.
-
Matthew Lundberg over 7 years@BrianD So it does, thanks! I was tacitly using an assumption of the data (even without realizing it) and will fix the answer shortly.
-
Brian D over 7 yearsHelpful info about tidyr and how the above code works here: blog.rstudio.org/2014/07/22/introducing-tidyr
-
Robin over 5 yearsTo clarify, the code for this answer does not remove the
f
from the column names to create the elements. Without, thetimes
argument the column names aren't even considered. When creating the dataframedw
, I can replace the "f1" with "f1.alpha" and "f2" with "f2.beta", run the code in this answer, and the "time" column will be the same as in this answer. It won't be "1.alpha" or "2.beta". If someone wants to keep the column names as elements, the column names need to be specified in thetimes
argument (in the same order as in thevarying
argument). -
Robin over 5 yearsTo make this answer more robust at handling many pairs of "avg" and "sd" columns replace the
varying
argument withvarying=list(grep("avg", colnames(dw), value=T), grep("sd", colnames(dw), value=T))