How to pivot/unpivot (cast/melt) a data frame?
Solution 1
I still can't believe I beat Andrie with an answer. :)
> library(reshape)
> my.df <- read.table(text = "Country 2001 2002 2003
+ Nigeria 1 2 3
+ UK 2 NA 1", header = TRUE)
> my.result <- melt(my.df, id = c("Country"))
> my.result[order(my.result$Country),]
Country variable value
1 Nigeria X2001 1
3 Nigeria X2002 2
5 Nigeria X2003 3
2 UK X2001 2
4 UK X2002 NA
6 UK X2003 1
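The X prefix on the years comes from read.table(), which passes the header through make.names() by default (its check.names argument). Below is a minimal base-R sketch of the two fixes suggested in the comments on this answer. strip_x_year() is a hypothetical helper name used for illustration. It is not part of the reshape package.

```r
txt <- "Country 2001 2002 2003
Nigeria 1 2 3
UK 2 NA 1"

# Default check.names = TRUE turns "2001" into the syntactic name "X2001"
with_x <- read.table(text = txt, header = TRUE)
names(with_x)  # "Country" "X2001" "X2002" "X2003"

# Fix 1: keep the headers exactly as written in the file
no_x <- read.table(text = txt, header = TRUE, check.names = FALSE)
names(no_x)    # "Country" "2001" "2002" "2003"

# Fix 2 (hypothetical helper): strip the "X" after melting, then convert.
# sub() coerces a factor (like melt()'s `variable` column) to character first,
# so as.numeric() sees "2001" rather than the factor's integer codes.
strip_x_year <- function(v) as.numeric(sub("^X", "", v))
strip_x_year(factor(c("X2001", "X2002", "X2003")))  # 2001 2002 2003
```

Applied to the melted result above, this would be my.result$variable <- strip_x_year(my.result$variable).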
Solution 2
The base R reshape approach for this problem is pretty ugly, particularly since the names aren't in a form that reshape likes. It would be something like the following, where the first setNames line modifies the column names into something that reshape can make use of.
reshape(
setNames(mydf, c("Country", paste0("val.", c(2001, 2002, 2003)))),
direction = "long", idvar = "Country", varying = 2:ncol(mydf),
sep = ".", new.row.names = seq_len(prod(dim(mydf[-1]))))
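If renaming the columns feels too hacky, reshape() can also be told the layout explicitly. When varying is a list, reshape() uses v.names and times instead of parsing the column names, so no setNames() step is needed. This is a sketch that assumes the mydf sample data at the end of this answer. The data is rebuilt here so the block runs on its own, and long_df is just an illustrative name.

```r
mydf <- data.frame(Country = c("Nigeria", "UK"),
                   `2001` = 1:2, `2002` = c(2L, NA), `2003` = c(3L, 1L),
                   check.names = FALSE, stringsAsFactors = FALSE)

long_df <- reshape(
  mydf,
  direction = "long",
  idvar     = "Country",
  varying   = list(names(mydf)[-1]),  # one set of time-varying columns
  v.names   = "value",                 # name of the stacked value column
  timevar   = "year",                  # name of the new time column
  times     = 2001:2003                # one time value per varying column
)
rownames(long_df) <- NULL               # drop the "Nigeria.2001"-style row names
long_df
```

As with stack() below, the rows come out grouped by year (Nigeria, UK, Nigeria, UK, ...).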
A better alternative in base R is to use stack, like this:
cbind(mydf[1], stack(mydf[-1]))
# Country values ind
# 1 Nigeria 1 2001
# 2 UK 2 2001
# 3 Nigeria 2 2002
# 4 UK NA 2002
# 5 Nigeria 3 2003
# 6 UK 1 2003
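stack() always names its output columns "values" and "ind", and ind is a factor whose labels are the old column names. The following sketch renames the columns to match the question and converts the years to integers. It rebuilds the sample data so it runs on its own.

```r
mydf <- data.frame(Country = c("Nigeria", "UK"),
                   `2001` = 1:2, `2002` = c(2L, NA), `2003` = c(3L, 1L),
                   check.names = FALSE, stringsAsFactors = FALSE)

long_df <- cbind(mydf[1], stack(mydf[-1]))
# Rename stack()'s "values" and "ind" to the names used in the question
names(long_df)[2:3] <- c("Value", "Year")
# Go through as.character() first: as.integer() on the factor itself
# would return the level codes 1, 2, 3 instead of the years
long_df$Year <- as.integer(as.character(long_df$Year))
long_df
```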
There are also new tools for reshaping data now available, like the "tidyr" package, which gives us gather. Of course, at the time of writing, the tidyr:::gather_.data.frame method just called reshape2::melt, so this part of my answer doesn't necessarily add much except introduce the newer syntax that you might be encountering in the Hadleyverse.
library(tidyr)
gather(mydf, year, value, `2001`:`2003`) ## Note the backticks
# Country year value
# 1 Nigeria 2001 1
# 2 UK 2001 2
# 3 Nigeria 2002 2
# 4 UK 2002 NA
# 5 Nigeria 2003 3
# 6 UK 2003 1
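In later versions of tidyr, gather() is superseded by pivot_longer(). Here is a sketch of the same reshape with it, assuming tidyr 1.0.0 or later. Selecting the columns with -Country avoids the backticks. pivot_longer() also emits the rows grouped by country, which is the order shown in the question.

```r
library(tidyr)  # assumes tidyr >= 1.0.0, which introduced pivot_longer()

mydf <- data.frame(Country = c("Nigeria", "UK"),
                   `2001` = 1:2, `2002` = c(2L, NA), `2003` = c(3L, 1L),
                   check.names = FALSE, stringsAsFactors = FALSE)

long_df <- pivot_longer(
  mydf,
  cols      = -Country,  # every column except Country holds one year
  names_to  = "Year",    # old column names land here, as character
  values_to = "Value"    # old cell values land here
)
long_df$Year <- as.integer(long_df$Year)
long_df
```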
All three options here (reshape, stack and gather) would need reordering of rows if you want the row order you showed in your question.
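Reordering is a single order() call on the long result. Here is a sketch using the stack() output from above, rebuilt so it runs on its own. Because ind is a factor whose levels are already in year order, sorting on it sorts chronologically.

```r
mydf <- data.frame(Country = c("Nigeria", "UK"),
                   `2001` = 1:2, `2002` = c(2L, NA), `2003` = c(3L, 1L),
                   check.names = FALSE, stringsAsFactors = FALSE)

long_df <- cbind(mydf[1], stack(mydf[-1]))
# Sort by Country first, then by year (factor levels "2001" < "2002" < "2003")
long_df <- long_df[order(long_df$Country, long_df$ind), ]
rownames(long_df) <- NULL  # renumber rows 1..6 after sorting
long_df
```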
A fourth option would be to use merged.stack from my "splitstackshape" package. Like base R's reshape, you'll need to modify the column names to something that includes a "variable" and "time" indicator.
library(splitstackshape)
merged.stack(
setNames(mydf, c("Country", paste0("V.", 2001:2003))),
var.stubs = "V", sep = ".")
# Country .time_1 V
# 1: Nigeria 2001 1
# 2: Nigeria 2002 2
# 3: Nigeria 2003 3
# 4: UK 2001 2
# 5: UK 2002 NA
# 6: UK 2003 1
Sample data
mydf <- structure(list(Country = c("Nigeria", "UK"), `2001` = 1:2, `2002` = c(2L,
NA), `2003` = c(3L, 1L)), .Names = c("Country", "2001", "2002",
"2003"), row.names = 1:2, class = "data.frame")
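If the dput() output above is hard to read, the same data can be built with data.frame(). The key argument is check.names = FALSE, which keeps the numeric column names from becoming X2001 and so on. mydf2 is a hypothetical name, chosen so it does not overwrite mydf.

```r
mydf2 <- data.frame(
  Country = c("Nigeria", "UK"),
  `2001`  = 1:2,
  `2002`  = c(2L, NA),
  `2003`  = c(3L, 1L),
  check.names      = FALSE,  # keep "2001" rather than "X2001"
  stringsAsFactors = FALSE   # keep Country as character, as in the dput() version
)
str(mydf2)
```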
Solution 3
You can use the melt command from the reshape package. See here: http://www.statmethods.net/management/reshape.html
Probably something like melt(myframe, id=c('Country'))
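Here is a sketch of the same idea with reshape2, the later rewrite of reshape. Its melt() can name the output columns directly through variable.name and value.name. The code assumes reshape2 is installed. It uses the sample data from Solution 2 in place of the placeholder myframe.

```r
library(reshape2)  # assumes the reshape2 package is installed

mydf <- data.frame(Country = c("Nigeria", "UK"),
                   `2001` = 1:2, `2002` = c(2L, NA), `2003` = c(3L, 1L),
                   check.names = FALSE, stringsAsFactors = FALSE)

long_df <- melt(mydf, id.vars = "Country",
                variable.name = "Year", value.name = "Value")
# melt() returns Year as a factor of the old column names; convert via character
long_df$Year <- as.integer(as.character(long_df$Year))
long_df
```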
Comments
-
pedrosaurio over 4 years
How can I 'unpivot' a table? What is the proper technical term for this?
UPDATE: The term is "melt".
I have a data frame with countries and data for each year:
Country 2001 2002 2003
Nigeria 1 2 3
UK 2 NA 1
And I want to have something like:
Country Year Value
Nigeria 2001 1
Nigeria 2002 2
Nigeria 2003 3
UK 2001 2
UK 2002 NA
UK 2003 1
-
Ben Bolker over 12 years
You beat me to it. Also try check.names=FALSE when reading to get years without the X prepended, or sub("^X","",variable) to strip them off later; and then as.numeric(variable) to convert back to a numeric value.
-
pedrosaurio over 12 years
Thanks! I guess then that the technical term is to 'melt' the table. Right?
-
Ben Bolker over 12 years
Yes, at least that's probably the most common term used in R circles, thanks to the amazing reshape package (which introduced the terminology of "melting" and "casting", as far as I know).
-
Andrie over 12 years
+1 Who's using R 2.14 then? Look at that text= argument to read.table!
-
Rich Scriven over 9 years
Backticks for the column subset...? Why. I was just getting good at dplyr too. Haha