how to pivot/unpivot (cast/melt) data frame?

40,273

Solution 1

I still can't believe I beat Andrie with an answer. :)

> library(reshape)
> my.df <- read.table(text = "Country     2001    2002    2003
   + Nigeria     1       2       3
   + UK          2       NA       1", header = TRUE)
> my.result <- melt(my.df, id = c("Country"))
> my.result[order(my.result$Country),]
     Country variable value
   1 Nigeria    X2001     1
   3 Nigeria    X2002     2
   5 Nigeria    X2003     3
   2      UK    X2001     2
   4      UK    X2002    NA
   6      UK    X2003     1

Solution 2

The base R reshape approach for this problem is pretty ugly, particularly since the names aren't in a form that reshape likes. It would be something like the following, where the first setNames line modifies the column names into something that reshape can make use of.

reshape(
  setNames(mydf, c("Country", paste0("val.", c(2001, 2002, 2003)))), 
  direction = "long", idvar = "Country", varying = 2:ncol(mydf), 
  sep = ".", new.row.names = seq_len(prod(dim(mydf[-1]))))

A better alternative in base R is to use stack, like this:

cbind(mydf[1], stack(mydf[-1]))
#   Country values  ind
# 1 Nigeria      1 2001
# 2      UK      2 2001
# 3 Nigeria      2 2002
# 4      UK     NA 2002
# 5 Nigeria      3 2003
# 6      UK      1 2003

There are also new tools for reshaping data now available, like the "tidyr" package, which gives us gather. Of course, the tidyr:::gather_.data.frame method just calls reshape2::melt, so this part of my answer doesn't necessarily add much except introduce the newer syntax that you might be encountering in the Hadleyverse.

library(tidyr)
gather(mydf, year, value, `2001`:`2003`) ## Note the backticks
#   Country year value
# 1 Nigeria 2001     1
# 2      UK 2001     2
# 3 Nigeria 2002     2
# 4      UK 2002    NA
# 5 Nigeria 2003     3
# 6      UK 2003     1

All three options here would need reordering of rows if you want the row order you showed in your question.


A fourth option would be to use merged.stack from my "splitstackshape" package. Like base R's reshape, you'll need to modify the column names to something that includes a "variable" and "time" indicator.

library(splitstackshape)
merged.stack(
  setNames(mydf, c("Country", paste0("V.", 2001:2003))),
  var.stubs = "V", sep = ".")
#    Country .time_1  V
# 1: Nigeria    2001  1
# 2: Nigeria    2002  2
# 3: Nigeria    2003  3
# 4:      UK    2001  2
# 5:      UK    2002 NA
# 6:      UK    2003  1

Sample data

 mydf <- structure(list(Country = c("Nigeria", "UK"), `2001` = 1:2, `2002` = c(2L, 
     NA), `2003` = c(3L, 1L)), .Names = c("Country", "2001", "2002",               
     "2003"), row.names = 1:2, class = "data.frame")   

Solution 3

You can use the melt command from the reshape package. See here: http://www.statmethods.net/management/reshape.html

Probably something like melt(myframe, id=c('Country'))

Share:
40,273

Related videos on Youtube

pedrosaurio
Author by

pedrosaurio

learning ...

Updated on January 02, 2020

Comments

  • pedrosaurio
    pedrosaurio over 4 years

    How can I 'unpivot' a table? What is the proper technical term for this?

    UPDATE: The term is called melt

    I have a data frame for countries and data for each year

    Country     2001    2002    2003
    Nigeria     1       2       3
    UK          2       NA       1
    

    And I want to have something like

    Country    Year    Value
    Nigeria    2001    1
    Nigeria    2002    2
    Nigeria    2003    3
    UK         2001    2
    UK         2002    NA
    UK         2003    1
    
  • Ben Bolker
    Ben Bolker over 12 years
    you beat me too. Also try: check.names=FALSE when reading to get years without X prepended, or sub("^X","",variable) to strip them off later; and as.numeric(variable) to convert back to a numeric value
  • pedrosaurio
    pedrosaurio over 12 years
    Thanks! I guess then that the technical term is to 'melt' the table. Right?
  • Ben Bolker
    Ben Bolker over 12 years
    Yes -- at least that's probably the most common term used in R circles, thanks to the amazing reshape package (which introduced the terminology of "melting" and "casting", as far as I know)
  • Andrie
    Andrie over 12 years
    +1 Who's using R 2.14 then? Look at that text= argument to read.table!
  • Rich Scriven
    Rich Scriven over 9 years
    Backticks for the column subset...? Why. I was just getting good at dplyr too. Haha