Reshaping data.frame from wide to long format

216,970

Solution 1

reshape() takes a while to get used to, just as melt/cast. Here is a solution with reshape, assuming your data frame is called d:

reshape(d, 
        direction = "long",
        varying = list(names(d)[3:7]),
        v.names = "Value",
        idvar = c("Code", "Country"),
        timevar = "Year",
        times = 1950:1954)

Solution 2

Three alternative solutions:

1) With :

You can use the same melt function as in the reshape2 package (which is an extended & improved implementation). melt from data.table has also more parameters that the melt-function from reshape2. You can for example also specify the name of the variable-column:

library(data.table)
long <- melt(setDT(wide), id.vars = c("Code","Country"), variable.name = "year")

which gives:

> long
    Code     Country year  value
 1:  AFG Afghanistan 1950 20,249
 2:  ALB     Albania 1950  8,097
 3:  AFG Afghanistan 1951 21,352
 4:  ALB     Albania 1951  8,986
 5:  AFG Afghanistan 1952 22,532
 6:  ALB     Albania 1952 10,058
 7:  AFG Afghanistan 1953 23,557
 8:  ALB     Albania 1953 11,123
 9:  AFG Afghanistan 1954 24,555
10:  ALB     Albania 1954 12,246

Some alternative notations:

melt(setDT(wide), id.vars = 1:2, variable.name = "year")
melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")

2) With :

library(tidyr)
long <- wide %>% gather(year, value, -c(Code, Country))

Some alternative notations:

wide %>% gather(year, value, -Code, -Country)
wide %>% gather(year, value, -1:-2)
wide %>% gather(year, value, -(1:2))
wide %>% gather(year, value, -1, -2)
wide %>% gather(year, value, 3:7)
wide %>% gather(year, value, `1950`:`1954`)

3) With :

library(reshape2)
long <- melt(wide, id.vars = c("Code", "Country"))

Some alternative notations that give the same result:

# you can also define the id-variables by column number
melt(wide, id.vars = 1:2)

# as an alternative you can also specify the measure-variables
# all other variables will then be used as id-variables
melt(wide, measure.vars = 3:7)
melt(wide, measure.vars = as.character(1950:1954))

NOTES:

  • is retired. Only changes necessary to keep it on CRAN will be made. (source)
  • If you want to exclude NA values, you can add na.rm = TRUE to the melt as well as the gather functions.

Another problem with the data is that the values will be read by R as character-values (as a result of the , in the numbers). You can repair that with gsub and as.numeric:

long$value <- as.numeric(gsub(",", "", long$value))

Or directly with data.table or dplyr:

# data.table
long <- melt(setDT(wide),
             id.vars = c("Code","Country"),
             variable.name = "year")[, value := as.numeric(gsub(",", "", value))]

# tidyr and dplyr
long <- wide %>% gather(year, value, -c(Code,Country)) %>% 
  mutate(value = as.numeric(gsub(",", "", value)))

Data:

wide <- read.table(text="Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246", header=TRUE, check.names=FALSE)

Solution 3

With tidyr_1.0.0, another option is pivot_longer

library(tidyr)
pivot_longer(df1, -c(Code, Country), values_to = "Value", names_to = "Year")
# A tibble: 10 x 4
#   Code  Country     Year  Value 
#   <fct> <fct>       <chr> <fct> 
# 1 AFG   Afghanistan 1950  20,249
# 2 AFG   Afghanistan 1951  21,352
# 3 AFG   Afghanistan 1952  22,532
# 4 AFG   Afghanistan 1953  23,557
# 5 AFG   Afghanistan 1954  24,555
# 6 ALB   Albania     1950  8,097 
# 7 ALB   Albania     1951  8,986 
# 8 ALB   Albania     1952  10,058
# 9 ALB   Albania     1953  11,123
#10 ALB   Albania     1954  12,246

data

df1 <- structure(list(Code = structure(1:2, .Label = c("AFG", "ALB"), class = "factor"), 
    Country = structure(1:2, .Label = c("Afghanistan", "Albania"
    ), class = "factor"), `1950` = structure(1:2, .Label = c("20,249", 
    "8,097"), class = "factor"), `1951` = structure(1:2, .Label = c("21,352", 
    "8,986"), class = "factor"), `1952` = structure(2:1, .Label = c("10,058", 
    "22,532"), class = "factor"), `1953` = structure(2:1, .Label = c("11,123", 
    "23,557"), class = "factor"), `1954` = structure(2:1, .Label = c("12,246", 
    "24,555"), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))

Solution 4

Using reshape package:

#data
x <- read.table(textConnection(
"Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246"), header=TRUE)

library(reshape)

x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))

Solution 5

Since this answer is tagged with , I felt it would be useful to share another alternative from base R: stack.

Note, however, that stack does not work with factors--it only works if is.vector is TRUE, and from the documentation for is.vector, we find that:

is.vector returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise.

I'm using the sample data from @Jaap's answer, where the values in the year columns are factors.

Here's the stack approach:

cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))
##    Code     Country values  ind
## 1   AFG Afghanistan 20,249 1950
## 2   ALB     Albania  8,097 1950
## 3   AFG Afghanistan 21,352 1951
## 4   ALB     Albania  8,986 1951
## 5   AFG Afghanistan 22,532 1952
## 6   ALB     Albania 10,058 1952
## 7   AFG Afghanistan 23,557 1953
## 8   ALB     Albania 11,123 1953
## 9   AFG Afghanistan 24,555 1954
## 10  ALB     Albania 12,246 1954
Share:
216,970

Related videos on Youtube

mropa
Author by

mropa

Updated on January 05, 2022

Comments

  • mropa
    mropa over 2 years

    I have some trouble to convert my data.frame from a wide table to a long table. At the moment it looks like this:

    Code Country        1950    1951    1952    1953    1954
    AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
    ALB  Albania        8,097   8,986   10,058  11,123  12,246
    

    Now I would like to transform this data.frame into a long data.frame. Something like this:

    Code Country        Year    Value
    AFG  Afghanistan    1950    20,249
    AFG  Afghanistan    1951    21,352
    AFG  Afghanistan    1952    22,532
    AFG  Afghanistan    1953    23,557
    AFG  Afghanistan    1954    24,555
    ALB  Albania        1950    8,097
    ALB  Albania        1951    8,986
    ALB  Albania        1952    10,058
    ALB  Albania        1953    11,123
    ALB  Albania        1954    12,246
    

    I have looked at and already tried using the melt() and the reshape() functions as some people were suggesting in similar questions. However, so far I only get messy results.

    If it is possible I would like to do it with the reshape() function since it looks a little bit nicer to handle.

    • Eduardo Leoni
      Eduardo Leoni over 14 years
      Don't know if that was the problem, but the functions in the reshape package are melt and cast (and recast.)
    • IRTFM
      IRTFM over 9 years
      And the reshape package has been superseded by reshape2.
    • drhagen
      drhagen about 8 years
      And now reshape2 has been superseded by tidyr.
    • NelsonGon
      NelsonGon over 2 years
      And now tidyr's gather and spread have been replaced by pivot_* functions.
  • Jason Goal
    Jason Goal over 6 years
    great answer, just one more tiny reminder : do not put any variables other than id andtime in your data frame, melt could not tell what you want to do in this case.
  • Jaap
    Jaap over 6 years
    @JasonGoal Could you elaborate on that? As I'm interpreting you comment, it shouldn't be a problem. Just specify both the id.vars and the measure.vars.
  • Jason Goal
    Jason Goal over 6 years
    ,then that's good for me, don't know id.vars and the measure.vars can be specified in the first alternative,sorry for the mess, its my fault.
  • Admin
    Admin over 4 years
    Sorry to necro this post - could someone explain to me why 3 works? I've tested it and it works, but I don't understand what dplyr is doing when it sees -c(var1, var2)...
  • Jaap
    Jaap over 4 years
    @ReputableMisnomer When tidyr sees -c(var1, var2) it omits these variables when transforming the data from wide to long format.
  • Admin
    Admin over 4 years
    Ah wow - what a neat trick. Truly a time saver. Was so resistant to move to the tidyverse() yet it surprises me every day...
  • Darwin PC
    Darwin PC about 4 years
    melt was the best option for me. Thanks a lot!
  • Evan Rosica
    Evan Rosica about 4 years
    According to the tidyverse blog gather is now retired and as been replaced by pivot_longer. They state: "New pivot_longer() and pivot_wider() provide modern alternatives to spread() and gather(). They have been carefully redesigned to be easier to learn and remember, and include many new features. spread() and gather() won’t go away, but they’ve been retired which means that they’re no longer under active development."
  • Evan Rosica
    Evan Rosica about 4 years
    This needs more upvotes. According to the Tidyverse Blog gather is being retired and pivot_longer is now the correct way to accomplish this.
  • andschar
    andschar almost 3 years
    @EvanRosica only until they decide to change the function again :p