Reorganize list into dataframe using dplyr


Solution 1

Here's another approach that uses a few more dplyr/tidyr functions and piping. I haven't tested its performance against the original approach in the question, and whether it is more elegant is a matter of personal preference.

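library(dplyr); library(tidyr); library(tibble)

lapply(l, `[[`, 2) %>%                  # pull member2 (the 3x1 matrix) out of each element
    data.frame %>%                      # one column per matrix, row names b, c, d
    rownames_to_column("key") %>%       # replaces the deprecated add_rownames("key")
    gather(x, value, -key) %>%          # stack the matrix columns into a single column
    select(-x)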

#      key      value
#1       b -1.1476570
#2       c -0.2894616
#3       d -0.2992151
#4       b  0.2522234
#5       c -0.8919211
#6       d  0.4356833
#7       b -0.2242679
#8       c  0.3773956
#9       d  0.1333364
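In case lapply(l, `[[`, 2) looks unusual: `[[` is an ordinary R function, so it can be passed to lapply() like any other function, and lapply() forwards the extra argument (here 2) to it. A minimal base-R sketch, using a hypothetical toy element x shaped like one entry of l:

```r
# Toy element shaped like one entry of `l` (hypothetical values)
x <- list(
  member1 = c(a = 0.5),
  member2 = matrix(c(1, 2, 3), nrow = 3, ncol = 1,
                   dimnames = list(c("b", "c", "d"), "sample"))
)

# `[[` called as a function gives the same result as x[[2]]
identical(`[[`(x, 2), x[[2]])   # TRUE

# lapply() passes the extra argument 2 on to `[[`, so this extracts
# member2 from every element of the list
l_toy <- list(x, x)
member2s <- lapply(l_toy, `[[`, 2)
length(member2s)                 # 2

# Indexing by name is equivalent and does not depend on element order
identical(lapply(l_toy, `[[`, "member2"), member2s)   # TRUE
```

Extracting by name ("member2") is arguably safer than by position if the order of members inside each element could ever change.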

Solution 2

Another option, also from the Hadleyverse but not using "dplyr", is melt from "reshape2":

library(reshape2)
melt(l)
#         value Var1   Var2      L2 L1
# 1  -0.6264538 <NA>   <NA> member1  1
# 2   0.1836433    b sample member2  1
# 3  -0.8356286    c sample member2  1
# 4   1.5952808    d sample member2  1
# 5   0.3295078 <NA>   <NA> member1  2
# 6  -0.8204684    b sample member2  2
# 7   0.4874291    c sample member2  2
# 8   0.7383247    d sample member2  2
# 9   0.5757814 <NA>   <NA> member1  3
# 10 -0.3053884    b sample member2  3
# 11  1.5117812    c sample member2  3
# 12  0.3898432    d sample member2  3

From there, you can consider using "dplyr" to do some cleanup. For instance, to get the two-column result you describe, you can do something like:

library(reshape2)
library(dplyr)

melt(l) %>%
  filter(L2 != "member1") %>%   # keep only the rows that came from member2
  select(key = Var1, value)     # Var1 holds the row names b, c, d

(Sample data created using set.seed(1)).
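To reproduce these numbers, the three l[[i]] assignments from the question can be generated in a loop. This sketch assumes set.seed(1) is called immediately before the list is built; the helper name make_element is hypothetical.

```r
set.seed(1)

# Hypothetical helper: builds one element with the same shape as in the question
make_element <- function() {
  list(
    member1 = c(a = rnorm(1)),
    member2 = matrix(rnorm(3), nrow = 3, ncol = 1,
                     dimnames = list(letters[2:4], "sample"))
  )
}

# lapply() calls the helper three times in order, so the random numbers are
# drawn in the same sequence as with the three separate assignments
l <- lapply(1:3, function(i) make_element())

l[[1]]$member1   # a = -0.6264538, the first value in the melt() output
```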

Solution 3

Another pure tidyverse solution:

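library(tidyverse)

ll <- l %>% map_df(enframe) %>%                  # one row per member, values in a list column
    mutate(key1=map(value, rownames),            # b, c, d for the member2 matrices
           key2=map(value, names),               # "a" for the member1 vectors
           key=map2(key1, key2, ~c(.x, .y))) %>%
    select(-key1, -key2) %>%
    unnest(cols = c(value, key))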
ll
# A tibble: 12 × 3
      name       value   key
     <chr>       <dbl> <chr>
1  member1  2.12962812     a
2  member2 -0.87049458     b
3  member2  0.96190007     c
4  member2  0.56403433     d
5  member1 -0.41447472     a
6  member2  0.27270458     b
7  member2 -0.01384829     c
8  member2 -0.71561501     d
9  member1 -0.81835698     a
10 member2 -2.12746977     b
11 member2  0.66185843     c
12 member2  0.07878841     d

UPDATE: I had thought you wanted to combine the information in member1 and member2. If only member2 is needed, it's simpler:

ll <- l %>% map_df(enframe) %>%
    filter(name=="member2") %>%                  # drop the member1 rows
    mutate(key=map(value, rownames)) %>%
    unnest(cols = c(value, key))
ll
# A tibble: 9 × 3
     name       value   key
    <chr>       <dbl> <chr>
1 member2 -0.87049458     b
2 member2  0.96190007     c
3 member2  0.56403433     d
4 member2  0.27270458     b
5 member2 -0.01384829     c
6 member2 -0.71561501     d
7 member2 -2.12746977     b
8 member2  0.66185843     c
9 member2  0.07878841     d
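For comparison, here is a base-R sketch of the same member2-only result. It is not part of the answers above. It extracts every member2 matrix once and builds both columns with unlist(), so no per-element data frames are created. It assumes every member2 is a one-column matrix with row names, as in the question. The helper name to_key_value is hypothetical.

```r
# Toy input with the question's structure (values drawn after set.seed(1))
set.seed(1)
l <- lapply(1:3, function(i) list(
  member1 = c(a = rnorm(1)),
  member2 = matrix(rnorm(3), nrow = 3, ncol = 1,
                   dimnames = list(letters[2:4], "sample"))
))

# Hypothetical helper: member2-only key/value table built column-wise
to_key_value <- function(l) {
  m <- lapply(l, `[[`, "member2")                  # list of 3x1 matrices
  data.frame(
    key   = unlist(lapply(m, rownames)),           # "b" "c" "d" "b" ...
    value = unlist(m, use.names = FALSE),          # matrix values in the same order
    stringsAsFactors = FALSE
  )
}

to_key_value(l)
```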
Author: Fabio (GIS addicted, R user)

Updated on June 18, 2022

Comments

  • Fabio

    I would like to convert a structured list into a tidy data frame, taking advantage of the speed of the dplyr package. I would like to know whether the solution I am posting is "state of the art" or whether there is something faster.

    Here is an example of my starting list:

    l = list()
    l[[1]] = list(member1=c(a=rnorm(1)),member2=matrix(rnorm(3),nrow=3,ncol=1,dimnames=list(c(letters[2:4]),c("sample"))))
    l[[2]] = list(member1=c(a=rnorm(1)),member2=matrix(rnorm(3),nrow=3,ncol=1,dimnames=list(c(letters[2:4]),c("sample"))))
    l[[3]] = list(member1=c(a=rnorm(1)),member2=matrix(rnorm(3),nrow=3,ncol=1,dimnames=list(c(letters[2:4]),c("sample"))))
    

    With this result (to show you the toy structure):

    l
    [[1]]
    [[1]]$member1
        a 
    0.3340196 
    
    [[1]]$member2
     sample
    b 1.0098830
    c 0.6413375
    d 0.9080675
    
    [[2]]
    [[2]]$member1
        a 
    0.0590878 
    
    [[2]]$member2
      sample
    b  0.5585736
    c -0.5936157
    d -0.3985687
    
    [[3]]
    [[3]]$member1
         a 
    0.06242458 
    
    [[3]]$member2
      sample
    b -0.2873391
    c  0.5326067
    d -1.1635551
    

    Now I'll use a convenience function to rearrange the data and lapply to navigate through the list:

    organizeSamples = function(x){
      member = x$member2
      output = data.frame(key=rownames(member),value=member[,1])
      return(output)
    }
    l_new = lapply(l, organizeSamples)
    

    Now dplyr does the magic:

    samples = dplyr::bind_rows(l_new)
    samples
    
      key      value
    1   b  1.0098830
    2   c  0.6413375
    3   d  0.9080675
    4   b  0.5585736
    5   c -0.5936157 
    6   d -0.3985687
    7   b -0.2873391
    8   c  0.5326067
    9   d -1.1635551
    

    Is there a faster, more elegant and more compact way to do this using dplyr?

  • Fabio
    I've never used the [[ syntax, but I will definitely give it a try and use the microbenchmark package to test it on a larger list. By the way, I like the piping system!
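A minimal sketch of the larger-list test mentioned in the comment above. It assumes a list built with the question's structure, and the size n is a hypothetical choice. It uses base R's system.time() so it runs without extra packages, with do.call(rbind, ...) as a base-R stand-in for dplyr::bind_rows(). microbenchmark::microbenchmark() can be swapped in for repeated timings. The helper name vectorized is hypothetical.

```r
set.seed(1)
n <- 2000  # hypothetical size of the larger test list
l_big <- lapply(seq_len(n), function(i) list(
  member1 = c(a = rnorm(1)),
  member2 = matrix(rnorm(3), nrow = 3, ncol = 1,
                   dimnames = list(letters[2:4], "sample"))
))

# The question's per-element helper
organizeSamples <- function(x) {
  member <- x$member2
  data.frame(key = rownames(member), value = member[, 1])
}

# Hypothetical vectorized alternative: no intermediate data frames
vectorized <- function(l) {
  m <- lapply(l, `[[`, "member2")
  data.frame(key = unlist(lapply(m, rownames)),
             value = unlist(m, use.names = FALSE),
             stringsAsFactors = FALSE)
}

t_per_element <- system.time(
  r1 <- do.call(rbind, lapply(l_big, organizeSamples))  # base stand-in for bind_rows()
)
t_vectorized <- system.time(r2 <- vectorized(l_big))

# Both approaches should produce the same key and value columns
stopifnot(identical(as.character(r1$key), r2$key),
          isTRUE(all.equal(r1$value, r2$value, check.attributes = FALSE)))

# Elapsed seconds for each approach (machine-dependent)
print(rbind(per_element = t_per_element, vectorized = t_vectorized)[, "elapsed"])
```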