Fast way of converting large list to dataframe


Solution 1

Try this. We assume that all components of L have the same length, n, and that no row names are needed:

L <- list(a = 1:4, b = 4:1) # test input

n <- length(L[[1]])
DF <- structure(L, row.names = c(NA, -n), class = "data.frame")
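
The same construction can be wrapped in a small helper that checks the equal-length assumption and supplies names when the list has none. This is only a sketch: the function name list_to_df and the V1, V2, ... naming scheme are illustrative choices, not part of the original answer.

```r
# Hypothetical helper; its name and the default column names are assumptions.
list_to_df <- function(L) {
  stopifnot(is.list(L), length(L) > 0)
  n <- length(L[[1]])
  # The trick is only valid when every component has the same length.
  stopifnot(all(lengths(L) == n))
  # A data frame needs column names; invent some if the list has none.
  if (is.null(names(L))) names(L) <- paste0("V", seq_along(L))
  # c(NA, -n) is R's compact encoding of the automatic row names 1..n.
  structure(L, row.names = c(NA, -n), class = "data.frame")
}

DF <- list_to_df(list(a = 1:4, b = 4:1))
str(DF)
```

Apart from the two stopifnot() calls, nothing here validates the input, which is where the speed comes from; anything that as.data.frame() would normally check or convert is the caller's responsibility.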

Solution 2

I think this works, although note that the result is a matrix rather than a data frame:

lst <- list(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
do.call(rbind,lst)

I don't know if there's a sneakier/dangerous/corner-cutting way to do it that's more efficient.

You could also try

m <- matrix(unlist(lst),byrow=TRUE,ncol=length(lst[[1]]))
rownames(m) <- names(lst)
as.data.frame(m)

... maybe it's faster?
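
One way to find out on your own machine is to time both constructions on a scaled-down stand-in for the real data. The sketch below is not from the original answer: the sizes are arbitrary, and use.names = FALSE is an addition that stops unlist() from building a name for every element.

```r
# Scaled-down stand-in for the real data (sizes are arbitrary assumptions).
set.seed(1)
big <- replicate(200, runif(5000), simplify = FALSE)
names(big) <- paste0("r", seq_along(big))

# Approach 1: bind the list elements together as rows.
t1 <- system.time(m1 <- do.call(rbind, big))

# Approach 2: flatten the list and refill a matrix row by row.
t2 <- system.time({
  m2 <- matrix(unlist(big, use.names = FALSE), byrow = TRUE,
               ncol = length(big[[1]]))
  rownames(m2) <- names(big)
})

# Both approaches should produce the same 200 x 5000 matrix.
stopifnot(isTRUE(all.equal(m1, m2)))

# Elapsed seconds for each approach; the numbers depend on the machine.
print(c(do_call_rbind = t1[["elapsed"]], matrix_unlist = t2[["elapsed"]]))
```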

You may not be able to do very much about speeding up the as.data.frame step. Looking at as.data.frame.matrix to see what could be stripped to make it as bare-bones as possible, it seems that the crux is probably that the columns have to be copied into their own individual list elements:

for (i in ic) value[[i]] <- as.vector(x[, i])

You could try stripping down as.data.frame.matrix to see if you can speed it up, but I'm guessing that this operation is the bottleneck. In order to get around it you have to find some faster way of mapping your data from a list of rows into a list of columns (perhaps an Rcpp solution??).
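
A stripped-down version might look like the sketch below, which splits the matrix into a list of columns and then attaches the data frame attributes directly, as in Solution 1. It still copies every column, so whether it is meaningfully faster than as.data.frame() is untested here; the V1, V2, ... column names are chosen to mimic the default.

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
m <- do.call(rbind, lst)  # one row per list element

# Split the matrix into a plain list of columns (the same per-column
# copy that the loop in as.data.frame.matrix performs).
cols <- lapply(seq_len(ncol(m)), function(j) as.vector(m[, j]))
names(cols) <- paste0("V", seq_along(cols))

# Attach the data frame attributes directly instead of calling as.data.frame().
DF <- structure(cols, row.names = rownames(m), class = "data.frame")
DF
```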

The other thing to consider is whether you really need a data frame -- if your data are of a homogeneous type, you could just keep the results as a matrix. Matrix operations on big data are a lot faster anyway ...
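
As a small illustration of staying with the matrix, the sketch below runs a few typical row-wise and column-wise operations on the mock data from the question. The specific operations are only examples and were not part of the original answer.

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
m <- do.call(rbind, lst)

row_means <- rowMeans(m)  # one mean per original list element
col_sums  <- colSums(m)   # one sum per position within the vectors

# A vector of length nrow(m) is recycled down the columns,
# so each row is divided by its own mean.
scaled <- m / row_means
```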

Solution 3

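How about just t(as.data.frame(List))? Note that the result is a matrix, because t() converts the data frame to a matrix before transposing it.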

> A = 1:16000
> List = list()
> for(i in 1:700) List[[i]] = A
> system.time(t(as.data.frame(List)))
   user  system elapsed 
   0.25    0.00    0.25 
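
When the list is unnamed, as.data.frame() has to invent column names, and those become the row names after t(). Naming the list first keeps the labels under your control. A small sketch, in which the "row" prefix is just an assumed naming scheme:

```r
List <- replicate(3, 1:4, simplify = FALSE)    # small unnamed list
names(List) <- paste0("row", seq_along(List))  # assumed naming scheme

res <- t(as.data.frame(List))  # a 3 x 4 matrix, one row per list element
res
```

If a data frame is needed at the end, as.data.frame(res) converts the matrix back, at the cost of another copy.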
Author: Mayou

Quant at an investment management firm.

Updated on May 11, 2020

Comments

  • Mayou
    Mayou about 4 years

    I have a huge list (700 elements), each element being a vector of length = 16,000. I am looking for an efficient way of converting the list to a dataframe, in the following fashion (this is just a mock example):

    lst <- list(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
    

    The end result I am looking for is:

     #  [,1] [,2] [,3]
     #a    1    2    3
     #b    4    5    6
     #c    7    8    9
    

    This is what I have tried, but isn't working as I wish:

    library(data.table)
    result = rbindlist(Map(as.data.frame, lst))
    

    Any suggestion? Please keep in mind that my real example has huge dimensions, and I would need a rather efficient way of doing this operation.

    Thank you very much!

  • Mayou
    Mayou over 10 years
    It does work, but it is very slow for large lists (which is my situation here).
  • Mayou
    Mayou over 10 years
    The matrix() takes a few minutes to complete, but I guess I could work with that in the meantime. Converting the matrix to a dataframe completely freezes the R GUI though. Thanks!
  • Mayou
    Mayou over 10 years
    I get a very odd result when I do that. Although my initial list didn't have names for its elements, the data frame now has some odd column names when I do as.data.frame(List).
  • Señor O
    Señor O over 10 years
    That's because I didn't give my sample data any names.
  • G. Grothendieck
    G. Grothendieck over 10 years
    If list lacks names then use this: structure(L, row.names = c(NA, -n), .Names = seq_along(L), class = "data.frame")
  • Simon O'Hanlon
    Simon O'Hanlon over 10 years
    +1 cool trick. Does this do less copying than say do.call? It seems quite a bit faster.
  • G. Grothendieck
    G. Grothendieck over 10 years
    I would assume that, since it's built up directly from its constituent parts, it would be fast.