Fast way of converting large list to dataframe
Solution 1
Try this. We assume that the components of L all have the same length, n, and that there are no row names:
L <- list(a = 1:4, b = 4:1) # test input
n <- length(L[[1]])
DF <- structure(L, row.names = c(NA, -n), class = "data.frame")
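The same idea can be wrapped in a small function that checks the assumptions first. This is only a sketch: `list_to_df` is a hypothetical name, and the `V1, V2, ...` fallback names for an unnamed list are an arbitrary choice made here for illustration.

```r
# Hypothetical helper: list of equal-length vectors -> data frame,
# one column per list element, built directly from its parts.
list_to_df <- function(L) {
  stopifnot(is.list(L), length(L) > 0)
  n <- length(L[[1]])
  # Every component must have the same length n
  stopifnot(all(lengths(L) == n))
  # Assumption: give unnamed lists simple default column names
  if (is.null(names(L))) names(L) <- paste0("V", seq_along(L))
  # c(NA, -n) is R's compact form for "automatic" row names 1..n
  structure(L, row.names = c(NA_integer_, -n), class = "data.frame")
}

L <- list(a = 1:4, b = 4:1)  # test input
DF <- list_to_df(L)
str(DF)
```

Note that each list element becomes a column here, so for the row-wise layout asked for in the question the result would still have to be transposed.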
Solution 2
I think
lst <- list(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
do.call(rbind,lst)
works. I don't know if there's a sneakier/dangerous/corner-cutting way to do it that's more efficient.
You could also try
m <- matrix(unlist(lst),byrow=TRUE,ncol=length(lst[[1]]))
rownames(m) <- names(lst)
as.data.frame(m)
... maybe it's faster?
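A rough way to check is to time both approaches on mock data. The sketch below uses a list much smaller than the 700 x 16,000 case in the question (the sizes and names are made up for illustration), so the timings will not carry over directly. Passing `use.names = FALSE` to `unlist` skips building element names, which may help on large lists.

```r
# Mock data: 50 named vectors of length 1000 (assumed sizes, for illustration)
set.seed(1)
lst <- replicate(50, runif(1000), simplify = FALSE)
names(lst) <- paste0("row", seq_along(lst))

# Approach 1: bind the vectors as rows
m1 <- do.call(rbind, lst)

# Approach 2: flatten, then fill a matrix row by row
m2 <- matrix(unlist(lst, use.names = FALSE), byrow = TRUE,
             ncol = length(lst[[1]]))
rownames(m2) <- names(lst)

# Both approaches should give the same matrix
same <- isTRUE(all.equal(m1, m2))

# Timings depend on the machine and the data size
t_rbind  <- system.time(do.call(rbind, lst))
t_matrix <- system.time(
  matrix(unlist(lst, use.names = FALSE), byrow = TRUE,
         ncol = length(lst[[1]]))
)
print(rbind(rbind = t_rbind, matrix = t_matrix))
```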
You may not be able to do very much about speeding up the as.data.frame step. Looking at as.data.frame.matrix to see what could be stripped to make it as bare-bones as possible, it seems that the crux is probably that the columns have to be copied into their own individual list elements:
for (i in ic) value[[i]] <- as.vector(x[, i])
You could try stripping down as.data.frame.matrix to see if you can speed it up, but I'm guessing that this operation is the bottleneck. In order to get around it you have to find some faster way of mapping your data from a list of rows into a list of columns (perhaps an Rcpp solution??).
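A stripped-down conversion along these lines might look like the sketch below. It assumes the data have already been bound into a matrix; `fast_matrix_to_df` is a hypothetical name, not an existing function, and the `V1, V2, ...` default column names simply mimic what `as.data.frame` produces for a matrix without column names. The per-column copy is still there, so whether this is noticeably faster on large data would need to be timed.

```r
# Hypothetical bare-bones replacement for as.data.frame() on a matrix
fast_matrix_to_df <- function(m) {
  stopifnot(is.matrix(m))
  # The unavoidable step: copy each column into its own list element
  cols <- lapply(seq_len(ncol(m)), function(j) as.vector(m[, j]))
  names(cols) <- if (is.null(colnames(m))) {
    paste0("V", seq_len(ncol(m)))
  } else {
    colnames(m)
  }
  # Reuse the matrix row names, or fall back to automatic row names
  rn <- rownames(m)
  if (is.null(rn)) rn <- c(NA_integer_, -nrow(m))
  structure(cols, row.names = rn, class = "data.frame")
}

lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
m <- do.call(rbind, lst)
DF <- fast_matrix_to_df(m)
print(DF)
```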
The other thing to consider is whether you really need a data frame -- if your data are of a homogeneous type, you could just keep the results as a matrix. Matrix operations on big data are a lot faster anyway ...
Solution 3
How about just t(as.data.frame(List))?
> A = 1:16000
> List = list()
> for(i in 1:700) List[[i]] = A
> system.time(t(as.data.frame(List)))
   user  system elapsed
   0.25    0.00    0.25
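One thing to keep in mind is that t() applied to a data frame returns a matrix, so a further as.data.frame() call is needed if a data frame is really required. The sketch below uses a much smaller mock list than the timing above, and the `row1, row2, ...` names are made up for illustration; naming the list first avoids auto-generated column names.

```r
# Small mock input: 5 identical vectors of length 8 (assumed sizes)
A <- 1:8
List <- rep(list(A), 5)
names(List) <- paste0("row", seq_along(List))  # hypothetical names

res <- t(as.data.frame(List))

is.matrix(res)  # t() on a data frame gives a matrix
dim(res)        # one row per list element

# Convert back only if a data frame is actually needed
DF <- as.data.frame(res)
```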
Comments
-
Mayou about 4 years
I have a huge list (700 elements), each element being a vector of length = 16,000. I am looking for an efficient way of converting the list to a dataframe, in the following fashion (this is just a mock example):
lst <- list(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
The end result I am looking for is:
#  [,1] [,2] [,3]
#a    1    2    3
#b    4    5    6
#c    7    8    9
This is what I have tried, but isn't working as I wish:
library(data.table)
result = rbindlist(Map(as.data.frame, lst))
Any suggestion? Please keep in mind that my real example has huge dimensions, and I would need a rather efficient way of doing this operation.
Thank you very much!
-
Mayou over 10 years
It does work, but it is very slow for large lists (which is my situation here).
-
Mayou over 10 years
The matrix() takes a few minutes to complete, but I guess I could work with that in the meantime. Converting the matrix to a dataframe completely freezes the R GUI though. Thanks!
-
Mayou over 10 years
I get a very odd result when I do that. Although my initial list didn't have names for its elements, the dataframe now has some odd column names when I do as.data.frame(List)
-
Señor O over 10 years
That's because I didn't give my sample data any names.
-
G. Grothendieck over 10 years
If the list lacks names then use this:
structure(L, row.names = c(NA, -n), .Names = seq_along(L), class = "data.frame")
-
Simon O'Hanlon over 10 years
+1 cool trick. Does this do less copying than, say, do.call? It seems quite a bit faster.
-
G. Grothendieck over 10 years
I would assume that, since it's built up directly from its constituent parts, it would be fast.