Using rbind() to combine multiple data frames into one larger data.frame within lapply()

Solution 1

do.call() is your friend.

big.list.of.data.frames <- lapply(files, function(x){
    read.table(x, skip = 3, header = TRUE, stringsAsFactors = FALSE)
})

Or, more concisely (but less tinkerable):

big.list.of.data.frames <- lapply(files, read.table,
                                  skip = 3, header = TRUE,
                                  stringsAsFactors = FALSE)

Then:

big.data.frame <- do.call(rbind, big.list.of.data.frames)

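This is the recommended approach because "growing" a data frame dynamically in R is painful: it is slow and memory-expensive, since a new data frame gets built at each iteration. It also sidesteps the scoping problem in the question: an assignment inside the function passed to lapply() only creates a local variable, so the outer uridata is never modified.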
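To make both points concrete, here is a minimal, self-contained sketch in base R. The temporary directory, the file names, and the uri/hits columns are made up for illustration; they only mimic the layout described in the question (three lines to skip, then a header).

```r
# Hypothetical stand-in for the log files: three junk lines, a header, two rows.
log.dir <- tempfile("logs")
dir.create(log.dir)
for (i in 1:3) {
    writeLines(c("junk line 1", "junk line 2", "junk line 3",
                 "uri hits",
                 paste0("/folder", i, "/a 1"),
                 paste0("/folder", i, "/b 2")),
               file.path(log.dir, paste0("day", i, ".log")))
}
files <- list.files(log.dir, pattern = "\\.log$", full.names = TRUE)

# The pattern from the question: the assignment inside the function creates
# a local 'uridata' that is thrown away when the function returns.
uridata <- data.frame()
invisible(lapply(files, function(x) {
    uriraw <- read.table(x, skip = 3, header = TRUE, stringsAsFactors = FALSE)
    uridata <- rbind(uridata, uriraw)
}))
nrow(uridata)  # 0: the outer data frame was never touched

# The pattern from this answer: keep what lapply() returns, then bind once.
big.list.of.data.frames <- lapply(files, read.table, skip = 3,
                                  header = TRUE, stringsAsFactors = FALSE)
big.data.frame <- do.call(rbind, big.list.of.data.frames)
nrow(big.data.frame)  # 6: three files with two rows each
```

The key difference is that the second version never relies on side effects: each call returns its data frame, and the binding happens once at the end.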

Solution 2

You can use map_df() from the purrr package instead of lapply() to get all the results combined directly into a single data frame.

library(purrr)
map_df(files, read.table, skip = 3, header = TRUE, stringsAsFactors = FALSE)
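Note that map_df() is marked as superseded as of purrr 1.0.0, where the documentation points to map() followed by list_rbind() instead. A sketch of that variant, assuming purrr 1.0.0 or later is installed; the temporary files and the "source" column name are invented for illustration.

```r
library(purrr)

# Hypothetical stand-in for the log files: three junk lines, a header, one row.
log.dir <- tempfile("logs")
dir.create(log.dir)
for (i in 1:2) {
    writeLines(c("junk line 1", "junk line 2", "junk line 3",
                 "uri hits",
                 paste0("/folder", i, "/a ", i)),
               file.path(log.dir, paste0("day", i, ".log")))
}
files <- list.files(log.dir, pattern = "\\.log$", full.names = TRUE)
names(files) <- basename(files)  # names are carried through map()

# Read each file into a list element, then bind the list in one step.
per.file <- map(files, read.table, skip = 3, header = TRUE,
                stringsAsFactors = FALSE)
big.data.frame <- list_rbind(per.file, names_to = "source")
```

Recording the file name in a column is optional, but it can help here, since each log covers one day and the goal is to see what was accessed when.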

Solution 3

Another option is fread() combined with rbindlist(), both from data.table:

library(data.table)
rbindlist(lapply(files, fread, skip = 3))
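If it matters which log each row came from, rbindlist() can record that through its idcol argument. A minimal sketch, assuming data.table is installed; the temporary files, the "source" column name, and the explicit sep and header arguments are choices made for this toy data, not part of the answer above.

```r
library(data.table)

# Hypothetical stand-in for the log files: three junk lines, a header, two rows.
log.dir <- tempfile("logs")
dir.create(log.dir)
for (i in 1:2) {
    writeLines(c("junk line 1", "junk line 2", "junk line 3",
                 "uri hits",
                 paste0("/folder", i, "/a 1"),
                 paste0("/folder", i, "/b 2")),
               file.path(log.dir, paste0("day", i, ".log")))
}
files <- list.files(log.dir, pattern = "\\.log$", full.names = TRUE)
names(files) <- basename(files)  # lapply() keeps these as list names

# sep and header are spelled out so the tiny files are parsed predictably;
# on real logs fread() would normally detect both on its own.
big.data.table <- rbindlist(
    lapply(files, fread, skip = 3, sep = " ", header = TRUE),
    idcol = "source"
)
```

The result is a data.table rather than a plain data.frame; as.data.frame() converts it if downstream code expects the latter.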

John Lynch

Updated on June 14, 2022

Comments

  • John Lynch almost 2 years

    I'm using RStudio 0.99.491 and R version 3.2.3 (2015-12-10). I'm a relative newbie to R, and I'd appreciate some help. I'm working on a project where I'm trying to use the server logs on an old media server to identify which folders/files on the server are still being accessed and which aren't, so that my team knows which files to migrate. Each log covers a 24-hour period, and I have approximately a year's worth of logs, so in theory I should be able to see all of the access over the past year.

    My ideal output is to get a tree structure or plot that will show me the folders on our server that are being used. I've figured out how to read one log (one day) into R as a data.frame and then use the data.tree package in R to turn that into a tree. Now, I want to recursively go through all of the files in the directory, one by one, and add them to that original data.frame, before I create the tree. Here's my current code:

    #Create the list of log files in the folder
    files <- list.files(pattern = "*.log", full.names = TRUE, recursive = FALSE)
    #Create a new data.frame to hold the aggregated log data
    uridata <- data.frame()
    #My function to go through each file, one by one, and add it to the 'uridata' df, above
    lapply(files, function(x){
        uriraw <- read.table(x, skip = 3, header = TRUE, stringsAsFactors = FALSE)
        #print(nrow(uriraw)
        uridata <- rbind(uridata, uriraw)
        #print(nrow(uridata))
    })
    

    The problem is that, no matter what I try, the value of 'uridata' within the lapply loop seems to not be saved/passed outside of the lapply loop, but is somehow being overwritten each time the loop runs. So instead of getting one big data.frame, I just get the contents of the last 'uriraw' file. (That's why there are those two commented print commands inside the loop; I was testing how many lines there were in the data frames each time the loop ran.)

    Can anyone clarify what I'm doing wrong? Again, I'd like one big data.frame at the end that combines the contents of each of the (currently seven) log files in the folder.

  • John Lynch about 8 years
    This solved the problem perfectly. Of COURSE I should have been passing the results of lapply() into a variable at the beginning of the function, instead of trying to save them into a variable within lapply(). And thanks for pointing me towards do.call().