Read and rbind multiple csv files

Solution 1

Find the files (list.files), read them in a loop (lapply), then row-bind (do.call + rbind) everything into a single data frame.

## full.names = TRUE so that read.csv receives complete file paths
myMergedData <-
  do.call(rbind,
          lapply(list.files(path = "N:/Ring data by cruise",
                            pattern = "\\.csv$", full.names = TRUE),
                 read.csv))

Update: There is also the vroom package; according to its documentation it is much faster than data.table::fread and base read.csv, and the syntax is neat, too:

library(vroom)
files <- list.files(path = "N:/Ring data by cruise",
                    pattern = "\\.csv$", full.names = TRUE)
myMergedData <- vroom(files)

Solution 2

If you're looking for speed, then try this:

library(data.table) ## 1.9.2 or later
filenames <- list.files(path = "N:/Ring data by cruise",
                        pattern = "\\.csv$", full.names = TRUE)
ans <- rbindlist(lapply(filenames, fread))

Solution 3

I don't have enough rep to comment, but to answer Rafael Santos: you can use the code in Using lapply and read.csv on multiple files (in R) to pass extra parameters through the lapply in the answers above.
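
For instance, lapply forwards any extra arguments to the function it applies, so read.table options can be supplied directly. A minimal sketch (the sep and stringsAsFactors values are placeholder assumptions):

filenames <- list.files("N:/Ring data by cruise", pattern = "\\.csv$", full.names = TRUE)

## Arguments after the function name are passed on to read.table
datalist <- lapply(filenames, read.table, sep = ",", header = TRUE,
                   stringsAsFactors = FALSE)

## Equivalent form with an anonymous function
datalist <- lapply(filenames, function(f) read.table(f, sep = ",", header = TRUE))

myMergedData <- do.call(rbind, datalist)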


Comments

  • helen.h almost 2 years

    I have a series of csv files (one per annum) with the same column headers and different numbers of rows. Originally I was reading them in and merging them like so:

    setwd("N:/Ring data by cruise/Shetland")
    LengthHeight2013 <- read.csv("N:/Ring data by cruise/Shetland/R_0113A_S2013_WD.csv",sep=",",header=TRUE)
    LengthHeight2012 <- read.csv("N:/Ring data by cruise/Shetland/R_0212A_S2012_WD.csv",sep=",",header=TRUE)
    LengthHeight2011 <- read.csv("N:/Ring data by cruise/Shetland/R_0211A_S2011_WOD.csv",sep=",",header=TRUE)
    LengthHeight2010 <- read.csv("N:/Ring data by cruise/Shetland/R_0310A_S2010_WOD.csv",sep=",",header=TRUE)
    LengthHeight2009 <- read.csv("N:/Ring data by cruise/Shetland/R_0309A_S2009_WOD.csv",sep=",",header=TRUE)
    
    LengthHeight <- merge(LengthHeight2013,LengthHeight2012,all=TRUE)
    LengthHeight <- merge(LengthHeight,LengthHeight2011,all=TRUE)
    LengthHeight <- merge(LengthHeight,LengthHeight2010,all=TRUE)
    LengthHeight <- merge(LengthHeight,LengthHeight2009,all=TRUE)
    

    I would like to know if there is a shorter/tidier way to do this, also considering that each time I run the script I might want to look at a different range of years.

    I also found this bit of code by Tony Cookson which looks like it would do what I want; however, the data frame it produces for me has only the correct headers and no data rows.

    multmerge = function(mypath){
      filenames = list.files(path = mypath, full.names = TRUE)
      datalist = lapply(filenames, function(x){read.csv(file = x, header = TRUE)})
      Reduce(function(x, y) {merge(x, y)}, datalist)
    }

    mymergeddata = multmerge("C://R//mergeme")
    
  • Lmm almost 5 years
    I have a similar issue and use zoneB = list.files(pattern="*.csv"), then for (i in 1:length(zoneB)) assign(zoneB[i], read.csv(zoneB[i])), and then ZB = lapply(zoneB, read_csv) %>% bind_rows(). But I wonder: is there a way to exclude specific .csv files from the list? (A filtering sketch follows these comments.)
  • Rafael Santos almost 5 years
    What if I need to pass parameters to the function in lapply? E.g. instead of read.csv I would like to use read.table and set some options on it.
  • Melania CB over 3 years
    Hey zx8754, thank you for this! Is there a way to keep an id for each file the folder contains? (See the sketch after these comments.)
  • zx8754 over 3 years
    @MelaniaCB Certainly possible; please ask a new question with reference to this post, and provide example input and expected output.