Read and rbind multiple csv files

Solution 1

Find the files (list.files), read them in a loop (lapply), then row-bind (do.call + rbind) everything into a single data frame.

## full.names = TRUE so that read.csv receives complete file paths
myMergedData <-
  do.call(rbind,
          lapply(list.files(path = "N:/Ring data by cruise",
                            pattern = "\\.csv$", full.names = TRUE),
                 read.csv))

Update: There is also the vroom package; according to its documentation it is much faster than data.table::fread and base read.csv, and the syntax is neat, too:

library(vroom)
files <- list.files(path = "N:/Ring data by cruise",
                    pattern = "\\.csv$", full.names = TRUE)
myMergedData <- vroom(files)

Solution 2

If you're looking for speed, then try this:

library(data.table) ## 1.9.2 or later
filenames <- list.files(path = "N:/Ring data by cruise",
                        pattern = "\\.csv$", full.names = TRUE)
ans <- rbindlist(lapply(filenames, fread))

Solution 3

I don't have enough rep to comment, but to answer Rafael Santos: you can use the code in Using lapply and read.csv on multiple files (in R) to pass extra parameters through the lapply in the answers above.
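
For instance, lapply forwards any extra arguments to the function it applies, so read.table options can be supplied directly. A minimal sketch (the sep and stringsAsFactors values are placeholder assumptions):

filenames <- list.files("N:/Ring data by cruise", pattern = "\\.csv$", full.names = TRUE)

## Arguments after the function name are passed on to read.table
datalist <- lapply(filenames, read.table, sep = ",", header = TRUE,
                   stringsAsFactors = FALSE)

## Equivalent form with an anonymous function
datalist <- lapply(filenames, function(f) read.table(f, sep = ",", header = TRUE))

myMergedData <- do.call(rbind, datalist)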


Comments

  • helen.h almost 2 years

    I have a series of csv files (one per annum) with the same column headers and different numbers of rows. Originally I was reading them in and merging them like so:

    setwd("N:/Ring data by cruise/Shetland")
    LengthHeight2013 <- read.csv("N:/Ring data by cruise/Shetland/R_0113A_S2013_WD.csv",sep=",",header=TRUE)
    LengthHeight2012 <- read.csv("N:/Ring data by cruise/Shetland/R_0212A_S2012_WD.csv",sep=",",header=TRUE)
    LengthHeight2011 <- read.csv("N:/Ring data by cruise/Shetland/R_0211A_S2011_WOD.csv",sep=",",header=TRUE)
    LengthHeight2010 <- read.csv("N:/Ring data by cruise/Shetland/R_0310A_S2010_WOD.csv",sep=",",header=TRUE)
    LengthHeight2009 <- read.csv("N:/Ring data by cruise/Shetland/R_0309A_S2009_WOD.csv",sep=",",header=TRUE)
    
    LengthHeight <- merge(LengthHeight2013,LengthHeight2012,all=TRUE)
    LengthHeight <- merge(LengthHeight,LengthHeight2011,all=TRUE)
    LengthHeight <- merge(LengthHeight,LengthHeight2010,all=TRUE)
    LengthHeight <- merge(LengthHeight,LengthHeight2009,all=TRUE)
    

    I would like to know if there is a shorter/tidier way to do this, also considering that each time I run the script I might want to look at a different range of years.

    I also found this bit of code by Tony Cookson which looks like it would do what I want; however, the data frame it produces for me has only the correct headers and no data rows.

    multmerge = function(mypath){
      filenames = list.files(path = mypath, full.names = TRUE)
      datalist = lapply(filenames, function(x){read.csv(file = x, header = TRUE)})
      Reduce(function(x, y) {merge(x, y)}, datalist)
    }

    mymergeddata = multmerge("C://R//mergeme")
    
  • Lmm almost 5 years
    I have a similar issue and use zoneB = list.files(pattern="*.csv"), then for (i in 1:length(zoneB)) assign(zoneB[i], read.csv(zoneB[i])), and then ZB = lapply(zoneB, read_csv) %>% bind_rows(). But I wonder: is there a way to exclude specific .csv files from the list? (A filtering sketch follows these comments.)
  • Rafael Santos almost 5 years
    What if I need to pass parameters to the function in lapply? E.g. instead of read.csv I would like to use read.table and set some options on it.
  • Melania CB over 3 years
    Hey zx8754, thank you for this! Is there a way to keep an id for each file the folder contains? (See the sketch after these comments.)
  • zx8754 over 3 years
    @MelaniaCB Certainly possible; please ask a new question with reference to this post, and provide example input and expected output.