Merging more than 2 dataframes in R by rownames

Solution 1

As long as all four objects list their rows in the same order, three lines of code give exactly the same result:

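dat2 <- cbind(df1, df2, df3, df4)  # bind columns side by side (rows are paired by position)
# rename the 200 matrix columns to the V1.x ... V100.y names that merge() produced
colnames(dat2)[-(1:7)] <- paste(paste('V', rep(1:100, 2), sep = ''),
                                rep(c('x', 'y'), each = 100), sep = '.')
all.equal(dat, dat2)  # should be TRUE when dat comes from the merge approach in the question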
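Why is the renaming step needed? When `merge()` joins two tables that share a non-key column name, it disambiguates them with the suffixes `.x` and `.y` (the default of its `suffixes` argument), so the merge-based `dat` ends up with columns named `V1.x` ... `V100.y`. A minimal sketch with two small, made-up data frames (`a` and `b` are hypothetical):

```r
# Two made-up data frames that share a non-key column name (V1)
a <- data.frame(V1 = 1:3, row.names = c("P01", "P02", "P03"))
b <- data.frame(V1 = 4:6, row.names = c("P01", "P02", "P03"))

# merge() by row names moves the rownames into a "Row.names" column and
# disambiguates the shared column with the default suffixes .x and .y
m <- merge(a, b, by = "row.names")
print(names(m))  # "Row.names" "V1.x" "V1.y"

# The same naming pattern the answer rebuilds by hand, for 3 instead of 100 columns
n <- 3
nm <- paste(paste("V", rep(1:n, 2), sep = ""), rep(c("x", "y"), each = n), sep = ".")
print(nm)  # "V1.x" "V2.x" "V3.x" "V1.y" "V2.y" "V3.y"
```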

Ah, I see; now I understand why this is causing you so much pain. A plain old for loop does the trick, although there may be even cleverer solutions:

l <- list(df1, df2, df3, df4)
dat <- l[[1]]
for(i in 2:length(l)) {
  dat <- merge(dat, l[[i]], by= "row.names", all.x= FALSE, all.y= FALSE)
  rownames(dat) <- dat$Row.names  # restore rownames from the key column merge() adds
  dat$Row.names <- NULL           # then drop that helper column
}
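The loop can also be wrapped into a reusable function that accepts a list of any length. A minimal sketch, where the name `merge_by_rownames` and the example data are hypothetical; it restores the rownames from the `Row.names` column that `merge()` adds, so it does not rely on `merge()` returning the rows in their original order:

```r
# Hypothetical helper: inner-join any number of data frames (or matrices)
# on their rownames, keeping the key as rownames instead of a column
merge_by_rownames <- function(lst) {
  dat <- lst[[1]]
  for (i in seq_along(lst)[-1]) {
    dat <- merge(dat, lst[[i]], by = "row.names", all = FALSE)
    rownames(dat) <- dat$Row.names  # restore the key as rownames
    dat$Row.names <- NULL           # drop the helper column
  }
  dat
}

# Made-up example: the second data frame lists its rows in a different order
d1 <- data.frame(a = 1:3, row.names = c("P01", "P02", "P03"))
d2 <- data.frame(b = c(30, 10, 20), row.names = c("P03", "P01", "P02"))
res <- merge_by_rownames(list(d1, d2))
print(res)  # rows P01, P02, P03 with b matched by name: 10, 20, 30
```

Using `seq_along(lst)[-1]` rather than `2:length(lst)` means a one-element list is returned unchanged instead of failing with an out-of-bounds index.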

Solution 2

join_all from plyr will probably do what you want. However, all inputs must be data frames, and the rownames have to be added as an explicit column:

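library(plyr)

# join_all() needs data frames, so convert the two matrices
df3 <- data.frame(df3)
df4 <- data.frame(df4)

# store the rownames in an explicit key column to join on
df1$rn <- rownames(df1)
df2$rn <- rownames(df2)
df3$rn <- rownames(df3)
df4$rn <- rownames(df4)

df <- join_all(list(df1,df2,df3,df4), by = 'rn', type = 'full')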

The type argument (type = 'full' here) should help even if the rownames vary and do not all match. If you do not want the rownames column in the result:

df$rn <- NULL
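If plyr is not available, base R's `merge()` with `all = TRUE` performs the same kind of full (outer) join. A minimal sketch with made-up data frames `x` and `y` whose rownames only partly overlap; rows missing from one input are filled with `NA`:

```r
# Made-up inputs whose rownames only partly overlap
x <- data.frame(N = c(10, 12), row.names = c("P01", "P02"))
y <- data.frame(origin = c("B", "C"), row.names = c("P02", "P03"))

# all = TRUE keeps every row name found in either input (a full outer join)
full <- merge(x, y, by = "row.names", all = TRUE)
rownames(full) <- full$Row.names  # move the key back into the rownames
full$Row.names <- NULL
print(full)  # P01 has NA for origin, P03 has NA for N
```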

Solution 3

Editing your function, I came up with a function that merges any number of data frames by a key column (given by its name). The resulting data frame includes all the variables and all the rows of the merged data frames, with NA wherever a key is missing from one of the inputs. If you only want to keep the rows whose key appears in every data frame (i.e. no NA-filled rows), use all.x= FALSE, all.y= FALSE.

MyMerge <- function(x, y){
  # replace "name of the common column" with the name of your key column
  df <- merge(x, y, by= "name of the common column", all.x= TRUE, all.y= TRUE)
  return(df)
}
new.df <- Reduce(MyMerge, list(df1, df2, df3, df4))
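Instead of hard-coding the key, it can be passed in as an argument. `Reduce()` folds the list from the left, i.e. it computes `f(f(f(df1, df2), df3), df4)`. A minimal sketch, assuming every data frame has a key column called `id` (a made-up name, as are `merge_all_by` and the example data):

```r
# Hypothetical variant of MyMerge that takes the key column name as an argument;
# any extra arguments (e.g. all = TRUE) are passed through to merge()
merge_all_by <- function(lst, key, ...) {
  Reduce(function(x, y) merge(x, y, by = key, ...), lst)
}

# Made-up data frames sharing an "id" column
a <- data.frame(id = 1:3, p = c(0.1, 0.2, 0.3))
b <- data.frame(id = 2:4, q = c("u", "v", "w"))
d <- data.frame(id = c(1L, 3L), r = c(TRUE, FALSE))

full_res  <- merge_all_by(list(a, b, d), key = "id", all = TRUE)   # ids 1 to 4, NA where missing
inner_res <- merge_all_by(list(a, b, d), key = "id", all = FALSE)  # only id 3 is in all three
print(full_res)
print(inner_res)
```

Switching between the full and the inner join is then a matter of the `all` argument alone.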

Solution 4

I had been looking for the same function. After trying a couple of the options here and others elsewhere, the easiest for me was:

cbind.data.frame(df1, df2, df3, df4)  # add further data frames as needed; rows are paired by position, not by name
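Because `cbind.data.frame()` pairs rows by position, it only matches a merge when every input lists the same rownames in the same order. A minimal sketch of a guard, using a hypothetical helper `cbind_by_rownames` and made-up data, that reorders each input to the first one's rownames before binding:

```r
# Hypothetical helper: put every input into the row order of the first one,
# then bind the columns side by side
cbind_by_rownames <- function(...) {
  lst <- list(...)
  key <- rownames(lst[[1]])
  aligned <- lapply(lst, function(d) {
    stopifnot(setequal(rownames(d), key))  # all inputs must share the same rownames
    d[key, , drop = FALSE]                 # index rows by name to reorder them
  })
  do.call(cbind.data.frame, aligned)
}

# Made-up example: d2 lists its rows in a different order than d1
d1 <- data.frame(a = 1:3, row.names = c("P01", "P02", "P03"))
d2 <- data.frame(b = c(30, 10, 20), row.names = c("P03", "P01", "P02"))
res <- cbind_by_rownames(d1, d2)
print(res)  # b is aligned by name: 10, 20, 30
```

This keeps the speed of a plain column bind while failing loudly, rather than silently mismatching rows, when the rownames differ.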

Author: Hans Roelofsen

I am a GIS specialist working in environmental and ecological sciences.

Updated on January 08, 2020

Comments

  • Hans Roelofsen, over 4 years ago

    I gather data from four data frames and would like to merge them by rownames. I am looking for an efficient way to do this. This is a simplified version of the data I have:

    df1           <- data.frame(N= sample(seq(9, 27, 0.5), 40, replace= T),
                                P= sample(seq(0.3, 4, 0.1), 40, replace= T),
                                C= sample(seq(400, 500, 1), 40, replace= T))
    df2           <- data.frame(origin= sample(c("A", "B", "C", "D", "E"), 40,
                                               replace= T),
                                foo1= sample(c(T, F), 40, replace= T),
                                X= sample(seq(145600, 148300, 100), 40, replace= T),
                                Y= sample(seq(349800, 398600, 100), 40, replace= T))
    df3           <- matrix(sample(seq(0, 1, 0.01), 40), 40, 100)
    df4           <- matrix(sample(seq(0, 1, 0.01), 40), 40, 100)
    rownames(df1) <- paste("P", sprintf("%02d", c(1:40)), sep= "")
    rownames(df2) <- rownames(df1)
    rownames(df3) <- rownames(df1)
    rownames(df4) <- rownames(df1)
    

    This is what I would normally do:

    # merge df1 and df2
    dat           <- merge(df1, df2, by= "row.names", all.x= F, all.y= F) #merge
    rownames(dat) <- dat$Row.names #reset rownames
    dat$Row.names <- NULL  #remove added rownames col
    
    # merge dat and df3
    dat           <- merge(dat, df3, by= "row.names", all.x= F, all.y= F) #merge
    rownames(dat) <- dat$Row.names #reset rownames
    dat$Row.names <- NULL  #remove added rownames col
    
    # merge dat and df4
    dat           <- merge(dat, df4, by= "row.names", all.x= F, all.y= F) #merge
    rownames(dat) <- dat$Row.names #reset rownames
    dat$Row.names <- NULL #remove added rownames col
    

    As you can see, this requires a lot of code. My question is whether the same result can be achieved by simpler means. I tried the following, at first without success (UPDATE: this works now!):

    MyMerge       <- function(x, y){
      df            <- merge(x, y, by= "row.names", all.x= F, all.y= F)
      rownames(df)  <- df$Row.names
      df$Row.names  <- NULL
      return(df)
    }
    dat           <- Reduce(MyMerge, list(df1, df2, df3, df4))
    

    Thanks in advance for any suggestions.