rbind dataframes with a different column name

r dataframe rbind

Solution 1

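You could use rbindlist from data.table, which by default binds columns by position and therefore accepts different column names (the result keeps the names of the first data frame). Using @LyzandeR's data: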

library(data.table) #data.table_1.9.5
rbindlist(list(a,b))
#            a         b
# 1: 0.8403348 0.1579255
# 2: 0.4759767 0.8182902
# 3: 0.8091875 0.1080651
# 4: 0.9846333 0.7035959
# 5: 0.2153991 0.8744136
# 6: 0.7604137 0.9753853
# 7: 0.7553924 0.1210260
# 8: 0.7315970 0.6196829
# 9: 0.5619395 0.1120331
#10: 0.5711995 0.7252631

Update

Based on the object names of the 12 datasets (i.e. 'Goal1_Costo', 'Goal2_Costo',..., 'Goal12_Costo'),

 nm1 <- paste(paste0('Goal', 1:12), 'Costo', sep="_")
 #or using `sprintf`
 #nm1 <- sprintf('%s%d_%s', 'Goal', 1:12, 'Costo')
 rbindlist(mget(nm1))
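
For the case described in the question (one column whose name differs per data frame), a base R variant of the same idea might look like the sketch below. It swaps base rbind() in for rbindlist and renames the differing column by pattern first. The three small data frames and the shared name "goalCompletions" are assumptions made up for illustration; only the object-name pattern comes from the question.

```r
# Hypothetical stand-ins for the Goal*_Costo objects; only 3 are built here.
for (i in 1:3) {
  df <- data.frame(date = as.Date("2014-10-01") + 0:1, ad.cost = c(0, 1.5))
  df[[paste0("goal", i, "Completions")]] <- c(i, 0)
  assign(sprintf("Goal%d_Costo", i), df)
}

# Build the object names and collect the objects into a named list.
nm1 <- sprintf("Goal%d_Costo", 1:3)
dfs <- mget(nm1)

# Give the goal column one shared name (an assumed name), so that
# base rbind() can match the columns by name.
dfs <- lapply(dfs, function(x) {
  names(x) <- sub("^goal[0-9]+Completions$", "goalCompletions", names(x))
  x
})

combined <- do.call(rbind, dfs)
rownames(combined) <- NULL  # drop the "Goal1_Costo.1"-style row names
```

Renaming by pattern rather than by position means the column order does not have to be the same in every data frame.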

Solution 2

My favourite use of mapply:

Example Data

a <- data.frame(a=runif(5), b=runif(5))
> a
          a         b
1 0.8403348 0.1579255
2 0.4759767 0.8182902
3 0.8091875 0.1080651
4 0.9846333 0.7035959
5 0.2153991 0.8744136

and b

b <- data.frame(c=runif(5), d=runif(5))
> b
          c         d
1 0.7604137 0.9753853
2 0.7553924 0.1210260
3 0.7315970 0.6196829
4 0.5619395 0.1120331
5 0.5711995 0.7252631

Solution

Using mapply:

> mapply(c, a,b)    #or as.data.frame(mapply(c, a,b)) for a data.frame
              a         b
 [1,] 0.8403348 0.1579255
 [2,] 0.4759767 0.8182902
 [3,] 0.8091875 0.1080651
 [4,] 0.9846333 0.7035959
 [5,] 0.2153991 0.8744136
 [6,] 0.7604137 0.9753853
 [7,] 0.7553924 0.1210260
 [8,] 0.7315970 0.6196829
 [9,] 0.5619395 0.1120331
[10,] 0.5711995 0.7252631

And based on @Marat's comment below:

You can also use data.frame(mapply(c, a, b, SIMPLIFY=FALSE)) or, alternatively, data.frame(Map(c, a, b)) to avoid the double data.frame-to-matrix conversion.
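
A small sketch of the difference between the two forms, using made-up data frames with mixed column types (the names id, label, key and tag are illustrative only). It also adds a column-count check, since mapply and Map pair columns by position and may recycle the shorter input rather than fail when the counts differ.

```r
a <- data.frame(id = 1:2, label = c("x", "y"), stringsAsFactors = FALSE)
b <- data.frame(key = 3:4, tag = c("z", "w"), stringsAsFactors = FALSE)

# Guard: columns are paired by position, so the counts should agree.
stopifnot(ncol(a) == ncol(b))

# mapply() simplifies to a matrix, so every value is coerced to one type.
m <- mapply(c, a, b)

# Map() returns a list of columns, so each column keeps its own type.
d <- data.frame(Map(c, a, b), stringsAsFactors = FALSE)

str(m)
str(d)
```

In both cases the column names come from the first data frame.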

Solution 3

I would rename the columns. This is very easy with names() if the columns are in the same order.

df1 <- data.frame(one=1:10,two=11:20,three=21:30)

df2 <- data.frame(four=31:40,five=41:50,six=51:60)

names(df2)<-names(df1)

rbind(df1,df2)

Or, in one step with base R's setNames(), which returns a renamed copy and leaves df2 unchanged:

rbind(df1,setNames(df2,names(df1)))

Result:

   one two three
1    1  11    21
2    2  12    22
3    3  13    23
4    4  14    24
5    5  15    25
6    6  16    26
7    7  17    27
8    8  18    28
9    9  19    29
10  10  20    30
11  31  41    51
12  32  42    52
13  33  43    53
14  34  44    54
15  35  45    55
16  36  46    56
17  37  47    57
18  38  48    58
19  39  49    59
20  40  50    60
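
With more than two data frames, the same positional renaming could be applied across a list. This is only a sketch: the list dfs and its contents are made up, and it assumes every data frame has the same number of columns in the same order.

```r
dfs <- list(
  data.frame(one = 1:2, two = 3:4),
  data.frame(four = 5:6, five = 7:8),
  data.frame(seven = 9:10, eight = 11:12)
)

# Positional renaming only makes sense if the column counts agree.
stopifnot(length(unique(vapply(dfs, ncol, integer(1)))) == 1)

# Reuse the names of the first data frame for all of them, then bind.
target <- names(dfs[[1]])
res <- do.call(rbind, lapply(dfs, setNames, nm = target))
res
```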

Solution 4

Another base R approach if you have data.frames with different column names:

# Create a list of data frames
df_list <- list()
df_list[[1]] <- data.frame(x = 1, y = paste0("y1", 1:3))
df_list[[2]] <- data.frame(x = 2, y = paste0("y2", 1:4))
df_list[[3]] <- data.frame(x = 3, y = paste0("y3", 1:5), z = "z3")
df_list
#> [[1]]
#>   x   y
#> 1 1 y11
#> 2 1 y12
#> 3 1 y13
#> 
#> [[2]]
#>   x   y
#> 1 2 y21
#> 2 2 y22
#> 3 2 y23
#> 4 2 y24
#> 
#> [[3]]
#>   x   y  z
#> 1 3 y31 z3
#> 2 3 y32 z3
#> 3 3 y33 z3
#> 4 3 y34 z3
#> 5 3 y35 z3

# This works when the column names are the same
do.call(rbind, df_list[1:2])
#>   x   y
#> 1 1 y11
#> 2 1 y12
#> 3 1 y13
#> 4 2 y21
#> 5 2 y22
#> 6 2 y23
#> 7 2 y24

# but fails when the column names differ
do.call(rbind, df_list)
#> Error in rbind(deparse.level, ...): numbers of columns of arguments do not match

# This can fill the unmatched columns with NA's without 
# depending on other packages:
Reduce(rbind, Map(function(x) {
  x[, setdiff(unique(unlist(lapply(df_list, colnames))), names(x))] <- NA; 
  return(x)
  }, 
  df_list))
#>    x   y    z
#> 1  1 y11 <NA>
#> 2  1 y12 <NA>
#> 3  1 y13 <NA>
#> 4  2 y21 <NA>
#> 5  2 y22 <NA>
#> 6  2 y23 <NA>
#> 7  2 y24 <NA>
#> 8  3 y31   z3
#> 9  3 y32   z3
#> 10 3 y33   z3
#> 11 3 y34   z3
#> 12 3 y35   z3
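
The same idea can be wrapped in a small helper so it does not refer to the global df_list. This is a sketch rather than a drop-in replacement: the function name rbind_fill is made up here, and the example uses character columns (stringsAsFactors = FALSE) to keep the type handling simple.

```r
# Hypothetical helper: row-bind data frames, filling missing columns with NA.
rbind_fill <- function(dfs) {
  # Union of all column names, in order of first appearance.
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(x) {
    # Add each missing column as NA, then put columns in a common order.
    for (nm in setdiff(all_cols, names(x))) x[[nm]] <- NA
    x[all_cols]
  })
  do.call(rbind, padded)
}

df_list <- list(
  data.frame(x = 1, y = paste0("y1", 1:3), stringsAsFactors = FALSE),
  data.frame(x = 3, y = paste0("y3", 1:5), z = "z3", stringsAsFactors = FALSE)
)
res <- rbind_fill(df_list)
res
```

Computing the union of column names once, outside the per-data-frame function, avoids recomputing it for every element of the list.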

Omar Gonzales

Updated on September 05, 2020

Comments

Omar Gonzales over 3 years
I have 12 data frames, each with 6 columns: 5 share the same name and 1 differs. When I call rbind() I get:
```
Error in match.names(clabs, names(xi)) : 
  names do not match previous names
```
The column that differs is "goal1Completions". There are 12 such columns, one per data frame: "goal1Completions", "goal2Completions", "goal3Completions", and so on.

The best way I can think of is: renaming every column in every data frame to "GoalsCompletions" and then using "rbind()".

Is there a simpler way?

Looking on Google, I found the package "gtools", which has a function called "smartbind". However, after using smartbind(), when I try to look at the data frame with "View()", my R session crashes...

My data (an example of the first data frame):
```
         date    source    medium   campaign  goal1Completions  ad.cost  Goal
1  2014-10-01  (direct)    (none)  (not set)                 0   0.0000  Vida
2  2014-10-01    Master     email      CAFRE                 0   0.0000  Vida
3  2014-10-01    apeseg  referral  (not set)                 0   0.0000  Vida
```
- akrun over 9 years
  
  Do these 12 dataset objects follow a name pattern, e.g. df1, df2, df3, etc.? It may be better to put them in a list and then use rbindlist, i.e. rbindlist(mget(paste0('df',1:12)))
- Omar Gonzales over 9 years
  
  @akrun, yes the pattern is: Goal1_Costo,Goal2_Costo,... Goal12_Costo. If you need to update your answer, please do.
- akrun over 9 years
  
  @Omar_Gonzales Thanks, updated the answer
Omar Gonzales over 9 years

It seems very clever. Does the "c" in mapply(c,a,b) stand for concatenate? Does it concatenate "a" and "b" and keep the column names from "a"?
Marat Talipov over 9 years

You could avoid double data.frame-matrix conversion by data.frame(mapply(c, a, b, SIMPLIFY=FALSE)) or, alternatively, data.frame(Map(c,a,b))
LyzandeR over 9 years

@OmarGonzales Yes it is the usual concatenate function and it does keep the column names from a. Each time it concatenates the elements (i.e. columns) of the two data.frames and returns a matrix in the end.
akrun over 9 years

OP mentioned about 12 datasets. So probably, df3 <- data.frame(seven=61:70,eight=71:80,nine=81:90);res <- do.call(rbind,lapply(mget(paste0('df',1:3)), function(x) {colnames(x) <- colnames(df1);x})); row.names(res) <- NULL
Omar Gonzales over 9 years

Doesn't dplyr have a similar function? I'm looking for one; if someone knows, please post.
akrun over 9 years

@OmarGonzales It has bind_rows, but the column names will still be a problem: instead of 2 columns, the output will have 4. According to ?bind_rows: "When row-binding, columns are matched by name, and any values that don't match will be filled with NA."
Omar Gonzales over 9 years

Thanks to all, but I ended up using this, as it seems simpler. However, I'll need to investigate the mapply function a little more... it seems very powerful.
akrun over 9 years

@OmarGonzales One advantage of using rbindlist is its speed.
MadmanLee about 5 years

This can be dangerous since it will combine data frames with different column dimensions. Would have been perfect though. I'm sure a simple if statement would do though.
Mooks over 3 years

Very late to the party but purrr::map2_df(a, b, c) will work without having to wrap in a data.frame, although I don't know if it's avoiding the double conversion internally. And, like @MaratTalipov's answer, will keep the type of the first df, whereas mapply coerces (in my case to all character when mixing dbl or date and chr columns).