rbind dataframes with a different column name

51,699

Solution 1

You could use rbindlist which takes different column names. Using @LyzandeR's data

library(data.table) #data.table_1.9.5
rbindlist(list(a,b))
#            a         b
# 1: 0.8403348 0.1579255
# 2: 0.4759767 0.8182902
# 3: 0.8091875 0.1080651
# 4: 0.9846333 0.7035959
# 5: 0.2153991 0.8744136
# 6: 0.7604137 0.9753853
# 7: 0.7553924 0.1210260
# 8: 0.7315970 0.6196829
# 9: 0.5619395 0.1120331
#10: 0.5711995 0.7252631

Update

Based on the object names of the 12 datasets (i.e. 'Goal1_Costo', 'Goal2_Costo',..., 'Goal12_Costo'),

 nm1 <- paste(paste0('Goal', 1:12), 'Costo', sep="_")
 #or using `sprintf`
 #nm1 <- sprintf('%s%d_%s', 'Goal', 1:12, 'Costo')
 rbindlist(mget(nm1))

Solution 2

My favourite use of mapply:

Example Data

a <- data.frame(a=runif(5), b=runif(5))
> a
          a         b
1 0.8403348 0.1579255
2 0.4759767 0.8182902
3 0.8091875 0.1080651
4 0.9846333 0.7035959
5 0.2153991 0.8744136

and b

b <- data.frame(c=runif(5), d=runif(5))
> b
          c         d
1 0.7604137 0.9753853
2 0.7553924 0.1210260
3 0.7315970 0.6196829
4 0.5619395 0.1120331
5 0.5711995 0.7252631

Solution

Using mapply:

> mapply(c, a,b)    #or as.data.frame(mapply(c, a,b)) for a data.frame
              a         b
 [1,] 0.8403348 0.1579255
 [2,] 0.4759767 0.8182902
 [3,] 0.8091875 0.1080651
 [4,] 0.9846333 0.7035959
 [5,] 0.2153991 0.8744136
 [6,] 0.7604137 0.9753853
 [7,] 0.7553924 0.1210260
 [8,] 0.7315970 0.6196829
 [9,] 0.5619395 0.1120331
[10,] 0.5711995 0.7252631

And based on @Marat's comment below:

You can also do data.frame(mapply(c, a, b, SIMPLIFY=FALSE)) or, alternatively, data.frame(Map(c,a,b)) to avoid double data.frame-matrix conversion

Solution 3

I would rename the columns. This is very easy with names() if the columns are in the same order.

df1 <- data.frame(one=1:10,two=11:20,three=21:30)

df2 <- data.frame(four=31:40,five=41:50,six=51:60)

names(df2)<-names(df1)

rbind(df1,df2)

or

df1 <- data.frame(one=1:10,two=11:20,three=21:30)

df2 <- data.frame(four=31:40,five=41:50,six=51:60)

rbind(df1,setnames(df2,names(df1)))

Result:

   one two three
1    1  11    21
2    2  12    22
3    3  13    23
4    4  14    24
5    5  15    25
6    6  16    26
7    7  17    27
8    8  18    28
9    9  19    29
10  10  20    30
11  31  41    51
12  32  42    52
13  33  43    53
14  34  44    54
15  35  45    55
16  36  46    56
17  37  47    57
18  38  48    58
19  39  49    59
20  40  50    60

Solution 4

Another base R approach if you have data.frames with different column names:

# Create a list of data frames
df_list <- list()
df_list[[1]] <- data.frame(x = 1, y = paste0("y1", 1:3))
df_list[[2]] <- data.frame(x = 2, y = paste0("y2", 1:4))
df_list[[3]] <- data.frame(x = 3, y = paste0("y3", 1:5), z = "z3")
df_list
#> [[1]]
#>   x   y
#> 1 1 y11
#> 2 1 y12
#> 3 1 y13
#> 
#> [[2]]
#>   x   y
#> 1 2 y21
#> 2 2 y22
#> 3 2 y23
#> 4 2 y24
#> 
#> [[3]]
#>   x   y  z
#> 1 3 y31 z3
#> 2 3 y32 z3
#> 3 3 y33 z3
#> 4 3 y34 z3
#> 5 3 y35 z3

# This works when the column names are the same
do.call(rbind, df_list[1:2])
#>   x   y
#> 1 1 y11
#> 2 1 y12
#> 3 1 y13
#> 4 2 y21
#> 5 2 y22
#> 6 2 y23
#> 7 2 y24

# but fails when the column names differ
do.call(rbind, df_list)
#> Error in rbind(deparse.level, ...): numbers of columns of arguments do not match

# This can fill the unmatched columns with NA's without 
# depending on other packages:
Reduce(rbind, Map(function(x) {
  x[, setdiff(unique(unlist(lapply(df_list, colnames))), names(x))] <- NA; 
  return(x)
  }, 
  df_list))
#>    x   y    z
#> 1  1 y11 <NA>
#> 2  1 y12 <NA>
#> 3  1 y13 <NA>
#> 4  2 y21 <NA>
#> 5  2 y22 <NA>
#> 6  2 y23 <NA>
#> 7  2 y24 <NA>
#> 8  3 y31   z3
#> 9  3 y32   z3
#> 10 3 y33   z3
#> 11 3 y34   z3
#> 12 3 y35   z3
Share:
51,699

Related videos on Youtube

Omar Gonzales
Author by

Omar Gonzales

Updated on September 05, 2020

Comments

  • Omar Gonzales
    Omar Gonzales over 3 years

    I've 12 data frames, each one contains 6 columns: 5 have the same name, 1 is different. Then when I call rbind() I get:

    Error in match.names(clabs, names(xi)) : 
      names do not match previous names
    

    The column that differs is: "goal1Completions". There are 12 goalsCompletions... they are: "goal1Completions", "goal2Completions", "goal3Completions"... and so on.

    The best way I can think of is: renaming every column in every data frame to "GoalsCompletions" and then using "rbind()".

    Is there a simpler way?

    Look on Google O found this package: "gtools". It has a function called: "smartbind". However, after using smartbind() i want to see the the data frame with "View()", my R session crashes...

    My data (an example of the first data frame):

           date      source     medium   campaign   goal1Completions    ad.cost           Goal
    1   2014-10-01  (direct)    (none)   (not set)          0           0.0000            Vida
    2   2014-10-01   Master      email     CAFRE            0           0.0000            Vida
    3   2014-10-01  apeseg      referral (not set)          0           0.0000            Vida
    
    • akrun
      akrun over 9 years
      Do these 12 dataset objects have some name patterns i.e. df1, df2, df3,...etc It may be better to put them in a list and then do rbindlist ie. rbindlist(mget(paste0('df',1:12)))
    • Omar Gonzales
      Omar Gonzales over 9 years
      @akrun, yes the pattern is: Goal1_Costo,Goal2_Costo,... Goal12_Costo. If you need to update your answer, please do.
    • akrun
      akrun over 9 years
      @Omar_Gonzales Thanks, updated the answer
  • Omar Gonzales
    Omar Gonzales over 9 years
    it seems very clever. The "c" in mapply(c,a,b) is for concatenate? It concatenates "a","b" and keeps the column names from "a"?
  • Marat Talipov
    Marat Talipov over 9 years
    You could avoid double data.frame-matrix conversion by data.frame(mapply(c, a, b, SIMPLIFY=FALSE)) or, alternatively, data.frame(Map(c,a,b))
  • LyzandeR
    LyzandeR over 9 years
    @OmarGonzales Yes it is the usual concatenate function and it does keep the column names from a. Each time it concatenates the elements (i.e. columns) of the two data.frames and returns a matrix in the end.
  • akrun
    akrun over 9 years
    OP mentioned about 12 datasets. So probably, df3 <- data.frame(seven=61:70,eight=71:80,nine=81:90);res <- do.call(rbind,lapply(mget(paste0('df',1:3)), function(x) {colnames(x) <- colnames(df1);x})); row.names(res) <- NULL
  • Omar Gonzales
    Omar Gonzales over 9 years
    dplyr has not a similar function? I'm lookig for it, if somene knows please post.
  • akrun
    akrun over 9 years
    @OmarGonzales It has bind_rows, but still the column names will be a problem. So, instead of 2 columns, the output will be 4. According to ?bind_rows When row-binding, columns are matched by name, and any values that don't match will be filled with NA.
  • Omar Gonzales
    Omar Gonzales over 9 years
    Thanks to all, but i ended using this, as this seems more simplier. However, i'll need to investigate a little more on the mapplay functions...seems very powerfull.
  • akrun
    akrun over 9 years
    @OmarGonzales One advantage of using rbindlist is its speed.
  • MadmanLee
    MadmanLee about 5 years
    This can be dangerous since it will combine data frames with different column dimensions. Would have been perfect though. I'm sure a simple if statement would do though.
  • Mooks
    Mooks over 3 years
    Very late to the party but purrr::map2_df(a, b, c) will work without having to wrap in a data.frame, although I don't know if it's avoiding the double conversion internally. And, like @MaratTalipov's answer, will keep the type of the first df, whereas mapply coerces (in my case to all character when mixing dbl or date and chr columns).