Merging data frames with different number of rows and different columns

50,358

If A and B are the two input data frames, here are some solutions:

1) merge. This solution works regardless of whether A or B has more rows.

merge(data.frame(A, row.names=NULL), data.frame(B, row.names=NULL), 
  by = 0, all = TRUE)[-1]

The first two arguments can be replaced with just A and B, respectively, if A and B both have default row names (i.e. 1, 2, ...) or if their row names are otherwise consistent. That is, merge(A, B, by = 0, all = TRUE)[-1].

For example, if we have this input:

# test inputs
A <- data.frame(BOD, row.names = letters[1:6])
B <- setNames(2 * BOD[1:2, ], c("X", "Y"))

then:

merge(data.frame(A, row.names=NULL), data.frame(B, row.names=NULL), 
  by = 0, all = TRUE)[-1]

gives:

  Time demand  X    Y
1    1    8.3  2 16.6
2    2   10.3  4 20.6
3    3   19.0 NA   NA
4    4   16.0 NA   NA
5    5   15.6 NA   NA
6    7   19.8 NA   NA
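In this example A has the row names "a" to "f" while B has 1, 2, so the shortcut merge(A, B, by = 0, all = TRUE)[-1] does not apply. A quick check of the row counts, as a sketch reusing the same test inputs (the names bad and good are only illustrative), shows why the row names are reset first:

```r
# Same test inputs as above (BOD ships with R's datasets package).
A <- data.frame(BOD, row.names = letters[1:6])
B <- setNames(2 * BOD[1:2, ], c("X", "Y"))

# A's row names are "a".."f" and B's are "1", "2": no keys match,
# so every row is kept on its own, giving 6 + 2 = 8 rows.
bad <- merge(A, B, by = 0, all = TRUE)[-1]

# Resetting the row names to 1, 2, ... pairs row i of A with row i of B.
good <- merge(data.frame(A, row.names = NULL), data.frame(B, row.names = NULL),
              by = 0, all = TRUE)[-1]

nrow(bad)   # 8
nrow(good)  # 6
```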

1a) An equivalent variation is:

do.call("merge", c(lapply(list(A, B), data.frame, row.names=NULL), 
  by = 0, all = TRUE))[-1]
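One caveat with merging on row names: the key column Row.names holds character strings, so with all = TRUE and the default sort = TRUE the result is ordered lexicographically ("1", "10", "11", ..., "2", ...) once there are more than 9 rows, as with the question's data. The rows are still paired correctly, just out of order. A minimal sketch that restores numeric order, using small hypothetical data frames:

```r
# Hypothetical inputs with more than 9 rows.
A <- data.frame(a = 1:12)
B <- data.frame(b = 101:110)

m <- merge(data.frame(A, row.names = NULL), data.frame(B, row.names = NULL),
           by = 0, all = TRUE)

# Row.names is character, so the sorted result runs "1", "10", "11", "12", "2", ...
# Reorder numerically by that key, then drop it.
m <- m[order(as.integer(m$Row.names)), -1, drop = FALSE]
rownames(m) <- NULL
m
```

Passing sort = FALSE is not a reliable substitute, since ?merge documents that the rows are then in an unspecified order.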

2) cbind.zoo. This solution assumes that A has more rows than B and that B's entries are all of the same type, e.g. all numeric; A has no such restriction. Both conditions hold for the data in the question.

library(zoo)
data.frame(A, cbind(zoo(, 1:nrow(A)), as.zoo(B)))
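For comparison, the same padding idea can be sketched in base R without zoo, again assuming that A has at least as many rows as B. Because it pads with data frame indexing, it does not need B's columns to share a type. The data and the name B_padded below are illustrative:

```r
# Hypothetical inputs mirroring the example above: A has more rows than B.
A <- data.frame(Time = c(1, 2, 3, 4, 5, 7),
                demand = c(8.3, 10.3, 19.0, 16.0, 15.6, 19.8))
B <- data.frame(X = c(2, 4), Y = c(16.6, 20.6))

# Row indices beyond nrow(B) select all-NA rows, which pads B to nrow(A) rows.
B_padded <- B[seq_len(nrow(A)), , drop = FALSE]
rownames(B_padded) <- NULL  # discard the "NA", "NA.1", ... row names

out <- cbind(A, B_padded)
out
```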
Author: rar

Updated on June 20, 2020

Comments

  • rar

    I have two data frames with different numbers of columns and rows. I want to combine them into one data frame.

    > month.saf
       Name NCDC    Year    Month   Day HrMn    Temp    Q
    244 AP  99999   2014    2       1   0      12       1
    245 AP  99999   2014    2       1   300    12.2     1
    246 AP  99999   2014    2       1   600    14.4     1
    247 AP  99999   2014    2       1   900    18.6     1
    248 AP  99999   2014    2       1   1200   18       1
    249 AP  99999   2014    2       1   1500   13.6     1
    250 AP  99999   2014    2       1   1800   11.8     1
    251 AP  99999   2014    2       1   2100   10.8     1
    252 AP  99999   2014    2       2   0      8.4      1
    253 AP  99999   2014    2       2   300    8.6      1
    254 AP  99999   2014    2       2   600    19.8     2
    255 AP  99999   2014    2       2   900    22.8     1
    256 AP  99999   2014    2       2   1200   20.8     1
    257 AP  99999   2014    2       2   1500   16.4     1
    258 AP  99999   2014    2       2   1800   13.4     1
    259 AP  99999   2014    2       2   2100   12.4     1
    > T2Mdf
                        V1               V2
    0     293.494262695312 291.642639160156
    300   294.003479003906 292.375091552734
    600   296.809997558594 295.207885742188
    900   298.287811279297 297.181549072266
    1200  298.317565917969 297.725708007813
    1500  298.134002685547 296.226165771484
    1800  296.006805419922 293.354248046875
    2100  293.785491943359 293.547210693359
    0.1   294.638732910156 293.019866943359
    300.1 292.179992675781 291.256958007812
    

    The output that I want is like this:

        Name    NCDC    Year    Month   Day HrMn    Temp    Q   V1          V2
    244 AP  99999   2014        2       1   0       12      1   293.4942627 291.6426392
    245 AP  99999   2014        2       1   300     12.2    1   294.003479  292.3750916
    246 AP  99999   2014        2       1   600     14.4    1   296.8099976 295.2078857
    247 AP  99999   2014        2       1   900     18.6    1   298.2878113 297.1815491
    248 AP  99999   2014        2       1   1200    18      1   298.3175659 297.725708
    249 AP  99999   2014        2       1   1500    13.6    1   298.1340027 296.2261658
    250 AP  99999   2014        2       1   1800    11.8    1   296.0068054 293.354248
    251 AP  99999   2014        2       1   2100    10.8    1   293.7854919 293.5472107
    252 AP  99999   2014        2       2   0       8.4     1   294.6387329 293.0198669
    253 AP  99999   2014        2       2   300     8.6     1   292.1799927 291.256958
    254 AP  99999   2014        2       2   600     19.8    2   292.2477417 291.3471069
    255 AP  99999   2014        2       2   900     22.8    1   294.2276306 294.2766418
    256 AP  99999   2014        2       2   1200    20.8    1   NA          NA
    257 AP  99999   2014        2       2   1500    16.4    1   NA          NA
    258 AP  99999   2014        2       2   1800    13.4    1   NA          NA
    259 AP  99999   2014        2       2   2100    12.4    1   NA          NA
    

    I tried cbind, but it gives me an error:

    Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 216, 220

    I also tried rbind.fill(), but it gives me something like this:

    V1               V2                     Name        USAF  NCDC Year Month Day HrMn  I   Type QCP Temp  Q
        1  293.494262695312 291.642639160156       <NA>     NA    NA   NA    NA  NA   NA    NA  <NA>  NA <NA> NA
        2  294.003479003906 292.375091552734       <NA>     NA    NA   NA    NA  NA   NA    NA  <NA>  NA <NA> NA
        3  296.809997558594 295.207885742188       <NA>     NA    NA   NA    NA  NA   NA    NA  <NA>  NA <NA> NA
        4  298.287811279297 297.181549072266       <NA>     NA    NA   NA    NA  NA   NA    NA  <NA>  NA <NA> NA
        5  298.317565917969 297.725708007813       <NA>     NA    NA   NA    NA  NA   NA    NA  <NA>  NA <NA> NA
        6              <NA>             <NA>        AP  421820 99999 2014     2   1    0    4   FM-12 NA   12  1
        7              <NA>             <NA>        AP  421820 99999 2014     2   1  300    4   FM-12 NA 12.2  1
        8              <NA>             <NA>        AP  421820 99999 2014     2   1  600    4   FM-12 NA 14.4  1
        9              <NA>             <NA>        AP  421820 99999 2014     2   1  900    4   FM-12 NA 18.6  1
        10             <NA>             <NA>        AP  421820 99999 2014     2   1 1200    4   FM-12 NA   18  1
    

    How can I do this in R?