How can I use merge to cbind two dataframes

24,748

Solution 1

There's nothing here on which to merge other than row names.

merge(df1, df2, by=0)

Note the order of the rows in the result! These are lexicaly sorted by row name. To get the proper order, use order:

x <- merge(df1, df2, by=0)
x[order(as.numeric(x$Row.names)),]

Solution 2

I have made the matrices smaller for display purposes.

> df1 <- data.frame(matrix(rnorm(15),ncol=3))
> df2 <- data.frame(matrix(rnorm(15),ncol=3))
> colnames(df1) <- paste0("A", 1:3)
> colnames(df2) <- paste0("B", 4:6)

We have data frames df1 and df2, with columns names A1, A2 & A3 and B1, B2 & B3 respectively.

So,as you know, cbind() just sticks the data frames together, side by side.

> cbind(df1, df2)
         A1        A2       A3        B4       B5        B6
1  2.055780  0.362796  1.25536 -1.748416  0.41855 -0.516635
2  0.010779  0.086778 -0.68413  1.183762 -1.20362  0.041147
3 -0.732393  0.235125 -0.89306  1.435362 -0.26066 -0.025933
4 -2.493843 -2.654263  0.36107  0.083018 -0.82251 -0.991135
5  0.935540  0.398196 -0.43043  0.470559 -0.54146  1.955555

merge() looks for common columns. In this case there are none, so it produces essentially an outer product in which each row of df1 is matched against each row of df2.

> merge(df1, df2)
          A1        A2       A3        B4       B5        B6
1   2.055780  0.362796  1.25536 -1.748416  0.41855 -0.516635
2   0.010779  0.086778 -0.68413 -1.748416  0.41855 -0.516635
3  -0.732393  0.235125 -0.89306 -1.748416  0.41855 -0.516635
4  -2.493843 -2.654263  0.36107 -1.748416  0.41855 -0.516635
5   0.935540  0.398196 -0.43043 -1.748416  0.41855 -0.516635
6   2.055780  0.362796  1.25536  1.183762 -1.20362  0.041147
7   0.010779  0.086778 -0.68413  1.183762 -1.20362  0.041147
8  -0.732393  0.235125 -0.89306  1.183762 -1.20362  0.041147
9  -2.493843 -2.654263  0.36107  1.183762 -1.20362  0.041147
10  0.935540  0.398196 -0.43043  1.183762 -1.20362  0.041147
11  2.055780  0.362796  1.25536  1.435362 -0.26066 -0.025933
12  0.010779  0.086778 -0.68413  1.435362 -0.26066 -0.025933
13 -0.732393  0.235125 -0.89306  1.435362 -0.26066 -0.025933
14 -2.493843 -2.654263  0.36107  1.435362 -0.26066 -0.025933
15  0.935540  0.398196 -0.43043  1.435362 -0.26066 -0.025933
16  2.055780  0.362796  1.25536  0.083018 -0.82251 -0.991135
17  0.010779  0.086778 -0.68413  0.083018 -0.82251 -0.991135
18 -0.732393  0.235125 -0.89306  0.083018 -0.82251 -0.991135
19 -2.493843 -2.654263  0.36107  0.083018 -0.82251 -0.991135
20  0.935540  0.398196 -0.43043  0.083018 -0.82251 -0.991135
21  2.055780  0.362796  1.25536  0.470559 -0.54146  1.955555
22  0.010779  0.086778 -0.68413  0.470559 -0.54146  1.955555
23 -0.732393  0.235125 -0.89306  0.470559 -0.54146  1.955555
24 -2.493843 -2.654263  0.36107  0.470559 -0.54146  1.955555
25  0.935540  0.398196 -0.43043  0.470559 -0.54146  1.955555

If we rename the first column in df1 so that it matches the name of the first column in df2 then merge() looks for common values in those two columns. Since there are no common values, the output is empty.

> colnames(df1)[1] = "B4"
> merge(df1, df2)
[1] B4 A2 A3 B5 B6
<0 rows> (or 0-length row.names)

But now if we copy (and reverse, just to make things interesting!) the first column of df2 into the first column of df1...

> df1$B4 = rev(df2$B4)
> df1
        B4       A2        A3
1 -0.50647 -0.48128  0.540799
2 -0.70684 -0.35401  0.872514
3  0.14341  1.12184 -0.079913
4 -0.59989  0.81912  1.726494
5  0.33864  0.85277  0.386702
> df2
        B4       B5        B6
1  0.33864  1.83677  0.406717
2 -0.59989 -0.43630  0.075029
3  0.14341  1.01496  0.095534
4 -0.70684  1.32414 -0.122613
5 -0.50647  0.70709 -0.700225

... and try to merge again...

> merge(df1, df2)
        B4       A2        A3       B5        B6
1 -0.70684 -0.35401  0.872514  1.32414 -0.122613
2 -0.59989  0.81912  1.726494 -0.43630  0.075029
3 -0.50647 -0.48128  0.540799  0.70709 -0.700225
4  0.14341  1.12184 -0.079913  1.01496  0.095534
5  0.33864  0.85277  0.386702  1.83677  0.406717

... we finally get something meaningful: the rows of df1 and df2 are stuck together according to values in df1$B4 matching values in df2$B4.

I hope that this helps.

Share:
24,748
user2763361
Author by

user2763361

Updated on March 15, 2020

Comments

  • user2763361
    user2763361 about 4 years

    Suppose I have two dataframes:

    df1 <- data.frame(matrix(rnorm(10*10),ncol=10))
    df2 <- data.frame(matrix(rnorm(10*10),ncol=10))
    colnames(df1) <- 1:10
    colnames(df2) <- 11:20
    

    How do I use merge to cbind these (I already know about cbind but I am interested in the application of merge here).