How to merge two dataframes using multiple columns as key?
Solution 1
As Akrun mentioned in comments, your lookup table (DF2) needs to be reduced to just its unique A/B combinations. For your current dataframe, this isn't a problem, but you will need additional rules if there are multiple possible values for the same combination. From there, the solution is easy:
DF2.u <- unique(DF2)
DF3 <- merge(DF1, DF2.u, all = T)
Note that this will produce a new dataframe with an empty TEST column (all values NA
), and a VAL column assigned from DF2. To do exactly what you wanted (replace TEST with VAL where possible), here is some slightly clunkier code:
DF1$TEST <- merge(DF1, DF2.u, all = T)$VAL
EDIT: in response to your question, you can boil down DF2 if necessary quite simple:
DF2$C <- c(1:12) #now unique() won't work
DF2.u <- unique(DF2[1:3])
A B VAL
1 A C 1
2 A D 3
Solution 2
# this is your DF1
DF1 <- data.frame("A" = rep(c("A","B"), 18),
"B" = rep(c("C","D","E"), 12),
"NUM"= rep(rnorm(36,10,1)),
"TEST" = rep(NA,36))
#this is a DF2 i created, with unique A, B, VAL
DF2 <- data.frame("A" = rep(c("A","B"),3),
"B" = rep(c("C","D","E"),2),
"VAL" = rep(1:6))
# and this is the answer of what i assume you want
tmp <- merge(DF1,DF2, by=c("A","B"), all.x=TRUE, all.y=FALSE)
DF1[4] <- tmp[5]
goldisfine
Updated on March 12, 2020Comments
-
goldisfine about 4 years
Say I have the following dataframes:
DF1 <- data.frame("A" = rep(c("A","B"), 18), "B" = rep(c("C","D","E"), 12), "NUM"= rep(rnorm(36,10,1)), "TEST" = rep(NA,36)) DF2 <- data.frame("A" = rep("A",6), "B" = rep(c("C","D"),6), "VAL" = rep(c(1,3),3))
*Note: Each unique combination of variables
A
andB
inDF2
should have a uniqueVAL
.For each row, I would like to replace the
NA
inTEST
with the corresponding value ofVAL
inDF1
if the values in columnsA
andA
match and the values in columnsB
andB
match for that row. Otherwise, I'd leaveTEST
asNA
. How would I do this without looping through each combination using match?Ideally, an answer would scale to two data frames with many columns to match upon.
-
goldisfine about 9 yearsIn my actual example, I have columns in DF2 which are numeric, and so if I passed the entire DF then I would have a unique DF that was too large. Can I subset DF2 to rows which are only unique in the factor variables?
-
Joe about 9 yearsOf course, the actual indices you take will vary based on what your extra columns are and how they are arranged
-
smci about 9 yearsIn Python pandas, you could just simply set a compound key using variables A,B. Then this is a simple left-join/merge.