How can I merge multiple dataframes with the same column names?
Solution 1
You can use Reduce()
and complete.cases()
as follows:
merged <- Reduce(function(x, y) merge(x, y, all=TRUE),
list(master, cond_a, cond_b, cond_c, cond_d))
merged[complete.cases(merged), ]
# userid condition answer1 answer2
# 1 bar A 1 2
# 2 bar B 3 4
# 4 bar C 5 6
# 6 bar D 7 8
# 8 foo A 1 2
# 9 foo B 3 4
# 11 foo C 5 6
# 13 foo D 7 8
Reduce()
might take some getting accustomed to. You define your function, and then provide a list
of objects to repeatedly apply the function to. Thus, that statement is like doing:
temp1 <- merge(master, cond_a, all=TRUE)
temp2 <- merge(temp1, cond_b, all=TRUE)
temp3 <- merge(temp2, ....)
Or something like:
merge(merge(merge(master, cond_a, all=TRUE), cond_b, all=TRUE), cond_c, all=TRUE)
complete.cases()
creates a logical vector of whether the specified columns are "complete" or not; this logical vector can be used to subset from the merged data.frame
.
Solution 2
As stated by the OP, given that no explicit relationship with the master data frame, an option is this:
temp <-rbind(cond_a,cond_b,cond_c,cond_d)
temp[order(temp["userid"]),]
Perhaps if any relationship was known, there could be a non-simplistic solution.
Solution 3
You can express this join as a SQL
statement, and then use the sqldf
library to execute it.
cond_all = rbind(cond_a, cond_b, cond_c, cond_d)
> sqldf('select p.userid as userid, p.condition as condition, answer1, answer2 from master as p join cond_all as q on p.userid=q.userid and p.condition=q.condition order by userid, condition')
userid condition answer1 answer2
1 bar A 1 2
2 bar B 3 4
3 bar C 5 6
4 bar D 7 8
5 foo A 1 2
6 foo B 3 4
7 foo C 5 6
8 foo D 7 8
You mentioned in a comment that the master dataframe has extra columns that do not exist in the cond dataframes. You should be able to modify this SQL query to still work for this case.
slhck
Video quality guy and researcher, PhD student in computer science. Founder/CEO of AVEQ. I offer personal consulting and help with video encoding, especially with FFmpeg. Send a mail to werner.robitza at gmail.com. More info on my website.
Updated on July 10, 2022Comments
-
slhck almost 2 years
What I have:
I have a "master" dataframe that has the following columns:
userid, condition
Since there are four experiment conditions, I also have four dataframes that carry answer information, with the following columns:
userid, condition, answer1, answer2
Now, I'd like to join these, so all combinations of user IDs, conditions and their answers to these conditions are merged. Each condition should only have the correct answer in the appropriate column, per row.
Short, self-contained example:
master = data.frame(userid=c("foo","foo","foo","foo","bar","bar","bar","bar"), condition=c("A","B","C","D","A","B","C","D")) cond_a = data.frame(userid=c("foo","bar"), condition="A", answer1=c("1","1"), answer2=c("2","2")) cond_b = data.frame(userid=c("foo","bar"), condition="B", answer1=c("3","3"), answer2=c("4","4")) cond_c = data.frame(userid=c("foo","bar"), condition="C", answer1=c("5","5"), answer2=c("6","6")) cond_d = data.frame(userid=c("foo","bar"), condition="D", answer1=c("7","7"), answer2=c("8","8"))
How do I merge all conditions into the master, so the master table looks like follows?
userid condition answer1 answer2 1 bar A 1 2 2 bar B 3 4 3 bar C 5 6 4 bar D 7 8 5 foo A 1 2 6 foo B 3 4 7 foo C 5 6 8 foo D 7 8
I've tried the following:
temp = merge(master, cond_a, all.x=TRUE)
Which gives me:
userid condition answer1 answer2 1 bar A 1 2 2 bar B <NA> <NA> 3 bar C <NA> <NA> 4 bar D <NA> <NA> 5 foo A 1 2 6 foo B <NA> <NA> 7 foo C <NA> <NA> 8 foo D <NA> <NA>
But as soon as I do this…
merge(temp, cond_b, all.x=TRUE)
There are no values for condition
B
. How come?userid condition answer1 answer2 1 bar A 1 2 2 bar B <NA> <NA> 3 bar C <NA> <NA> 4 bar D <NA> <NA> 5 foo A 1 2 6 foo B <NA> <NA> 7 foo C <NA> <NA> 8 foo D <NA> <NA>
-
slhck over 11 yearsCool, this works great for my specific example! Now, what would I have to do if
master
data frame included additional columns that I wanted to keep? Basically, I'd just want to assign the answers to the uniqueuserid
andcondition
pairs. -
slhck over 11 yearsSo, to be more precise I now get a row with
foo, A, 1, 2, other
, but then subsequentlyfoo, B, <NA>, <NA>, blah
, which would be removed in thecomplete.cases()
step. -
slhck over 11 yearsWhat exactly do you mean by "explicit relationship"? Obviously,
userid
andcondition
are factors in the experiment data, while the answer scores are simple numeric items that belong to a certain user and treatment. -
slhck over 11 yearsI think I got it. When I
rbind
all condition dataframes first and thenmerge(master, all_conditions, all.x=TRUE)
, this seems to give me everything. -
A_K over 11 yearsWhat I meant was that even without the master df, one could arrive at the output you want. So just cond_[a-d] suffices. If there was more relationships (like columns in there) between master & cond_[a-d] then this might be insufficient as a solution