Merge data.frames with duplicates
Solution 1
First define a function, run.seq, that numbers the occurrences of each duplicated name. Judging from the desired output, the i-th occurrence of a name in one data frame should be matched with the i-th occurrence of that name in the others. Then create a list of the data frames and add a run.seq column to each one. Finally, use Reduce to merge them all on names and run.seq; the trailing [-2] drops the helper run.seq column.
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))
L <- list(df1, df2, df3)
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$names)))
out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]
The last line gives:
> out
  names data1 data2 data3
1     a     1     1    NA
2     b     2    NA    NA
3     c     3     4     1
4     c     4     5    NA
5     d     5     6    NA
6     e    NA     2     2
7     e    NA     3    NA
EDIT: Revised run.seq so that the input need not be sorted.
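To make the role of run.seq concrete, here is a small self-contained sketch. It rebuilds the three data frames from the question (stringsAsFactors = FALSE is an added assumption so that the result does not depend on the R version) and shows the occurrence numbers that run.seq assigns, including for unsorted input.

```r
# Sample data from the question; stringsAsFactors = FALSE is an assumption
# added here so the names column is character in every R version.
df1 <- data.frame(names = c("a", "b", "c", "c", "d"),
                  data1 = c(1, 2, 3, 4, 5), stringsAsFactors = FALSE)
df2 <- data.frame(names = c("a", "e", "e", "c", "c", "d"),
                  data2 = c(1, 2, 3, 4, 5, 6), stringsAsFactors = FALSE)
df3 <- data.frame(names = c("c", "e"),
                  data3 = c(1, 2), stringsAsFactors = FALSE)

# Number each occurrence of a value within its group: 1st, 2nd, 3rd, ...
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

run.seq(df2$names)                    # 1 1 2 1 2 1
run.seq(c("c", "a", "c", "a", "c"))   # 1 1 2 2 3 (input need not be sorted)

# Merge on both names and run.seq, then drop the helper column (column 2).
L2 <- lapply(list(df1, df2, df3),
             function(x) cbind(x, run.seq = run.seq(x$names)))
out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]
```

Because each (names, run.seq) pair is unique within a data frame, merge no longer forms every combination of the duplicated names.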
Solution 2
See other questions:
- How to join data frames in R (inner, outer, left, right)
- recombining-a-list-of-data-frames-into-a-single-data-frame
- ...
Examples:
library(reshape)
out <- merge_recurse(L)
or
library(plyr)
out <- join(df1, df2, type = "full")
out <- join(out, df3, type = "full")
(The join calls can be put in a loop over the list.)
or
library(plyr)
out<-ldply(L)
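The remark that the join calls can be looped might look like the sketch below. It assumes the plyr package is installed and borrows the run.seq helper from Solution 1 as an extra key, since join, like merge, by default matches all rows that share a key and would otherwise pair every duplicate with every other. The data frames are the ones from the question (with stringsAsFactors = FALSE added as an assumption), and the variable names are only illustrative.

```r
library(plyr)  # assumed to be installed

# Sample data from the question.
df1 <- data.frame(names = c("a", "b", "c", "c", "d"),
                  data1 = c(1, 2, 3, 4, 5), stringsAsFactors = FALSE)
df2 <- data.frame(names = c("a", "e", "e", "c", "c", "d"),
                  data2 = c(1, 2, 3, 4, 5, 6), stringsAsFactors = FALSE)
df3 <- data.frame(names = c("c", "e"),
                  data3 = c(1, 2), stringsAsFactors = FALSE)

# Occurrence counter from Solution 1, used here as a second join key.
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))
L2 <- lapply(list(df1, df2, df3),
             function(x) cbind(x, run.seq = run.seq(x$names)))

# Start from the first data frame and full-join the rest one at a time.
out <- L2[[1]]
for (d in L2[-1]) {
  out <- join(out, d, by = c("names", "run.seq"), type = "full")
}
out$run.seq <- NULL  # drop the helper key once the joins are done
```

The row order may differ from the merge-based result in Solution 1, so sort by names afterwards if the order matters.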
user1291855
Updated on June 04, 2022
I have many data.frames, for example:
df1 = data.frame(names=c('a','b','c','c','d'),data1=c(1,2,3,4,5))
df2 = data.frame(names=c('a','e','e','c','c','d'),data2=c(1,2,3,4,5,6))
df3 = data.frame(names=c('c','e'),data3=c(1,2))
and I need to merge these data.frames without deleting the duplicated names:
> result
  names data1 data2 data3
1   'a'     1     1    NA
2   'b'     2    NA    NA
3   'c'     3     4     1
4   'c'     4     5    NA
5   'd'     5     6    NA
6   'e'    NA     2     2
7   'e'    NA     3    NA
I can't find a function like merge with an option to handle duplicated names. Thank you for your help. To clarify my problem: the data come from a biological experiment in which each sample has a different number of replicates. I need to merge all the experiments and produce this table. I can't generate a unique identifier for the replicates.