Merge multiple data frames - Error in match.names(clabs, names(xi)) : names do not match previous names


I'm not sure I can help, unfortunately, but I found this question while searching for help with the same error, so I thought I would post. What I effectively had was:

a <- cbind(b,c)
d <- merge(a,e)

And I got that same error. Using a <- data.frame(b,c) fixed the problem, but I can't work out why.

object.size(a)  # 1248124200 bytes
object.size(c)  # 1248124032 bytes

So something is different. All classes are the same, str() reveals nothing. I'm stumped.

Hopefully that aids someone else in the know.
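
One documented difference between the two calls may be relevant here. The data frame method of cbind() behaves like data.frame(..., check.names = FALSE), so a column name that occurs in both inputs is kept twice, whereas data.frame() with its default check.names = TRUE makes the names unique. The sketch below illustrates this with two small made-up frames (b, c2 and their columns are hypothetical). Whether duplicate names are what triggered the error in the case above is an assumption, since the original objects are not shown.

```r
# Hypothetical stand-ins for the answer's `b` and `c`; both have a column `id`.
b  <- data.frame(id = 1:3, x = c(10, 20, 30))
c2 <- data.frame(id = 4:6, y = c("u", "v", "w"))

a_cbind <- cbind(b, c2)        # names kept as-is, so `id` appears twice
a_df    <- data.frame(b, c2)   # check.names = TRUE renames the second `id`

print(names(a_cbind))
print(names(a_df))

# anyDuplicated() returns 0 when all names are unique
has_dups_cbind <- anyDuplicated(names(a_cbind)) > 0
has_dups_df    <- anyDuplicated(names(a_df)) > 0
```

Running anyDuplicated(names(a)) on the real object is a quick way to see whether this applies.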

Author: Jasmine

Updated on June 05, 2022

Comments

  • Jasmine, almost 2 years

    I'm getting some really bizarre stuff while trying to merge multiple data frames. Help!

    I need to merge a bunch of data frames by the columns 'RID' and 'VISCODE'. Here is an example of what it looks like:

    # ID is not a merge key; the frames are merged on RID and VISCODE.
    # sample(1:100, n) draws n distinct IDs, one per row.
    d1 = data.frame(ID = sample(1:100, 5), RID = c(2, 5, 7, 9, 12),
                VISCODE = rep('bl', 5),
                value1 = rep(16, 5))
    
    d2 = data.frame(ID = sample(1:100, 9), RID = c(2, 2, 2, 5, 5, 5, 7, 7, 7),
                VISCODE = rep(c('bl', 'm06', 'm12'), 3),
                value2 = rep(100, 9))
    
    d3 = data.frame(ID = sample(1:100, 9), RID = c(2, 2, 2, 5, 5, 5, 9, 9, 9),
                VISCODE = rep(c('bl', 'm06', 'm12'), 3),
                value3 = rep("a", 9),
                values3.5 = rep("c", 9))
    
    d4 = data.frame(ID = sample(1:100, 9), RID = c(2, 2, 5, 5, 5, 7, 7, 7, 9),
                VISCODE = c(c('bl', 'm12'), rep(c('bl', 'm06', 'm12'), 2), 'bl'),
                value4 = rep("b", 9))
    
    dataList = list(d1, d2, d3, d4)
    

    I looked at the answers to the question titled "Merge several data.frames into one data.frame with a loop." I used the Reduce approach suggested there, as well as a loop I wrote myself:

    try1 = mymerge(dataList)
    
    try2 <- Reduce(function(x, y) merge(x, y, all = TRUE,
                                        by = c("RID", "VISCODE")),
                   dataList, accumulate = FALSE)
    

    where dataList is a list of data frames and mymerge is:

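    # Merge a list of data frames one by one on RID and VISCODE,
    # keeping rows that have no match (all = TRUE).
    mymerge = function(dataList){
      mdat = dataList[[1]]
      # seq_along(...)[-1] is empty for a one-element list, unlike 2:L
      for(i in seq_along(dataList)[-1]){
        mdat = merge(mdat, dataList[[i]], by = c("RID", "VISCODE"), all = TRUE)
      }
      mdat
    }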
    

    For my test data and subsets of my real data, both of these work fine and produce exactly the same results. However, when I use larger subsets of my data, they both break down and give me the following error: Error in match.names(clabs, names(xi)) : names do not match previous names.
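
    merge() appends suffixes (.x and .y by default) to non-key columns that share a name in both inputs. The example frames above all carry an ID column that is not a merge key, so one hedged workaround is to give such shared columns unique names before chaining the merges. The sketch below is not taken from the original post: rename_shared() is a hypothetical helper, and f1 to f4 are small made-up frames. Whether clashing column names are the cause of the error on the real data is an assumption.

    ```r
    keys <- c("RID", "VISCODE")

    # Small made-up frames; each has a non-key column `ID`, as in the example data.
    f1 <- data.frame(ID = 1:2, RID = c(2, 5), VISCODE = "bl",
                     value1 = 16, stringsAsFactors = FALSE)
    f2 <- data.frame(ID = 3:4, RID = c(2, 5), VISCODE = "bl",
                     value2 = 100, stringsAsFactors = FALSE)
    f3 <- data.frame(ID = 5:6, RID = c(2, 5), VISCODE = "bl",
                     value3 = "a", stringsAsFactors = FALSE)
    # RID 7 has no match in the earlier frames, so all = TRUE appends it.
    f4 <- data.frame(ID = 7:9, RID = c(2, 5, 7), VISCODE = "bl",
                     value4 = "b", stringsAsFactors = FALSE)
    frames <- list(f1, f2, f3, f4)

    # Hypothetical helper: append the frame's position to every non-key column
    # name that occurs in more than one frame, so merge() never needs suffixes.
    rename_shared <- function(frames, keys) {
      nonkey <- unlist(lapply(frames, function(df) setdiff(names(df), keys)))
      shared <- unique(nonkey[duplicated(nonkey)])
      Map(function(df, i) {
        hit <- names(df) %in% shared
        names(df)[hit] <- paste0(names(df)[hit], "_", i)
        df
      }, frames, seq_along(frames))
    }

    merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE),
                     rename_shared(frames, keys))
    print(names(merged))
    ```

    Dropping columns that are not needed in the result (for example with df[setdiff(names(df), "ID")]) before merging would be an alternative to renaming them.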

    The really weird thing is that using this works:

      dataList = list(demog[1:50,],
                neurobat[1:50,],
                apoe[1:50,],
                mmse[1:50,],
                faq[1:47, ])
    

    And using this fails:

      dataList = list(demog[1:50,],
                neurobat[1:50,],
                apoe[1:50,],
                mmse[1:50,],
                faq[1:48, ])
    

    As far as I can tell, there is nothing special about row 48 of faq. Likewise, using this works:

    dataList = list(demog[1:50,],
                neurobat[1:50,],
                apoe[1:50,],
                mmse[1:50,],
                pdx[1:47, ])
    

    And using this fails:

    dataList = list(demog[1:50,],
                neurobat[1:50,],
                apoe[1:50,],
                mmse[1:50,],
                pdx[1:48, ])
    

    Row 48 in faq and row 48 in pdx have the same values for RID and VISCODE, the same value for EXAMDATE (which I'm not matching on) and different values for ID (which I'm also not matching on). Besides the matching RID and VISCODE, I don't see anything special about them. They don't share any other variable names. This same scenario occurs elsewhere in the data without problems.
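
    One way to test whether a given row is special from merge()'s point of view is to check whether its RID/VISCODE pair already occurs in the frames merged before it. With all = TRUE, rows of the second frame whose keys have no match are appended as extra rows rather than joined. The sketch below uses a hypothetical helper, unmatched_keys(), and made-up frames acc and nxt; it is a diagnostic under those assumptions, not a confirmed explanation of the error.

    ```r
    # Hypothetical helper: return the key pairs of `y` that have no match in `x`.
    unmatched_keys <- function(x, y, keys = c("RID", "VISCODE")) {
      # Collapse the key columns of each row into a single comparable string.
      key_string <- function(df) {
        do.call(paste, c(unname(as.list(df[keys])), sep = "\r"))
      }
      y[!(key_string(y) %in% key_string(x)), keys, drop = FALSE]
    }

    # Made-up frames standing in for the accumulated merge and the next frame.
    acc <- data.frame(RID = c(2, 2, 5), VISCODE = c("bl", "m06", "bl"),
                      stringsAsFactors = FALSE)
    nxt <- data.frame(RID = c(2, 5, 5), VISCODE = c("bl", "bl", "m12"),
                      stringsAsFactors = FALSE)

    extra <- unmatched_keys(acc, nxt)
    print(extra)  # rows of `nxt` that all = TRUE would append as new rows
    ```

    Applied to the real data, calling it on the merge of the first four frames and faq[1:48, ] would show whether row 48 introduces a key pair that the earlier frames lack.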

    To add icing on the complication cake, this doesn't even work:

    dataList = list(demog[1:50,],
                neurobat[1:50,],
                apoe[1:50,],
                mmse[1:50,],
                faq[1:48, 2:3])
    

    where columns 2 and 3 are "RID" and "VISCODE".

    48 isn't even the magic number because this works:

     dataList = list(demog[1:500,],
                neurobat[1:500,],
                apoe[1:500,],
                mmse[1:457,])
    

    while using mmse[1:458, ] fails.

    I can't seem to come up with test data that causes the problem. Has anyone had this problem before? Any better ideas on how to merge?