Using ifelse() to replace NAs in one data frame by referencing another data frame of different length


Solution 1

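Try the following code, which takes your original statement and makes one small change to the yes argument of ifelse(). Note that it relies on df1$A repeating in the same order as df2$A, because ifelse() recycles the selected df2$B values along df1: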

> df1$B <- ifelse(is.na(df1$B) == TRUE, df2$B[df2$A %in% df1$A], df1$B)   
#                         Switched '==' to '%in%' ---^
> df1
            B          C  A
1   1.7169811 2012-10-01  0
2   0.3396226 2012-10-01  5
3   4.0000000 2012-10-01 10
4   0.1509434 2012-10-01 15
5   0.0754717 2012-10-01 20
6  20.0000000 2012-10-01 25
7   1.7169811 2012-10-01  0
8   0.3396226 2012-10-01  5
9   5.0000000 2012-10-01 10
10  5.0000000 2012-10-01 15
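
The %in% version lines up here because ifelse() recycles the selected df2$B values along df1, and df1$A happens to cycle through the same values in the same order as df2$A. A minimal sketch with a hypothetical, differently ordered data frame (df1_shuffled is an illustrative name, not from the original post) shows the difference, along with one order-independent alternative that keeps ifelse() but looks up positions with match():

```r
# df2 as given in the question
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

# Hypothetical df1 whose A values are NOT in df2's order (assumption for illustration)
df1_shuffled <- data.frame(B = c(NA, NA, NA), A = c(25, 0, 5))

# %in% only says WHICH df2 rows appear in df1; it keeps df2's order
via_in <- ifelse(is.na(df1_shuffled$B),
                 df2$B[df2$A %in% df1_shuffled$A],
                 df1_shuffled$B)

# match() returns, for each df1 row, the position of its A value in df2$A
via_match <- ifelse(is.na(df1_shuffled$B),
                    df2$B[match(df1_shuffled$A, df2$A)],
                    df1_shuffled$B)

via_in     # 1.7169811 0.3396226 2.0943396  (row with A = 25 gets the value for A = 0)
via_match  # 2.0943396 1.7169811 0.3396226  (each row gets the value for its own A)
```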

Solution 2

You may also use:

df1$B[is.na(df1$B)] <- df2$B[match(df1$A[is.na(df1$B)],df2$A)]
df1

#             B          C  A
# 1   1.7169811 2012-10-01  0
# 2   0.3396226 2012-10-01  5
# 3   4.0000000 2012-10-01 10
# 4   0.1509434 2012-10-01 15
# 5   0.0754717 2012-10-01 20
# 6  20.0000000 2012-10-01 25
# 7   1.7169811 2012-10-01  0
# 8   0.3396226 2012-10-01  5
# 9   5.0000000 2012-10-01 10
# 10  5.0000000 2012-10-01 15
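
For readers who want to see the intermediate steps, here is a self-contained sketch that rebuilds the question's data and splits the one-liner into parts. The variable names needs_fill and pos are illustrative and do not appear in the original answer.

```r
# Data frames reconstructed from the question
df1 <- data.frame(
  B = c(NA, NA, 4, NA, NA, 20, NA, NA, 5, 5),
  C = factor(rep("2012-10-01", 10)),
  A = c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
)
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

# Logical flag for the rows of df1 that need a replacement (rows 1, 2, 4, 5, 7, 8)
needs_fill <- is.na(df1$B)

# For each of those rows, the position of its A value within df2$A
pos <- match(df1$A[needs_fill], df2$A)   # 1 2 4 5 1 2

# Assign only into the NA slots; existing values of B are never touched
df1$B[needs_fill] <- df2$B[pos]
df1
```

If an A value in df1 has no counterpart in df2$A, match() returns NA for it, so that row of B simply stays NA.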
Author: Daniel Fletcher

Updated on June 14, 2022

Comments

  • Daniel Fletcher
    Daniel Fletcher almost 2 years

    I already reviewed the following two posts and think they might answer my question, although I'm struggling to see how:

    1) Conditional replacement of values in a data.frame
    2) Creating a function to replace NAs from one data.frame with values from another

    With that said, I'm trying to replace NAs in one data frame by referencing another data frame of a different (shorter) length and pulling in replacement values from column "B" where the values for column "A" in each data frame match.

    I've modified the data, below, for simplicity and illustration, although the concept is the same in the actual data. FYI, in the real second data frame, there are also no duplicates in column "A".

    Here's the first data frame (df1):

    > df1
        B          C  A
    1  NA 2012-10-01  0
    2  NA 2012-10-01  5
    3   4 2012-10-01 10
    4  NA 2012-10-01 15
    5  NA 2012-10-01 20
    6  20 2012-10-01 25
    7  NA 2012-10-01  0
    8  NA 2012-10-01  5
    9   5 2012-10-01 10
    10  5 2012-10-01 15
    
    > str(df1)
    'data.frame':   10 obs. of  3 variables:
     $ B: num  NA NA 4 NA NA 20 NA NA 5 5
     $ C: Factor w/ 1 level "2012-10-01": 1 1 1 1 1 1 1 1 1 1
     $ A: num  0 5 10 15 20 25 0 5 10 15
    

    And here's the second data frame (df2):

    > df2
       A         B
    1  0 1.7169811
    2  5 0.3396226
    3 10 0.1320755
    4 15 0.1509434
    5 20 0.0754717
    6 25 2.0943396
    
    > str(df2)
    'data.frame':   6 obs. of  2 variables:
     $ A: int  0 5 10 15 20 25
     $ B: num  1.717 0.3396 0.1321 0.1509 0.0755 ...
    

    I think I'm pretty close with the following code:

    > ifelse(is.na(df1$B) == TRUE, df2$B[df2$A == df1$A], df1$B)
     [1]  1.7169811  0.3396226  4.0000000  0.1509434  0.0754717 20.0000000         NA         NA
     [9]  5.0000000  5.0000000
    Warning message:
    In df2$A == df1$A :
      longer object length is not a multiple of shorter object length
    

    Obviously, I want the 7th and 8th output elements to be 1.7169811 and 0.3396226, rather than NAs . . .

    Thanks, in advance, for any help, and, once again, thanks for your patience!
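
A short sketch of why the == attempt in the question yields NA in positions 7 and 8, using the question's data (the names cmp and candidates are illustrative): the comparison recycles the shorter df2$A to length 10, and indexing the six-element df2$B with a ten-element logical vector pads the result with NA.

```r
# Only the columns needed for the comparison, taken from the question
df1 <- data.frame(
  B = c(NA, NA, 4, NA, NA, 20, NA, NA, 5, 5),
  A = c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
)
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

# df2$A (length 6) is recycled to length 10; R warns because 10 is not a
# multiple of 6. The warning is suppressed here only to keep the output clean.
cmp <- suppressWarnings(df2$A == df1$A)
cmp          # all TRUE, since df1$A happens to follow df2$A's cycle

# A length-10 logical index into a length-6 vector: positions 7 to 10 are NA
candidates <- df2$B[cmp]
candidates

# ifelse() picks candidates[7] and candidates[8], which are NA
ifelse(is.na(df1$B), candidates, df1$B)
```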

  • Daniel Fletcher
    Daniel Fletcher almost 10 years
    @ccapizzano Awesome. Thank you, sir! I had a feeling %in% might have had something to do with it. I felt like Voldemort in the 7th book when he's in the shrieking shack: "Soooo, close. Sooo close." Thanks, again.
  • thelatemail
    thelatemail almost 10 years
    This seems like the most typical R-ish way to me. +1
  • Daniel Fletcher
    Daniel Fletcher almost 10 years
    @thelatemail, why would you say this is the most "R-ish" way?
  • thelatemail
    thelatemail almost 10 years
    @DanielFletcher - just my subjective opinion - the answer uses simple subsetting and the pretty basic building block match function. I could see the answer being used in an introductory textbook.
  • Daniel Fletcher
    Daniel Fletcher almost 10 years
    @thelatemail, thanks. I'm currently taking a bunch of introductory courses via the Johns Hopkins Data Science Specialization on Coursera. What are some introductory textbooks you'd recommend? Thanks, again.
  • Daniel Fletcher
    Daniel Fletcher almost 10 years
    @thelatemail. Good deal. Thanks! By the way, how do I say "thanks" to someone without breaking the stackoverflow etiquette of "don't just say 'thanks' in a comment, newb!"?
  • helen.h
    helen.h over 4 years
    @ccapizzano How would I change this ifelse statement to replace NAs where the values in more than one column match, i.e. columns A and C?
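
One possible way to handle a match on more than one column, sketched under assumptions: the data below is hypothetical (the df2 in the question has no column C), and the composite keys key1 and key2 are illustrative names. The idea is to paste the matching columns into a single key and then use match() on that key exactly as in Solution 2.

```r
# Hypothetical data: the same A value maps to different B values on different dates
df1 <- data.frame(
  B = c(NA, 4, NA),
  C = c("2012-10-01", "2012-10-01", "2012-10-02"),
  A = c(0, 5, 0)
)
df2 <- data.frame(
  A = c(0, 5, 0),
  C = c("2012-10-01", "2012-10-01", "2012-10-02"),
  B = c(1.5, 2.5, 3.5)
)

# Composite key from both columns; "|" is assumed not to occur in the data
key1 <- paste(df1$A, df1$C, sep = "|")
key2 <- paste(df2$A, df2$C, sep = "|")

# Fill only the NA rows, looking each one up by its composite key
needs_fill <- is.na(df1$B)
df1$B[needs_fill] <- df2$B[match(key1[needs_fill], key2)]
df1$B   # 1.5 4.0 3.5
```

Because paste() converts numbers to text, this sketch assumes the key columns are formatted identically in both data frames; merge() on both columns is another option when that is in doubt.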