Replace all NA with FALSE in selected columns in R

r dataframe na missing-data imputation


Solution 1

If you want to do the replacement for only a subset of variables, you can still use the df[is.na(df)] <- value trick, restricted to those columns:

df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE

IMO using temporary variables makes the logic easier to follow:

vars.to.replace <- c("x1", "x2")
df2 <- df[vars.to.replace]
df2[is.na(df2)] <- FALSE
df[vars.to.replace] <- df2
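
For the wider case in the question (one identifier column plus many TRUE/NA columns), the same idea should work without typing every name. This is a minimal sketch, assuming the identifier column is called id as in the example data; the small data frame here is made up for illustration:

```r
# Hypothetical example data: one id column plus logical columns with NAs
df <- data.frame(
  id = c(1:4, NA),
  x1 = c(TRUE, NA, TRUE, NA, TRUE),
  x2 = c(NA, NA, TRUE, TRUE, NA)
)

# Every column except the identifier
vars.to.replace <- setdiff(names(df), "id")

# Same logical-index assignment, restricted to those columns
df[vars.to.replace][is.na(df[vars.to.replace])] <- FALSE
```

The NA in id is left alone because id is never part of the selected columns.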

Solution 2

tidyr::replace_na is an excellent function for this.

library(tidyr)  # provides replace_na() and re-exports %>%

df %>%
  replace_na(list(x1 = FALSE, x2 = FALSE))

This is a great quick fix. The only trick is that you pass a named list of the columns you want to change, each with its replacement value.
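
With many columns, the named list can be built instead of typed out. This is a sketch under assumptions: the data frame is made up, the id column name follows the question, and the tidyr call is guarded so the snippet still runs where tidyr is not installed:

```r
# Hypothetical example data
df <- data.frame(
  id = c(1:4, NA),
  x1 = c(TRUE, NA, TRUE, NA, TRUE),
  x2 = c(NA, NA, TRUE, TRUE, NA)
)

# Columns to fill: everything except id
cols <- setdiff(names(df), "id")

# Named list with one FALSE per column, i.e. list(x1 = FALSE, x2 = FALSE)
fill <- setNames(rep(list(FALSE), length(cols)), cols)

# Only runs if tidyr is installed
if (requireNamespace("tidyr", quietly = TRUE)) {
  df <- tidyr::replace_na(df, fill)
}
```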

Solution 3

Try this code. Note that it replaces NA in every column, so the NA in id becomes 0:

df <- data.frame(
  id = c(rep(1:19), NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)
replace(df, is.na(df), FALSE)

Updated with another solution, which leaves the id column untouched:

df2 <- df <- data.frame(
  id = c(rep(1:19), NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)
df2[names(df) == "id"] <- FALSE
df2[names(df) != "id"] <- TRUE
replace(df, is.na(df) & df2, FALSE)
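
The mask in the updated version can also be built directly from is.na(df), which may be easier to read. This is an alternative sketch, not the code above; the data frame is made up and the result name is hypothetical:

```r
# Hypothetical example data
df <- data.frame(
  id = c(1:4, NA),
  x1 = c(TRUE, NA, TRUE, NA, TRUE),
  x2 = c(NA, NA, TRUE, TRUE, NA)
)

# Logical matrix marking every NA cell, with the data frame's column names
mask <- is.na(df)

# Unmark the id column so it is never replaced
mask[, "id"] <- FALSE

# Replace only the cells that are still marked
result <- replace(df, mask, FALSE)
```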

Solution 4

You can use the NAToUnknown function from the gdata package:

df[,c('x1', 'x2')] = gdata::NAToUnknown(df[,c('x1', 'x2')], unknown = 'FALSE')

Solution 5

With dplyr (1.0.0 or later) you could also do

df %>% mutate(across(c(x1, x2), ~ replace(.x, is.na(.x), FALSE)))

It is a bit less readable than just using replace(), but more generic because it lets you select the columns to be transformed. This solution is especially useful if you want to keep NAs in some columns but get rid of them in others.


lokheart

Updated on September 13, 2020

Comments

lokheart over 3 years
I have a question similar to this one, but my dataset is a bit bigger: 50 columns, with 1 column as UID and the other columns carrying either TRUE or NA. I want to change all the NAs to FALSE, but I don't want to use an explicit loop.

Can plyr do the trick? Thanks.

UPDATE #1

Thanks for quick reply, but what if my dataset is like below:
```
df <- data.frame(
  id = c(rep(1:19),NA),
  x1 = sample(c(NA,TRUE), 20, replace = TRUE),
  x2 = sample(c(NA,TRUE), 20, replace = TRUE)
)
```
I only want x1 and x2 to be processed. How can this be done?
Jubbles about 12 years

Excellent function except for one snag - if I want to change unknowns to 0, and I already have some NAs and zeroes in the vector, then I receive the error message Error in NAToUnknown.default(x = dots[[1L]][[1L]], unknown = dots[[2L]][[1L]], : 'x' already has value “0”.
tmakino about 11 years

I know this is an old post, but would you explain the first line to me? I get the logic when you break it down using temp variables, but I'd like to understand the one line form. I thought I was familiar with subsetting but I don't understand the [][]. I searched "double brackets" but that turned up something different.
blakeoft over 9 years

@tmakino You just have to read the double brackets as different subsets from left to right. For example, if x <- 1:10, then x[5:10][1:4] will give you the vector 5 6 7 8. In multiple steps, you could take the first subset and call it y, y <- x[5:10] which is 5 6 7 8 9 10. And then subset that vector y[1:4], which gives you 5 6 7 8 again.
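
The same left-to-right reading applies on the assignment side, which is what the one-line form relies on. A small sketch with a plain vector:

```r
x <- 1:10

# Reading: subset first, then subset the result
y <- x[5:10]   # 5 6 7 8 9 10
z <- y[1:4]    # 5 6 7 8, same as x[5:10][1:4]

# Assigning through the chain writes back into x itself
x[5:10][1:4] <- 0L
x              # 1 2 3 4 0 0 0 0 9 10
```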
coip about 9 years

You can also use the column position instead of explicitly naming them, which is useful when you have a lot of variables to convert or if they have long names: df2[,14:16][is.na(df2[,14:16])] <- 0, for instance, replaces NA with 0 in columns 14, 15, and 16 of data frame, df2.
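
Applied to a small made-up version of the data frame from the question, the positional form might look like this. The positions 2:3 are an assumption about column order; match() can look them up from names if the order is not known in advance:

```r
# Hypothetical example data
df <- data.frame(
  id = c(1:4, NA),
  x1 = c(TRUE, NA, TRUE, NA, TRUE),
  x2 = c(NA, NA, TRUE, TRUE, NA)
)

# Positions of x1 and x2, looked up by name (2 and 3 here)
pos <- match(c("x1", "x2"), names(df))

# Replace NA with FALSE in those positions only
df[, pos][is.na(df[, pos])] <- FALSE
```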