How to replace NA values in a table for selected columns

r replace dataframe data.table na

Solution 1

You can do:

x[, 1:2][is.na(x[, 1:2])] <- 0

or better (IMHO), use the variable names:

x[c("a", "b")][is.na(x[c("a", "b")])] <- 0

In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.
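As a minimal sketch of the pre-defined-vector variant, assuming x is a plain data.frame: the example data and the name cols are made up for illustration. As noted in the comments, this form does not carry over to a data.table, so treat it as a data.frame-only idiom.

```r
# Small example data.frame (values chosen for illustration only)
x <- data.frame(a = c(1, NA, 2),
                b = c(NA, 2, 1),
                c = c(3, NA, 5))

# Hypothetical pre-defined vector of column names
cols <- c("a", "b")

# x[cols] selects the columns as a data.frame; is.na() on it returns a
# logical matrix that marks the cells to overwrite
x[cols][is.na(x[cols])] <- 0

x  # NAs in 'a' and 'b' are now 0; the NA in 'c' is untouched
```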

Solution 2

Building on @Robert McDonald's tidyr::replace_na() answer, here are some dplyr options for controlling in which columns the NAs are replaced:

library(tidyverse)

# by column type:
x %>%
  mutate_if(is.numeric, ~replace_na(., 0))

# select columns defined in vars(col1, col2, ...):
x %>%
  mutate_at(vars(a, b, c), ~replace_na(., 0))

# all columns:
x %>%
  mutate_all(~replace_na(., 0))
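Since dplyr 1.0.0 the scoped verbs mutate_if(), mutate_at() and mutate_all() are superseded by across(). A rough equivalent of the three calls above might look like the following sketch; the example data frame and the result names are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Example data (illustrative values, all columns numeric)
x <- data.frame(a = c(1, NA, 2),
                b = c(NA, 2, 1),
                c = c(3, NA, 5))

# by column type
by_type <- x %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

# selected columns only
selected <- x %>%
  mutate(across(c(a, b), ~ replace_na(.x, 0)))

# all columns
all_cols <- x %>%
  mutate(across(everything(), ~ replace_na(.x, 0)))
```

Each pipeline returns a new data frame, so the result has to be assigned (as above) for the replacement to persist.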

Solution 3

Edit 2020-06-15

Since version 1.12.4 (Oct 2019), data.table has had two functions to facilitate this: nafill and setnafill.

nafill operates on columns:

cols = c('a', 'b')
y[ , (cols) := lapply(.SD, nafill, fill=0), .SDcols = cols]

setnafill operates on tables (the replacements happen by reference, i.e. in place):

setnafill(y, cols=cols, fill=0)
# print y to show the effect
y[]

This will also be more efficient than the other options; see ?nafill for more, including the last-observation-carried-forward (LOCF) and next-observation-carried-backward (NOCB) versions of NA imputation for time series.
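To illustrate the three fill types mentioned above, here is a small sketch on a plain numeric vector; the vector v and the result names are made up for the example.

```r
library(data.table)

# Illustrative vector with interior and trailing NAs
v <- c(1, NA, NA, 4, NA)

# replace NAs with a constant
filled_const <- nafill(v, type = "const", fill = 0)

# carry the last observed value forward
filled_locf <- nafill(v, type = "locf")

# carry the next observed value backward; a trailing NA has no
# following observation, so it stays NA
filled_nocb <- nafill(v, type = "nocb")
```

The same type argument can be passed to setnafill() to fill selected columns of a table in place.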

On data.table versions older than 1.12.4, this will work:

for (col in c("a", "b")) y[is.na(get(col)), (col) := 0]

Alternatively, as David Arenburg points out below, you can use set (side benefit: it works on either a data.frame or a data.table):

for (col in 1:2) set(x, which(is.na(x[[col]])), col, 0)
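As one of the comments points out, set() is not restricted to numeric columns. The sketch below applies the same loop to character columns of a plain data.frame; the data and the replacement string "missing" are assumptions chosen for illustration.

```r
library(data.table)

# Illustrative data.frame with character columns
x <- data.frame(a = c("u", NA, "v"),
                b = c(NA, "w", "z"),
                c = c(1, NA, 3),
                stringsAsFactors = FALSE)

# set() modifies x by reference, so no assignment is needed
for (col in c("a", "b")) {
  set(x, i = which(is.na(x[[col]])), j = col, value = "missing")
}

x  # NAs in 'a' and 'b' are replaced; 'c' keeps its NA
```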

Solution 4

This is now trivial in tidyr with replace_na(). The function appears to work for data.tables as well as data.frames:

tidyr::replace_na(x, list(a=0, b=0))
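When the columns are held in a pre-defined vector, the replacement list can be built programmatically. This is only a sketch: the data and the names cols, fill and res are hypothetical.

```r
library(tidyr)

# Illustrative data
x <- data.frame(a = c(1, NA, 2),
                b = c(NA, 2, 1),
                c = c(3, NA, 5))

# Hypothetical pre-defined vector of column names
cols <- c("a", "b")

# Named list equivalent to list(a = 0, b = 0)
fill <- setNames(rep(list(0), length(cols)), cols)

# replace_na() returns a modified copy, so assign the result
res <- replace_na(x, fill)
```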

Solution 5

Not sure if this is more concise, but this function will also find and allow replacement of NAs (or any value you like) in selected columns of a data.table:

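update.mat <- function(dt, cols, criteria) {
  require(data.table)
  # (row, col) coordinates of every cell where criteria is TRUE
  x <- as.data.frame(which(criteria==TRUE, arr.ind = TRUE))
  # keep only the coordinates that fall in the requested columns
  y <- as.matrix(subset(x, x$col %in% which((names(dt) %in% cols), arr.ind = TRUE)))
  y
}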

To apply it:

y[update.mat(y, c("a", "b"), is.na(y))] <- 0

The function returns a matrix of cell coordinates (row and column indices), restricted to the selected columns, for the cells that meet the input criteria (in this case is.na == TRUE).


Author by

jnam27

Updated on December 08, 2021

Comments

jnam27 over 2 years
There are a lot of posts about replacing NA values. I am aware that one could replace NAs in the following table/frame with the following:
```
x[is.na(x)]<-0
```
But what if I want to restrict it to only certain columns? Let me show you an example.

First, let's start with a dataset.
```
set.seed(1234)
x <- data.frame(a=sample(c(1,2,NA), 10, replace=T),
                b=sample(c(1,2,NA), 10, replace=T), 
                c=sample(c(1:5,NA), 10, replace=T))
```
Which gives:
```
    a  b  c
1   1 NA  2
2   2  2  2
3   2  1  1
4   2 NA  1
5  NA  1  2
6   2 NA  5
7   1  1  4
8   1  1 NA
9   2  1  5
10  2  1  1
```
Ok, so I only want to restrict the replacement to columns 'a' and 'b'. My attempt was:
```
x[is.na(x), 1:2]<-0
```
and:
```
x[is.na(x[1:2])]<-0
```
Neither of which works.

My data.table attempt, where y<-data.table(x), was obviously never going to work:
```
y[is.na(y[,list(a,b)]), ]
```
I want to pass columns inside the is.na argument but that obviously wouldn't work.

I would like to do this in a data.frame and a data.table. My end goal is to recode the 1:2 to 0:1 in 'a' and 'b' while keeping 'c' the way it is, since it is not a logical variable. I have a bunch of columns so I don't want to do it one by one. And, I'd just like to know how to do this.

Do you have any suggestions?
jnam27 over 10 years

That does the job. What about if I want to search for '1'? I tried to change it around but I couldn't get it to work.
flodel over 10 years

Probably like this: x[, 1:2][x[, 1:2] == 1] <- 0
info_seekeR over 8 years

Thanks for this. Just wanted to know, 3 years on, if there are ways to do the above without a for loop? I imagine this would have been made more concise by the data.table team? Thanks.
eddi over 8 years

@info_seekeR I don't know of a more concise way
Chris McKelt over 7 years

is this R? looks like C#
ChiseledAbs over 7 years

@flodel why does the data.table x accept a matrix as its first member only when doing assignment? Is this feature documented somewhere? Also, I think you forgot to put a comma before the vectors with column names in your second example.
flodel over 7 years

@ChiseledAbs, I think you are referring to matrix indexing (see this for example stackoverflow.com/a/13999583/1201032), but it is not limited to assignments, it can also be used to extract data. Regarding the missing comma: no. Data.frames are lists of columns so if you use a single argument to [, it will extract the specified columns (see stackoverflow.com/a/21137524/1201032). I hope this answers your question but in the future, please avoid commenting on very old answers like this one; instead post a new question.
Rohit Saluja about 6 years

"In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector." When I used a predefined vector like this, x[Vpredefined][is.na(x[Vpredefined])] <- 0, it gives me an error.
flodel about 6 years

@RohitSaluja, maybe your x is a matrix and not a data.frame like was the case in this question?
Rohit Saluja about 6 years

My x is a data.table
Michael over 5 years

this is a better solution than the selected answer by flodel. Flodel's approach uses the assignment operator <- and therefore involves unnecessary data copying.
Tim M. Schendzielorz about 5 years

With this function I get the Error: Error in replace_na(., 0) : argument "value" is missing, with no default. Any suggestions what to change?
CertainPerformance over 4 years

Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you’ve made.
Yoann Pageaud almost 4 years

@MichaelChirico In the first part of your comment, did you add the step out <- x to avoid a misunderstanding with the x data.frame from the question? Otherwise this is an even shorter command: y[, (cols):=lapply(.SD, function(i){i[is.na(i)] <- 0; i}), .SDcols = cols], skipping the 'out' variable name and the use of 'x'.
Yoann Pageaud almost 4 years

@MichaelChirico True ! I totally forgot about nafill()
Gooze almost 4 years

set is great because you can also replace strings (not supported yet in nafill)
hrrrrrr5602 over 3 years

@RohitSaluja is correct, using a predefined vector, this approach doesn't work. One can do x[,..Vpredefined], but the second call does not work.
ah bon over 2 years

May I ask how we can do that in a pipe (%>%)?
stragu over 2 years

ah, maybe because I was applying it to columns 2:3 instead of 1:2...? Fixed now.