How to replace NA values in a table for selected columns
Solution 1
You can do:
x[, 1:2][is.na(x[, 1:2])] <- 0
or better (IMHO), use the variable names:
x[c("a", "b")][is.na(x[c("a", "b")])] <- 0
In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.
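A minimal sketch of the pre-defined-vector variant, assuming x is a plain data.frame (the sample values and the name cols are made up for illustration):

```r
# Small example data with NAs in every column (values chosen by hand).
x <- data.frame(a = c(1, NA, 2), b = c(NA, 2, 1), c = c(NA, 4, 5))

# Hypothetical name for the pre-defined vector of columns to clean.
cols <- c("a", "b")

# x[cols] is a data.frame holding only those columns, so is.na(x[cols])
# is a logical matrix of the same shape and can be used to index it.
x[cols][is.na(x[cols])] <- 0

x
```

Column c keeps its NA because it is not listed in cols.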
Solution 2
Building on @Robert McDonald's tidyr::replace_na() answer, here are some dplyr options for controlling which columns have their NAs replaced:
library(tidyverse)
# by column type:
x %>%
mutate_if(is.numeric, ~replace_na(., 0))
# select columns defined in vars(col1, col2, ...):
x %>%
mutate_at(vars(a, b, c), ~replace_na(., 0))
# all columns:
x %>%
mutate_all(~replace_na(., 0))
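In dplyr 1.0.0 and later, the scoped verbs above are superseded by across(). A sketch of the selected-columns case in that style, assuming dplyr and tidyr are installed (the sample data and the name out are illustrative):

```r
library(dplyr)
library(tidyr)

# Small example data with NAs in every column (values chosen by hand).
x <- data.frame(a = c(1, NA, 2), b = c(NA, 2, 1), c = c(NA, 4, 5))

# across() picks the columns; the lambda is applied to each of them.
out <- x %>%
  mutate(across(c(a, b), ~ replace_na(.x, 0)))

out
```

Swapping c(a, b) for where(is.numeric) or everything() should cover the by-type and all-columns cases.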
Solution 3
Edit 2020-06-15
Since data.table 1.12.4 (Oct 2019), data.table gains two functions to facilitate this: nafill and setnafill.
nafill operates on columns:
cols = c('a', 'b')
y[ , (cols) := lapply(.SD, nafill, fill=0), .SDcols = cols]
setnafill operates on tables (the replacements happen by-reference/in-place):
setnafill(y, cols=cols, fill=0)
# print y to show the effect
y[]
This will also be more efficient than the other options; see ?nafill for more, including the last-observation-carried-forward (LOCF) and next-observation-carried-backward (NOCB) versions of NA imputation for time series.
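Because setnafill modifies its argument in place, the original table is overwritten. A sketch of keeping the original intact, assuming data.table 1.12.4 or later (the sample data and the name filled are illustrative):

```r
library(data.table)

# Small example table with NAs in every column (values chosen by hand).
y <- data.table(a = c(1, NA, 2), b = c(NA, 2, 1), c = c(NA, 4, 5))
cols <- c("a", "b")

# copy() makes a real copy; a plain `filled <- y` would still refer to
# the same underlying data, so y would be modified too.
filled <- copy(y)
setnafill(filled, cols = cols, fill = 0)

filled[]
```

Here y still contains its NAs, while filled has zeros in a and b.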
This will work for your data.table version:
for (col in c("a", "b")) y[is.na(get(col)), (col) := 0]
Alternatively, as David Arenburg points out below, you can use set (side benefit: you can use it on either a data.frame or a data.table):
for (col in 1:2) set(x, which(is.na(x[[col]])), col, 0)
Solution 4
This is now trivial in tidyr with replace_na(). The function appears to work for data.tables as well as data.frames:
tidyr::replace_na(x, list(a=0, b=0))
Solution 5
Not sure if this is more concise, but this function will also find and allow replacement of NAs (or any value you like) in selected columns of a data.table:
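update.mat <- function(dt, cols, criteria) {
  require(data.table)
  # criteria is a logical matrix with the same shape as dt, e.g. is.na(dt);
  # collect the (row, col) coordinates of every TRUE cell
  x <- as.data.frame(which(criteria==TRUE, arr.ind = TRUE))
  # keep only the coordinates that fall in the requested columns
  y <- as.matrix(subset(x, x$col %in% which((names(dt) %in% cols), arr.ind = TRUE)))
  y
}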
To apply it:
y[update.mat(y, c("a", "b"), is.na(y))] <- 0
The function creates a matrix of the selected columns and rows (cell coordinates) that meet the input criteria (in this case is.na == TRUE).
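A related base-R sketch that avoids the helper: instead of cell coordinates, build the full logical mask and switch off the columns that should be left alone. This is a different indexing style from the function above, shown on a plain data.frame with made-up values:

```r
# Small example data with NAs in every column (values chosen by hand).
x <- data.frame(a = c(1, NA, 2), b = c(NA, 2, 1), c = c(NA, 4, 5))
cols <- c("a", "b")

# Logical matrix with the same dimensions as x: TRUE where a cell is NA.
mask <- is.na(x)

# Ignore every column that is not in cols.
mask[, !(colnames(mask) %in% cols)] <- FALSE

# Replace only the cells that are still flagged.
x[mask] <- 0

x
```

Any other logical criterion of the same shape can stand in for is.na(x), provided the mask itself contains no NA.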
jnam27
Updated on December 08, 2021
Comments
-
jnam27 over 2 years
There are a lot of posts about replacing NA values. I am aware that one could replace NAs in the following table/frame with the following:
x[is.na(x)]<-0
But what if I want to restrict it to only certain columns? Let me show you an example.
First, let's start with a dataset.
set.seed(1234)
x <- data.frame(a=sample(c(1,2,NA), 10, replace=T),
                b=sample(c(1,2,NA), 10, replace=T),
                c=sample(c(1:5,NA), 10, replace=T))
Which gives:
    a  b  c
1   1 NA  2
2   2  2  2
3   2  1  1
4   2 NA  1
5  NA  1  2
6   2 NA  5
7   1  1  4
8   1  1 NA
9   2  1  5
10  2  1  1
Ok, so I only want to restrict the replacement to columns 'a' and 'b'. My attempt was:
x[is.na(x), 1:2]<-0
and:
x[is.na(x[1:2])]<-0
Neither of these works.
My data.table attempt, where y <- data.table(x), was obviously never going to work:
y[is.na(y[, list(a, b)]), ]
I want to pass columns inside the is.na argument but that obviously wouldn't work.
I would like to do this in a data.frame and a data.table. My end goal is to recode the 1:2 to 0:1 in 'a' and 'b' while keeping 'c' the way it is, since it is not a logical variable. I have a bunch of columns so I don't want to do it one by one. And, I'd just like to know how to do this.
Do you have any suggestions?
-
jnam27 over 10 years
That does the job. What about if I want to search for '1'? I tried to change it around but I couldn't get it to work.
-
flodel over 10 years
Probably like this:
x[, 1:2][x[, 1:2] == 1] <- 0
-
info_seekeR over 8 years
Thanks for this. Just wanted to know, 3 years on, if there are ways to do the above without a for loop? I imagine this would have been made more concise by the data.table team? Thanks.
-
eddi over 8 years
@info_seekeR I don't know of a more concise way
-
Chris McKelt over 7 years
Is this R? Looks like C#
-
ChiseledAbs over 7 years
@flodel why does the data.table x accept a matrix as its first member only when doing assignment? Is this feature documented somewhere? Also, I think you forgot to put a comma before the vectors with column names in your second example.
-
flodel over 7 years
@ChiseledAbs, I think you are referring to matrix indexing (see this for example: stackoverflow.com/a/13999583/1201032), but it is not limited to assignments; it can also be used to extract data. Regarding the missing comma: no. Data.frames are lists of columns, so if you use a single argument to [, it will extract the specified columns (see stackoverflow.com/a/21137524/1201032). I hope this answers your question, but in the future, please avoid commenting on very old answers like this one; instead post a new question.
-
Rohit Saluja about 6 years
"In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector."
When I used a predefined vector like this:
x[Vpredefined][is.na(x[Vpredefined])] <- 0
it gives me an error.
-
flodel about 6 years
@RohitSaluja, maybe your x is a matrix and not a data.frame like was the case in this question?
-
Rohit Saluja about 6 years
My x is a data.table
-
Michael over 5 years
This is a better solution than the selected answer by flodel. Flodel's approach uses the assignment operator <- and therefore involves unnecessary data copying.
-
Tim M. Schendzielorz about 5 years
With this function I get the error:
Error in replace_na(., 0) : argument "value" is missing, with no default
Any suggestions what to change?
-
CertainPerformance over 4 years
Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you've made.
-
Yoann Pageaud almost 4 years
@MichaelChirico In the first part of your comment, did you add the step out <- x to avoid a misunderstanding with the x data.frame from the question? Otherwise this is an even shorter command:
y[, (cols):=lapply(.SD, function(i){i[is.na(i)] <- 0; i}), .SDcols = cols]
skipping the 'out' variable name and the use of 'x'.
-
Yoann Pageaud almost 4 years
@MichaelChirico True! I totally forgot about nafill()
-
Gooze almost 4 years
set is great because you can also replace strings (not supported yet in nafill)
-
hrrrrrr5602 over 3 years
@RohitSaluja is correct: using a predefined vector, this approach doesn't work. One can do x[,..Vpredefined], but the second call does not work.
-
ah bon over 2 years
May I ask how we can do that in a pipe (%>%)?
-
stragu over 2 years
Ah, maybe because I was applying it to columns 2:3 instead of 1:2...? Fixed now.