How to replace NA values in a table for selected columns

158,335

Solution 1

You can do:

x[, 1:2][is.na(x[, 1:2])] <- 0

or better (IMHO), use the variable names:

x[c("a", "b")][is.na(x[c("a", "b")])] <- 0

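In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector. This form works on a data.frame; for a data.table, use one of the data.table approaches in the other solutions.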
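As a quick check of the predefined-vector version, here is a sketch on a small, hand-made data.frame. It is used instead of the question's sample() data so the result does not depend on the R version's random number generator, and `cols` is just a hypothetical name for the predefined vector:

```r
# Small data.frame with NAs in every column (hypothetical example data)
x <- data.frame(a = c(1, NA, 2), b = c(NA, 1, 2), c = c(NA, 3, 4))

# Predefined vector of the columns to clean
cols <- c("a", "b")

# is.na() on the column subset returns a logical matrix; assigning through it
# replaces the NAs in 'a' and 'b' only, leaving 'c' untouched
x[cols][is.na(x[cols])] <- 0
x
```

On a data.table, a single character vector inside `[` is treated as a row subset (a join) rather than a column selection, which is likely why the comments report an error there.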

Solution 2

Building on @Robert McDonald's tidyr::replace_na() answer, here are some dplyr options for controlling which columns have their NAs replaced:

library(tidyverse)

# by column type:
x %>%
  mutate_if(is.numeric, ~replace_na(., 0))

# select columns defined in vars(col1, col2, ...):
x %>%
  mutate_at(vars(a, b, c), ~replace_na(., 0))

# all columns:
x %>%
  mutate_all(~replace_na(., 0))
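In dplyr 1.0.0 and later, these scoped verbs (mutate_if(), mutate_at(), mutate_all()) are superseded by across(), e.g. `x %>% mutate(across(c(a, b), ~ replace_na(.x, 0)))`. For comparison, the "by column type" option can also be sketched in base R without any packages; the example data, including the character column `id`, is hypothetical:

```r
x <- data.frame(a = c(1, NA, 2), b = c(NA, 1, 2), id = c("p", NA, "r"),
                stringsAsFactors = FALSE)

# Logical vector flagging the numeric columns (the base R analogue of is.numeric in mutate_if)
num <- vapply(x, is.numeric, logical(1))

# Replace NAs with 0 in the numeric columns only; 'id' keeps its NA
x[num] <- lapply(x[num], function(v) replace(v, is.na(v), 0))
x
```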

Solution 3

Edit 2020-06-15

Since data.table 1.12.4 (Oct 2019), data.table has two functions that make this easier: nafill and setnafill.

nafill operates on individual vectors (columns) and returns a filled copy, so the result is assigned back:

library(data.table)
cols = c('a', 'b')
y[ , (cols) := lapply(.SD, nafill, fill=0), .SDcols = cols]

setnafill operates on tables, with the replacements happening by reference (in place):

setnafill(y, cols=cols, fill=0)
# print y to show the effect
y[]

This will also be more efficient than the other options. See ?nafill for more, including the last-observation-carried-forward (LOCF) and next-observation-carried-backward (NOCB) versions of NA imputation for time series.
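To make the LOCF and NOCB fill types concrete, here is a base R sketch of what they do to a single numeric vector. The helpers `locf()` and `nocb()` are hypothetical names for illustration only; in practice nafill(x, type = "locf") or type = "nocb" does this for you:

```r
# Hypothetical helper: carry the last observed value forward over NAs.
# Leading NAs have no earlier observation, so they stay NA.
locf <- function(v) {
  # position of the most recent non-NA value at each index (0 = none yet)
  idx <- cumsum(!is.na(v))
  # prepend NA so that idx 0 maps to NA instead of being dropped
  vals <- c(NA, v[!is.na(v)])
  vals[idx + 1]
}

# Hypothetical helper: carry the next observed value backward
# (LOCF applied to the reversed vector, then reversed back).
nocb <- function(v) rev(locf(rev(v)))

v <- c(NA, 1, NA, NA, 3, NA)
locf(v)  # NA 1 1 1 3 3
nocb(v)  # 1 1 3 3 3 NA
```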


For the data.table version in the question (older versions without nafill), this works:

for (col in c("a", "b")) y[is.na(get(col)), (col) := 0]

Alternatively, as David Arenburg pointed out, you can use set(), which has the side benefit of working on either a data.frame or a data.table:

for (col in 1:2) set(x, which(is.na(x[[col]])), col, 0)
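If data.table is not loaded, the same column-by-column loop can be written in base R. This is a sketch on hypothetical example data; unlike set(), each `[[<-` assignment follows R's usual copy-on-modify semantics rather than updating by reference, so it may be slower on large tables:

```r
x <- data.frame(a = c(1, NA, 2), b = c(NA, 1, 2), c = c(NA, 3, 4))

# Loop over the selected columns by name and overwrite each column's NAs with 0
for (col in c("a", "b")) {
  x[[col]][is.na(x[[col]])] <- 0
}
x
```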

Solution 4

This is now trivial in tidyr with replace_na(). The function appears to work for data.tables as well as data.frames:

tidyr::replace_na(x, list(a=0, b=0))
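For a data frame, replace_na() takes a named list, so each column can get its own replacement value. Here is a base R sketch of the same idea, using Map() to pair each selected column with its fill value (the list `fill` and its values are hypothetical):

```r
x <- data.frame(a = c(1, NA, 2), b = c(NA, 1, 2), c = c(NA, 3, 4))

# Column name -> replacement value; columns not listed (here 'c') are left alone
fill <- list(a = 0, b = -1)

# Pair each selected column with its fill value and replace that column's NAs
x[names(fill)] <- Map(function(v, f) replace(v, is.na(v), f),
                      x[names(fill)], fill)
x
```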

Solution 5

Not sure if this is more concise, but this function finds the cells that match a criterion (NA, or any other value you like) in selected columns of a data.table, so that they can be replaced:

update.mat <- function(dt, cols, criteria) {
  # (row, col) coordinates of every cell where criteria is TRUE (NAs are skipped)
  hits <- which(criteria == TRUE, arr.ind = TRUE)
  # keep only the coordinates that fall in the selected columns
  hits[hits[, "col"] %in% which(names(dt) %in% cols), , drop = FALSE]
}

To apply it:

y[update.mat(y, c("a", "b"), is.na(y))] <- 0

The function returns a two-column matrix of (row, column) coordinates for the cells in the selected columns that meet the input criterion (here, is.na(y) being TRUE); indexing y with that matrix then replaces exactly those cells.
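The same matrix-indexing idea handles values other than NA, for example recoding 1 to 0 in the selected columns (close to what flodel suggests in the comments). Here is a base R sketch on hypothetical data, with the coordinate filtering written inline rather than through update.mat:

```r
x <- data.frame(a = c(1, NA, 2), b = c(NA, 1, 2), c = c(1, 3, 4))
cols <- c("a", "b")

# (row, col) coordinates of every cell equal to 1; which() skips the NA comparisons
hits <- which(x == 1, arr.ind = TRUE)

# Keep only the coordinates that fall in the selected columns
hits <- hits[hits[, "col"] %in% match(cols, names(x)), , drop = FALSE]

# A two-column numeric matrix indexes individual cells of a data.frame
x[hits] <- 0
x
```

Column 'c' also contains a 1, but it is left unchanged because its coordinates are filtered out.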

Author: jnam27

Updated on December 08, 2021

Comments

  • jnam27
    jnam27 over 2 years

    There are a lot of posts about replacing NA values. I am aware that one could replace NAs in the following table/frame with the following:

    x[is.na(x)]<-0
    

    But what if I want to restrict it to only certain columns? Let me show you an example.

    First, let's start with a dataset.

    set.seed(1234)
    x <- data.frame(a=sample(c(1,2,NA), 10, replace=T),
                    b=sample(c(1,2,NA), 10, replace=T), 
                    c=sample(c(1:5,NA), 10, replace=T))
    

    Which gives:

        a  b  c
    1   1 NA  2
    2   2  2  2
    3   2  1  1
    4   2 NA  1
    5  NA  1  2
    6   2 NA  5
    7   1  1  4
    8   1  1 NA
    9   2  1  5
    10  2  1  1
    

    Ok, so I only want to restrict the replacement to columns 'a' and 'b'. My attempt was:

    x[is.na(x), 1:2]<-0
    

    and:

    x[is.na(x[1:2])]<-0
    

    Which does not work.

    My data.table attempt, where y<-data.table(x), was obviously never going to work:

    y[is.na(y[,list(a,b)]), ]
    

    I want to pass columns inside the is.na argument but that obviously wouldn't work.

    I would like to do this in a data.frame and a data.table. My end goal is to recode the 1:2 to 0:1 in 'a' and 'b' while keeping 'c' the way it is, since it is not a logical variable. I have a bunch of columns, so I don't want to do it one by one. I'd also just like to know how to do this.

    Do you have any suggestions?

  • jnam27
    jnam27 over 10 years
    That does the job. What if I want to search for '1' instead? I tried to change it around but I couldn't get it to work.
  • flodel
    flodel over 10 years
    Probably like this: x[, 1:2][x[, 1:2] == 1] <- 0
  • info_seekeR
    info_seekeR over 8 years
    Thanks for this. Just wanted to know, 3 years on, whether there are ways to do the above without a for loop? I imagine the data.table team would have made this more concise by now? Thanks.
  • eddi
    eddi over 8 years
    @info_seekeR I don't know of a more concise way
  • Chris McKelt
    Chris McKelt over 7 years
    is this R? looks like C#
  • ChiseledAbs
    ChiseledAbs over 7 years
    @flodel why does the datatable x accept a matrix as its first argument only when doing assignment? Is this feature documented somewhere? Also, I think you forgot to put a comma before the vectors with column names in your second example.
  • flodel
    flodel over 7 years
    @ChiseledAbs, I think you are referring to matrix indexing (see this for example stackoverflow.com/a/13999583/1201032), but it is not limited to assignments, it can also be used to extract data. Regarding the missing comma: no. Data.frames are lists of columns so if you use a single argument to [, it will extract the specified columns (see stackoverflow.com/a/21137524/1201032). I hope this answers your question but in the future, please avoid commenting on very old answers like this one; instead post a new question.
  • Rohit Saluja
    Rohit Saluja about 6 years
    "In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector." When I used a predefined vector like this, x[Vpredefined][is.na(x[Vpredefined])] <- 0, it gives me an error.
  • flodel
    flodel about 6 years
    @RohitSaluja, maybe your x is a matrix and not a data.frame like was the case in this question?
  • Rohit Saluja
    Rohit Saluja about 6 years
    My x is a data.table
  • Michael
    Michael over 5 years
    This is a better solution than the selected answer by flodel. Flodel's approach uses the assignment operator <- and therefore involves unnecessary data copying.
  • Tim M. Schendzielorz
    Tim M. Schendzielorz about 5 years
    With this function I get the error: Error in replace_na(., 0) : argument "value" is missing, with no default. Any suggestions on what to change?
  • CertainPerformance
    CertainPerformance over 4 years
    Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you’ve made.
  • Yoann Pageaud
    Yoann Pageaud almost 4 years
    @MichaelChirico In the first part of your comment, did you add the step out <- x to avoid a misunderstanding with the x data.frame from the question? Otherwise, this is an even shorter command: y[, (cols):=lapply(.SD, function(i){i[is.na(i)] <- 0; i}), .SDcols = cols], skipping the 'out' variable name and the use of 'x'.
  • Yoann Pageaud
    Yoann Pageaud almost 4 years
    @MichaelChirico True ! I totally forgot about nafill()
  • Gooze
    Gooze almost 4 years
    set is great because you can also replace strings (not supported yet in nafill)
  • hrrrrrr5602
    hrrrrrr5602 over 3 years
    @RohitSaluja is correct: with a predefined vector, this approach doesn't work on a data.table. One can do x[,..Vpredefined], but the second call does not work.
  • ah bon
    ah bon over 2 years
    May I ask how we can do that in a pipe (%>%)?
  • stragu
    stragu over 2 years
    ah, maybe because I was applying it to columns 2:3 instead of 1:2...? Fixed now.