R can't convert NaN to NA

r nan na

39,321

Solution 1

Here's the problem: your vector is character in mode, so of course it's "not a number". That last element got interpreted as the string "NaN". Using is.nan only makes sense if the vector is numeric. If you want to make a value missing in a character vector (so that it gets handled properly by regression functions), then use NA_character_ (without any quotes).

> tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
>  tester1
[1] "2" "2" "3" "4" "2" "3" NA 
>  is.na(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Neither "NA" nor "NaN" is really missing in a character vector. If for some reason there were values in a factor variable that were "NaN", then you would have been able to just use logical indexing:

tester1[tester1 == "NaN"] = "NA"  
# but that would not really be a missing value either 
# and it might screw up a factor variable anyway.

# When tester1 is a factor, the same assignment warns, because "NA" is not an existing level:
tester1[tester1=="NaN"] <- "NA"
Warning message:
In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
invalid factor level, NAs generated
##########
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))

> tester1[tester1 =="NaN"] <- NA_character_
> tester1
[1] 2    2    3    4    2    3    <NA>
Levels: 2 3 4 NaN

That last result might be surprising. There is a remaining "NaN" level, but none of the elements is "NaN". Instead, the element that was "NaN" is now a real missing value, signified in print as <NA>.
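If the leftover "NaN" level is unwanted, one possible follow-up is to drop unused levels after the assignment. This is only a sketch and not part of the original answer: the variable name f is made up for illustration, and it assumes the factor's other levels should stay as they are.

```r
# Illustrative factor; the NaN is coerced to the string "NaN" by c()
f <- factor(c("2", "2", "3", "4", "2", "3", NaN))

# %in% compares by label and never returns NA, so it is safe to index with
f[f %in% "NaN"] <- NA

# Remove levels that no longer occur in the data (here, "NaN")
f <- droplevels(f)

levels(f)
# [1] "2" "3" "4"
is.na(f)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
```

Whether dropping the level matters depends on how the factor is used afterwards.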

Solution 2

You can't have NaN in a character vector, which is what you have here:

> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> is.nan(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> tester1
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"

Notice that R has coerced NaN to the character string "NaN".

You can create NaN in a numeric vector:

> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> as.numeric(tester1)
[1]   2   2   3   4   2   3 NaN
> is.nan(as.numeric(tester1))
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Then, of course, R can convert NaN to NA as per your code:

> foo <- as.numeric(tester1)
> foo[is.nan(foo)] <- NA
> foo
[1]  2  2  3  4  2  3 NA
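The two steps above can be wrapped in a small helper. This is a minimal sketch: the name nan_to_na is hypothetical (it is not a base R function), and it assumes every non-missing element is a string that as.numeric() can parse.

```r
# Hypothetical helper, not from the original answers
nan_to_na <- function(x) {
  # as.character() first, so that factors convert by label rather than by integer code
  num <- as.numeric(as.character(x))
  # Replace NaN with a real missing value
  num[is.nan(num)] <- NA
  num
}

nan_to_na(c("2", "3", NaN))
# [1]  2  3 NA
nan_to_na(factor(c("2", "3", NaN)))
# [1]  2  3 NA
```

Note that this returns a numeric vector, which may not be what you want if the column is meant to stay a factor in a regression.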

Solution 3

EDIT:

Gavin Simpson in comments reminds me that, in your situation, there are much easier ways to convert what is really an "NaN" to an "NA":

tester1 <- gsub("NaN", "NA", tester1)
tester1
# [1] "2"  "2"  "3"  "4"  "2"  "3"  "NA"

Solution:

To detect which elements of the character vector are NaN, you need to convert the vector to a numeric vector:

tester1[is.nan(as.numeric(tester1))] <- "NA"
tester1
# [1] "2"  "2"  "3"  "4"  "2"  "3"  "NA"
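Both versions above leave the string "NA" in the vector, which is.na() does not treat as missing. If a real missing value is what is wanted (an assumption about the intent, as debated in the comments), a sketch using an exact match and NA_character_ could look like this:

```r
tester1 <- c("2", "2", "3", "4", "2", "3", NaN)

# Exact match on the string "NaN"; %in% never returns NA, so it is safe for indexing
tester1[tester1 %in% "NaN"] <- NA_character_

tester1
# [1] "2" "2" "3" "4" "2" "3" NA
is.na(tester1)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
```

Unlike gsub(), the exact match does not touch longer strings that merely contain "NaN".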

Explanation:

There are a couple of reasons that this isn't working as you expect it to.

First, although NaN stands for "Not a Number", it does have class "numeric", and only makes sense inside of a numeric vector.

Second, when it is included in a character vector, the symbol NaN is silently converted to the character string "NaN". When you then test it for nan-ness, the character string returns FALSE:

class(NaN)
# [1] "numeric"
c("1", NaN)
# [1] "1"   "NaN"
is.nan(c("1", NaN))
# [1] FALSE FALSE
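For comparison, here is a short sketch of how is.nan() and is.na() behave on a numeric vector versus a character vector. The names x and y are illustrative only.

```r
# Numeric vector: NaN and NA keep their special meaning
x <- c(1, NaN, NA)
is.nan(x)
# [1] FALSE  TRUE FALSE
is.na(x)   # is.na() is TRUE for NaN as well as NA
# [1] FALSE  TRUE  TRUE

# Character vector: NaN is coerced to the string "NaN", NA stays missing
y <- c("1", NaN, NA)
typeof(y)
# [1] "character"
is.nan(y)
# [1] FALSE FALSE FALSE
is.na(y)
# [1] FALSE FALSE  TRUE
```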


Author by

screechOwl

https://financenerd.blog/blog/

Updated on August 31, 2022

Comments

screechOwl over 1 year

I have a data frame with several factor columns containing NaN's that I would like to convert to NA's (the NaN seems to be a problem for using linear regression objects to predict on new data).

> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"
> tester1[is.nan(tester1)] = NA
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"
> tester1[is.nan(tester1)] = "NA"
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"

Gavin Simpson about 12 years

??? That is converting the string "NaN" to "NA" in a very roundabout way. Surely this is not what the OP wanted, even if they did try to use "NA" as NA in one of their examples.

Josh O'Brien about 12 years

@GavinSimpson -- OK. Fixed now. Thanks for the tap on the shoulder reminding me to pull my head out of ... the weeds!

Gavin Simpson about 12 years

I still think you are overthinking what the OP wants. He wants NaN converted to NA: not the string versions, but the real R versions indicating Not a Number and missingness respectively. Ignore the "NA" in one of the OP's examples; that is a red herring. I presume they thought that quoting NA might work as NA in a character vector or something like that.

Josh O'Brien about 12 years

@GavinSimpson -- I know what you mean, but the OP also quoted all of the integers in the example vectors, so there are more like 25 red herrings up there, if you are right. (Although the reference to NaN giving problems in linear regressions now makes me think you're probably right).

Josh O'Brien about 12 years

Wow. My first ever downvote, presumably for answering the question the OP actually asked, rather than what they may have meant to ask!? Oh well.

Gavin Simpson about 12 years

You didn't answer the Q the OP asked. @DWin did that. They want to convert NaN to NA. This has almost nothing to do with strings (other than that being the source of is.nan() not matching "NaN"). In the OP's tester1 there isn't an NaN (there is a "NaN"). The first line of the Q is pretty explicit, with no quotes there, even if the example code showing what the OP did does include a "NA". Granted, this is one of the most confusing Qs I've seen for a while; I have no idea how anything remotely like tester1 is being used in a regression.

Josh O'Brien about 12 years

Well, strings did end up having lots to do with it. After all, the OP already knew how to change NaNs to NAs (see line 4 of the question!) but didn't understand that a silent conversion of NaN to "NaN" was taking place. I guess we can at least agree that the question's confusing. Cheers.