R can't convert NaN to NA
Solution 1
Here's the problem: your vector is character in mode, so of course it's "not a number". That last element got interpreted as the string "NaN". Using is.nan will only make sense if the vector is numeric. If you want to make a value missing in a character vector (so that it gets handled properly by regression functions), then use NA_character_ (without any quotes).
> tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
> tester1
[1] "2" "2" "3" "4" "2" "3" NA
> is.na(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Neither "NA" nor "NaN" is really missing in a character vector. If for some reason there were values in a factor variable that were "NaN", then you would have been able to just use logical indexing:
tester1[tester1 == "NaN"] = "NA"
# but that would not really be a missing value either
# and it might screw up a factor variable anyway.
tester1[tester1=="NaN"] <- "NA"
Warning message:
In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
invalid factor level, NAs generated
##########
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))
> tester1[tester1 =="NaN"] <- NA_character_
> tester1
[1] 2 2 3 4 2 3 <NA>
Levels: 2 3 4 NaN
That last result might be surprising. There is a remaining "NaN" level, but none of the elements is "NaN". Instead, the element that was "NaN" is now a real missing value, signified in print as <NA>.
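If the leftover "NaN" level is unwanted, one option is to drop unused levels once the elements have been set to missing. The following is only a minimal sketch; the name f is made up for illustration, and droplevels() is used here as one possible cleanup step, not something the answer above prescribes.

```r
# Hypothetical factor mirroring the example above;
# c() coerces NaN to the string "NaN" because the other elements are strings
f <- factor(c("2", "2", "3", "4", "2", "3", NaN))

# Replace the "NaN" elements with a real missing value
f[f == "NaN"] <- NA

# The "NaN" level is still listed but no longer used; droplevels() removes it
f <- droplevels(f)

levels(f)
is.na(f)
```

After this, is.na() flags the last element and "NaN" no longer appears among the levels.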
Solution 2
You can't have NaN in a character vector, which is what you have here:
> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> is.nan(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
Notice how R thinks this is a character string.
You can create NaN in a numeric vector:
> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> as.numeric(tester1)
[1] 2 2 3 4 2 3 NaN
> is.nan(as.numeric(tester1))
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Then, of course, R can convert NaN to NA as per your code:
> foo <- as.numeric(tester1)
> foo[is.nan(foo)] <- NA
> foo
[1] 2 2 3 4 2 3 NA
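Since the question mentions a data frame, the same replacement can be applied column by column. This is a rough sketch under assumptions: the data frame df, its column names, and the helper nan_to_na are all hypothetical and not taken from the question. Non-numeric columns are left alone, because is.nan() only finds NaN in numeric data.

```r
# Hypothetical data frame; the column names are made up for illustration
df <- data.frame(x = c(1, NaN, 3),
                 y = c(NaN, 5, 6),
                 g = c("a", "b", "NaN"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: replace NaN with NA in numeric columns only
nan_to_na <- function(col) {
  if (is.numeric(col)) col[is.nan(col)] <- NA
  col
}

# Assigning into df[] replaces each column but keeps the data frame structure
df[] <- lapply(df, nan_to_na)
df
```

The string "NaN" in column g is untouched here; it would need the string comparison shown in Solution 1.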
Solution 3
EDIT:
Gavin Simpson in comments reminds me that, in your situation, there are much easier ways to convert what is really an "NaN" to an "NA":
tester1 <- gsub("NaN", "NA", tester1)
tester1
# [1] "2" "2" "3" "4" "2" "3" "NA"
Solution:
To detect which elements of the character vector are NaN, you need to convert the vector to a numeric vector:
tester1[is.nan(as.numeric(tester1))] <- "NA"
tester1
[1] "2" "2" "3" "4" "2" "3" "NA"
Explanation:
There are a couple of reasons that this isn't working as you expect it to.
First, although NaN stands for "Not a Number", it does have class "numeric", and only makes sense inside of a numeric vector.
Second, when it is included in a character vector, the symbol NaN is silently converted to the character string "NaN". When you then test it for nan-ness, the character string returns FALSE:
class(NaN)
# [1] "numeric"
c("1", NaN)
# [1] "1" "NaN"
is.nan(c("1", NaN))
# [1] FALSE FALSE
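For contrast, here is a small sketch of how the same tests behave when NaN stays in a numeric vector, next to the coerced character case. The names num and chr are chosen only for this illustration.

```r
# NaN kept in a numeric vector stays numeric and is detected by is.nan()
num <- c(1, NaN, NA)
typeof(num)
# [1] "double"
is.nan(num)
# [1] FALSE  TRUE FALSE
is.na(num)   # is.na() is TRUE for NaN as well as NA
# [1] FALSE  TRUE  TRUE

# Mixing in a string coerces everything, NaN included, to character
chr <- c("1", NaN)
typeof(chr)
# [1] "character"
chr == "NaN"
# [1] FALSE  TRUE
```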
Comments
-
screechOwl over 1 year
I have a data frame with several factor columns containing NaN's that I would like to convert to NA's (the NaN seems to be a problem for using linear regression objects to predict on new data).
> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
> tester1[is.nan(tester1)] = NA
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
> tester1[is.nan(tester1)] = "NA"
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
-
Gavin Simpson about 12 years
??? That is converting the string "NaN" to "NA" in a very roundabout way. Surely this is not what the OP wanted, even if they did try to use "NA" as NA in one of their examples.
-
Josh O'Brien about 12 years
@GavinSimpson -- OK. Fixed now. Thanks for the tap on the shoulder reminding me to pull my head out of ... the weeds!
-
Gavin Simpson about 12 years
I still think you are overthinking what the OP wants. He wants NaN converted to NA, not the string versions but the real R versions indicating Not A Number and missingness respectively. Ignore the "NA" in one of the OP's examples - that is a red herring; I presume they thought that quoting NA might work as NA in a character vector or something like that.
-
Josh O'Brien about 12 years
@GavinSimpson -- I know what you mean, but the OP also quoted all of the integers in the example vectors, so there are more like 25 red herrings up there, if you are right. (Although the reference to NaN giving problems in linear regressions now makes me think you're probably right.)
-
Josh O'Brien about 12 years
Wow. My first ever downvote, presumably for answering the question the OP actually asked, rather than what they may have meant to ask!? Oh well.
-
Gavin Simpson about 12 years
You didn't answer the Q the OP asked. @DWin did that. They want to convert NaN to NA. This has almost nothing to do with strings (other than that being the source of is.nan() not matching "NaN"). In the OP's tester1 there isn't an NaN (there is a "NaN"). The first line of the Q is pretty explicit - no quotes there - even if the example code showing what the OP did does include a "NA". Granted, this is one of the most confusing Qs I've seen for a while; I have no idea how anything remotely like tester1 is being used in a regression.
-
Josh O'Brien about 12 years
Well, strings did end up having lots to do with it. After all, the OP already knew how to change NaNs to NAs (see line 4 of the question!) but didn't understand that a silent conversion of NaN to "NaN" was taking place. I guess we can at least agree that the question's confusing. Cheers.