What is the difference between <NA> and NA?


Solution 1

When you are dealing with factors, an NA wrapped in angle brackets ( <NA> ) indicates that the value is in fact a missing value, NA.

When NA is printed without brackets, it is not a missing value, but rather a proper factor level whose label is the string "NA".

# Note a 'real' NA and a string with the word "NA"
x <- factor(c("hello", NA, "world", "NA"))

x
[1] hello <NA>  world NA   
Levels: hello NA world      <~~ The string appears as a level, the actual NA does not. 

as.numeric(x)              
[1]  1 NA  3  2            <~~ The string has a numeric value (here, 2, alphabetically)
                               The NA's numeric value is just NA
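Since the printed form can be ambiguous, it may help to test the values directly rather than read them off the console. The following is a minimal sketch using the same x as above; it relies only on base R functions (is.na, which, levels).

```r
# Same factor as above: one real NA and one string "NA"
x <- factor(c("hello", NA, "world", "NA"))

# is.na() is TRUE only for the real missing value
is.na(x)                  # FALSE  TRUE FALSE FALSE

# Comparing against the string finds the level labelled "NA";
# which() drops the NA produced by comparing the missing value
which(x == "NA")          # 4

# The string is a level, the missing value is not
"NA" %in% levels(x)       # TRUE
sum(is.na(x))             # 1
```

Note that the order of the levels (and therefore the integer codes returned by as.numeric) depends on how the labels sort in your locale, so the sketch avoids relying on it.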

Edit to answer @Arun's question:

R is simply trying to distinguish between a string whose value is the two letters "NA" and an actual missing value, NA. Hence the difference you see when displaying df versus df$y. Example:

df <- data.frame(x=1:4, y=c("a", NA_character_, "c", "NA"), stringsAsFactors=FALSE)

Note the two different styles of NA:

> df
  x    y
1 1    a
2 2 <NA>
3 3    c
4 4   NA

However, if we look at just df$y, the string is shown in quotes and the missing value is not:

> df$y
[1] "a"  NA   "c"  "NA"

But, if we remove the quotation marks (similar to what we see when printing a data.frame to the console):

print(df$y, quote=FALSE)
[1] a    <NA> c    NA  

And thus, we once again have the distinction of NA via the angled brackets.
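To tell the two apart in code rather than by eye, one option is to test the column directly. This is a small sketch on the same df as above; the variable names missing_rows and literal_rows are made up for illustration.

```r
df <- data.frame(x = 1:4,
                 y = c("a", NA_character_, "c", "NA"),
                 stringsAsFactors = FALSE)

# Rows where y is a real missing value
missing_rows <- which(is.na(df$y))      # 2

# Rows where y is the literal two-letter string "NA"
literal_rows <- which(df$y == "NA")     # 4

df[missing_rows, ]   # the row printed with <NA>
df[literal_rows, ]   # the row printed with NA
```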

Solution 2

It is just the way that R displays NA in a factor:

> as.factor(NA)
[1] <NA>
Levels: 
> 
> f <- factor(c(1:3, NA))
> levels(f)
[1] "1" "2" "3"
> f
[1] 1    2    3    <NA>
Levels: 1 2 3
> is.na(f)
[1] FALSE FALSE FALSE  TRUE

Presumably this is how R differentiates between NA and "NA" when printing a factor, since a factor is printed without quotes, even for character labels/levels:

> f2 <- factor(c("NA",NA))
> f2
[1] NA   <NA>
Levels: NA
> is.na(f2)
[1] FALSE  TRUE
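The printed form is not always enough to tell these cases apart. The sketch below follows the two suggestions made in the comments, factor(c("NA", "<NA>", NA)) and addNA(); the name f3 is only illustrative.

```r
# A level labelled "NA", a level labelled "<NA>", and a real missing value
f3 <- factor(c("NA", "<NA>", NA))

# The string "<NA>" and the missing value print the same way
print(f3)

# is.na() still tells them apart
is.na(f3)                      # FALSE FALSE  TRUE

# Only the two strings are levels
nlevels(f3)                    # 2

# addNA() makes the missing value an explicit level
length(levels(addNA(f3)))      # 3
anyNA(levels(addNA(f3)))       # TRUE
```

Making NA an explicit level with addNA() can be handy for tabulation, for example with table(), where missing values would otherwise be left out.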
Author by

oort

Updated on July 09, 2022

Comments

  • oort
    oort almost 2 years

    I have a factor named SMOKE with levels "Y" and "N". Missing values were replaced with NA (from the initial level "NULL"). However when I view the factor I get something like this:

    head(SMOKE)
    # N N <NA> Y Y N
    # Levels: Y N
    

    Why is R displaying NA as <NA>? And is there a difference?

  • oort
    oort about 11 years
    Thanks for clarifying that for me
  • Arun
    Arun about 11 years
    @RicardoSaporta, it's a bit unclear to me. While checking this answer from @SimonO101, I found that if you have a data.frame, e.g. df <- data.frame(x=1:4, y=c("a", "b", NA_character_, "d"), stringsAsFactors=FALSE), it still shows <NA>. Of course, the question is about vectors, but this is still not clear to me: when you print the column, df$y, the brackets disappear, but when you print df, it shows the angle brackets.
  • hadley
    hadley about 10 years
    See also addNA(), e.g. levels(addNA(x))
  • hadley
    hadley about 10 years
    Also probably worth showing this: factor(c("NA", "<NA>", NA)). Looking at the printed representation of something is not a great way to understand what it is!
  • Gregor Thomas
    Gregor Thomas over 5 years
    I'm not sure what this is an exception about. The above answers essentially say when NA is in factor or character, it is printed as <NA>. This is the same as what you demonstrate using data.table.