NA values using sqldf

14,222

Solution 1

Modify your query to ignore the NA values:

SELECT AVG(col1)
FROM file
WHERE col1 IS NOT \"NA\"

Solution 2

The problem is that the read.csv.sql function does not recognize the missing values, and converts them to zero, instead of NULL. This does not happen if you first load the data into a data.frame, and only then call sqldf.

d <- read.csv("my.na.txt")
sqldf("SELECT AVG(col1) FROM d") # 6.5
Share:
14,222
Roman Luštrik
Author by

Roman Luštrik

I'm an analyst with roots in veterinary medicine, biology/ecology and biostatistics. I work with data from various fields of natural (genetics, ecology, biotechnology...) and social sciences (e.g. official statistics, economy). Having fun with cloud solutions like AWS. My tool of choice is R, but I can also somewhat handle Python, HTML, CSS. Ask me about reproducible research and version control. I feed many, many cats.

Updated on June 15, 2022

Comments

  • Roman Luštrik
    Roman Luštrik almost 2 years

    If I try to get an average of c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10) using AVG from SQL, I get a value of 5.2, instead of the expected 6.5.

    # prepare data and write to file
    write.table(data.frame(col1 = c(NA, NA, 3:10)),
            "my.na.txt", row.names = FALSE)
    
    mean(c(NA, NA, 3:10), na.rm = TRUE) # 6.5
    
    my.na <- read.csv.sql("my.na.txt", sep = " ",
            sql = "SELECT AVG(col1) FROM file") # 5.2
    
    # this is identical to
    sum(3:10)/10
    
    unlink("my.na.txt") # remove file
    

    Which leads me to believe that sql(df) treats NA values as zero. Is it possible to ignore (exclude) NA values in an SQL call as it can be done using na.rm argument (in R)?

  • Roman Luštrik
    Roman Luštrik over 12 years
    Hum, I'm using SELECT AVG(col1) FROM file WHERE col1 IS NOT NULL and still get 5.2.
  • xQbert
    xQbert over 12 years
    are the values in the DB N/A or NULL? (select col1 from file where col1 is not null) [do the NA's show up?] or (Select Col1 from file where col1 <> 'NA' [do the NA's show up?]
  • Roman Luštrik
    Roman Luštrik over 12 years
    Using SELECT AVG(col1) FROM file WHERE col1 IS NOT \"NA\" works fine and dandy. The trick is that NAs are quoted, and you need to escape them. Thank you both for the tip.
  • Roman Luštrik
    Roman Luštrik over 12 years
    Loading the data into a data.frame is out of the question in my case.