NA values using sqldf
Solution 1
Modify your query to ignore the NA
values:
SELECT AVG(col1)
FROM file
WHERE col1 IS NOT \"NA\"
Solution 2
The problem is that the read.csv.sql
function does not recognize the missing values, and converts them to zero, instead of NULL.
This does not happen if you first load the data into a data.frame, and only then call sqldf
.
d <- read.csv("my.na.txt")
sqldf("SELECT AVG(col1) FROM d") # 6.5
Roman Luštrik
I'm an analyst with roots in veterinary medicine, biology/ecology and biostatistics. I work with data from various fields of natural (genetics, ecology, biotechnology...) and social sciences (e.g. official statistics, economy). Having fun with cloud solutions like AWS. My tool of choice is R, but I can also somewhat handle Python, HTML, CSS. Ask me about reproducible research and version control. I feed many, many cats.
Updated on June 15, 2022Comments
-
Roman Luštrik almost 2 years
If I try to get an average of
c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10)
usingAVG
from SQL, I get a value of 5.2, instead of the expected 6.5.# prepare data and write to file write.table(data.frame(col1 = c(NA, NA, 3:10)), "my.na.txt", row.names = FALSE) mean(c(NA, NA, 3:10), na.rm = TRUE) # 6.5 my.na <- read.csv.sql("my.na.txt", sep = " ", sql = "SELECT AVG(col1) FROM file") # 5.2 # this is identical to sum(3:10)/10 unlink("my.na.txt") # remove file
Which leads me to believe that sql(df) treats NA values as zero. Is it possible to ignore (exclude) NA values in an SQL call as it can be done using
na.rm
argument (in R)? -
Roman Luštrik over 12 yearsHum, I'm using
SELECT AVG(col1) FROM file WHERE col1 IS NOT NULL
and still get 5.2. -
xQbert over 12 yearsare the values in the DB N/A or NULL? (select col1 from file where col1 is not null) [do the NA's show up?] or (Select Col1 from file where col1 <> 'NA' [do the NA's show up?]
-
Roman Luštrik over 12 yearsUsing
SELECT AVG(col1) FROM file WHERE col1 IS NOT \"NA\"
works fine and dandy. The trick is that NAs are quoted, and you need to escape them. Thank you both for the tip. -
Roman Luštrik over 12 yearsLoading the data into a data.frame is out of the question in my case.