Blend of na.omit and na.pass using aggregate?
Pass both na.action=na.pass
and na.rm=TRUE
to aggregate
. The former tells aggregate
not to delete rows where NAs exist; and the latter tells mean
to ignore them.
aggregate(cbind(var1, var2, var3) ~ name, test, mean,
na.action=na.pass, na.rm=TRUE)
Hendy
Updated on June 04, 2022Comments
-
Hendy almost 2 years
I have a data set containing product prototype test data. Not all tests were run on all lots, and not all tests were executed with the same sample sizes. To illustrate, consider this case:
> test <- data.frame(name = rep(c("A", "B", "C"), each = 4), var1 = rep(c(1:3, NA), 3), var2 = 1:12, var3 = c(rep(NA, 4), 1:8)) > test name var1 var2 var3 1 A 1 1 NA 2 A 2 2 NA 3 A 3 3 NA 4 A NA 4 NA 5 B 1 5 1 6 B 2 6 2 7 B 3 7 3 8 B NA 8 4 9 C 1 9 5 10 C 2 10 6 11 C 3 11 7 12 C NA 12 8
In the past, I've only had to deal with cases of mis-matched repetitions, which has been easy with
aggregate(cbind(var1, var2) ~ name, test, FUN = mean, na.action = na.omit)
(or the default setting). I'll get averages for each lot over three values forvar1
and over four values forvar2
.Unfortunately, this will leave me with a dataset completely missing lot
A
in this case:aggregate(cbind(var1, var2, var3) ~ name, test, FUN = mean, na.action = na.omit) name var1 var2 var3 1 B 2 6 2 2 C 2 10 6
If I use
na.pass
, however, I also don't get what I want:aggregate(cbind(var1, var2, var3) ~ name, test, FUN = mean, na.action = na.pass) name var1 var2 var3 1 A NA 2.5 NA 2 B NA 6.5 2.5 3 C NA 10.5 6.5
Now I lose the good data I had in
var1
since it contained instances ofNA
.What I'd like is:
NA
as the output ofmean()
if all unique combinations ofvarN
~name
areNA
s- Output of
mean()
if there are one or more actual values forvarN
~name
I'm guessing this is pretty simple, but I just don't know how. Do I need to use
ddply
for something like this? If so... the reason I tend to avoid it is that I end up writing really long equivalents toaggregate()
like so:ddply(test, .(name), summarise, var1 = mean(var1, na.rm = T), var2 = mean(var2, na.rm = T), var3 = mean(var3, na.rm = T))
Yeah... so the result of that apparently does what I want. I'll leave the question anyway in case there's 1) a way to do this with
aggregate()
or 2) shorter syntax forddply
. -
Hendy almost 11 yearsAwesome, and I had no idea that was possible.
-
Scott Worland about 8 years@HongOoi This worked great. Just something to note, this will replace the NAs with zeros depending on which function you choose. This is most likely not the final result you want so just be sure to follow up with something like
df[df == 0] <- NA
. If you have real zeros in your df that you don't want removed, then combine the code above withis.na(df)
-
colin over 6 yearsheads up this returns
NaN
, rather thanNA
for nameA
,var3