Using rollmean when there are missing values (NA)
Solution 1
From ?rollmean
The default method of ‘rollmean’ does not handle inputs that contain ‘NA’s. In such cases, use ‘rollapply’ instead.
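A minimal sketch of why this matters, on a small made-up vector `x` (not the data from the question). The helper `running_mean` is a hypothetical running-sum implementation written only for illustration; it is an assumption about the general technique, not a copy of zoo's source. It shows how a single NA can wipe out every later window, while `rollapplyr` evaluates each window on its own.

```r
library(zoo)

# Hypothetical data: one missing value in the middle
x <- c(1, 2, NA, 4, 5, 6)

# Hypothetical running-sum rolling mean (NOT zoo's actual code):
# each window's sum is derived from the previous window's sum.
running_mean <- function(x, k) {
  n <- length(x)
  y <- x[k:n] - x[c(1, seq_len(n - k))]  # value entering minus value leaving
  y[1] <- sum(x[1:k])                    # first window is summed directly
  cumsum(y) / k                          # later windows reuse earlier sums
}
running_mean(x, 3)  # all NA, even for the NA-free window (4, 5, 6)

# rollapplyr() calls mean() per window, so only windows touching the NA are NA
plain <- rollapplyr(x, width = 3, FUN = mean, fill = NA)

# Extra arguments are passed on to FUN; na.rm = TRUE skips the NA
skipna <- rollapplyr(x, width = 3, FUN = mean, na.rm = TRUE, fill = NA)
```

Here `plain` has a number only for the last window, and `skipna` has a number for every complete window.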
Solution 2
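Use the 'partial=TRUE' option together with 'na.rm=TRUE'. With 'partial=TRUE' the function is also applied to the incomplete windows at the start of the series, and 'na.rm=TRUE' makes mean() skip the NA values inside each window.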
> rollapply(z, width=3, FUN=function(x) mean(x, na.rm=TRUE), by=1, by.column=TRUE, partial=TRUE, fill=NA, align="right")
     a    b        c
1  0.0  NaN 1.000000
2  0.5 10.0 5.500000
3  1.0  9.5 4.333333
4  2.0  9.0 6.666667
5  3.0  8.0 4.666667
6  4.0  7.0 6.000000
7  5.0  6.0 7.000000
8  6.0  5.0 8.666667
9  7.0  4.0 8.333333
10 8.0  3.0 7.000000
11 9.0  2.0 5.000000
If you want to change 'NaN' in the first row to '0', modify 'fill=NA' to 'fill=0'.
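As a small sketch of where the NaN comes from, here is the same call on a made-up vector `x2` (hypothetical data, shaped like the start of column b). The first partial window contains only NA, so mean(..., na.rm=TRUE) has nothing to average and returns NaN. Replacing NaN after the fact with `is.nan()` is an alternative suggested here, not part of the original answer.

```r
library(zoo)

# Hypothetical data: starts with a missing value
x2 <- c(NA, 10, 9, 8)

# partial = TRUE: the first windows are (NA), (NA, 10), (NA, 10, 9)
res <- rollapplyr(x2, width = 3, FUN = mean, na.rm = TRUE, partial = TRUE)

# The all-NA window gives NaN; overwrite it explicitly if 0 (or NA) is wanted
cleaned <- res
cleaned[is.nan(cleaned)] <- 0
```

Doing the replacement on the result makes it explicit which value stands in for windows that had no data at all.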
Author by
Alex
Updated on June 14, 2022
Comments
-
Alex almost 2 years
I have a data set which has a couple of NA in it. I take a rolling mean and expect that when there is no NA in the window, the rolling mean should produce a number as opposed to NA; however, rollmeanr in zoo does not seem to do this. Example:
> require(zoo)
> z = zoo(cbind(a=0:10, b=c(NA,10:1), c=sample(1:11,11)), 1:11)
> rollmeanr(z, k=3, fill=NA)
    a  b        c
1  NA NA       NA
2  NA NA       NA
3   1 NA 3.333333
4   2 NA 4.666667
5   3 NA 4.000000
6   4 NA 6.333333
7   5 NA 7.000000
8   6 NA 9.333333
9   7 NA 8.333333
10  8 NA 8.666667
11  9 NA 5.666667
> rollapply(z, width=3, FUN=mean, by=1, by.column=TRUE, fill=NA, align="right")
    a  b        c
1  NA NA       NA
2  NA NA       NA
3   1 NA 3.333333
4   2  9 4.666667
5   3  8 4.000000
6   4  7 6.333333
7   5  6 7.000000
8   6  5 9.333333
9   7  4 8.333333
10  8  3 8.666667
11  9  2 5.666667
I would expect these two calls to generate the same result. Please comment. Some session info:
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] zoo_1.7-10

loaded via a namespace (and not attached):
[1] grid_3.0.1 lattice_0.20-15
-
dickoa almost 11 years
From the help file I have:
The default method of ‘rollmean’ does not handle inputs that contain ‘NA’s. In such cases, use ‘rollapply’ instead.
-
Alex almost 11 years
Yes, I saw that. I assumed it only meant that rollmean would not let you skip over NA, whereas rollapply allows you to pass na.rm=TRUE. Should that be read as: it breaks when there are NAs?
-
GSee almost 11 years
Look at zoo:::rollmean.zoo and note that na.rm is not passed anywhere.
-
Alex almost 11 years
Yeah, that's not what I was saying though. I thought na.rm=FALSE would be the default and that you can't modify it in rollmean, whereas you can modify it in rollapply. That's what I understood the help file to be saying. Obviously I was incorrect.
-
George Steblovsky almost 11 years
You could always use the 'filter' function. It has no problems with NAs and is very fast.
-
GSee almost 11 years
@GeorgeSteblovsky Yes,
as.zoo(apply(z, 2, function(x) filter(x, rep(1/3, 3), sides=1)))
is about 9 times faster in this case.
-
G. Grothendieck about 7 years
or equivalently:
rollapplyr(z, 3, mean, na.rm = TRUE, by = 1, partial = TRUE, fill = NA)
-
Ken Williams over 6 years
@GeorgeSteblovsky while NAs are allowed using filter(), they pollute the output much more than they do in rollapply - try
x <- c(5, 7, 10, NA, 3, 6, 2, NA, 1, 9); as.numeric(filter(x, rep(1/3, 3))); zoo::rollapply(x, 3, mean, na.rm=TRUE)
and compare the output.
-
Nebulloyd almost 2 years
Is it possible to calculate the mean only for cells where the original value was NA? In other words, can the original values be kept while averages within the given window are imputed only where the original values were NA? Similar to na.fill(x, 'extend'), but with the window or width as the limit to which it 'extends'.
-
JKim almost 2 years
@Nebulloyd I think your question is about 'mean imputation'. statisticsglobe.com/mean-imputation-for-missing-data
-
Nebulloyd almost 2 years
@JKim Unfortunately no, that is not what I mean. The key difference is that the mean should only be calculated over a small 'window' in the column. The examples in your link fill all NAs of a column with the same mean value (the column mean). I am using a time series data set, so I expect the values directly before or after a missing value to be more similar to it than the column mean. I also want streaks of NA longer than a certain length to remain NA.
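One possible sketch of the windowed imputation described in this last comment, offered tentatively. The data `x`, the window width of 3 and the `max_gap` threshold of 2 are all hypothetical choices for illustration; none of them come from the thread. The idea is to compute a centred rolling mean that ignores NAs, copy it in only where the original value is missing, and then reset any NA streak longer than `max_gap` back to NA.

```r
library(zoo)

# Hypothetical series: one isolated NA and one long NA streak
x <- c(5, 7, NA, 9, 3, NA, NA, NA, NA, NA, 4, 6)
max_gap <- 2  # hypothetical: streaks longer than this stay NA

# Centred rolling mean over 3 points, skipping NAs; partial = TRUE covers the edges
win_mean <- rollapply(x, width = 3, FUN = mean, na.rm = TRUE,
                      partial = TRUE, align = "center")

# Keep original values; use the window mean only where x was NA
filled <- ifelse(is.na(x), win_mean, x)
filled[is.nan(filled)] <- NA  # windows with no data at all stay NA

# Find NA runs longer than max_gap and leave them entirely NA
runs <- rle(is.na(x))
long_gap <- rep(runs$values & runs$lengths > max_gap, runs$lengths)
filled[long_gap] <- NA
```

With these inputs the isolated NA at position 3 becomes 8 (the mean of 7 and 9), while the five-long streak is left untouched.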