Using rollmean when there are missing values (NA)

r xts zoo

15,860

Solution 1

From ?rollmean

The default method of ‘rollmean’ does not handle inputs that contain ‘NA’s. In such cases, use ‘rollapply’ instead.
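One way to see why a single NA can wipe out every later value is to compare a running-sum implementation of the rolling mean with a window-by-window one. The two helper functions below are hypothetical illustrations written in base R. They are not zoo's source code and only sketch the general technique under that assumption.

```r
# Hypothetical helper (not part of zoo): right-aligned rolling mean
# computed by updating a running sum with cumsum().
running_mean_cumsum <- function(x, k) {
  n <- length(x)
  # value entering the window minus value leaving the window
  y <- x[k:n] - c(0, x[seq_len(n - k)])
  # the first element is the sum of the first full window
  y[1] <- sum(x[1:k])
  c(rep(NA_real_, k - 1), cumsum(y) / k)
}

# Hypothetical helper (not part of zoo): the same mean, but each
# window is evaluated on its own.
running_mean_windows <- function(x, k) {
  ends <- k:length(x)
  c(rep(NA_real_, k - 1),
    sapply(ends, function(e) mean(x[(e - k + 1):e])))
}

x <- c(10, NA, 8, 7, 6, 5)
by_cumsum  <- running_mean_cumsum(x, 3)   # NA everywhere: the NA stays in the running sum
by_windows <- running_mean_windows(x, 3)  # NA only where the window contains the NA
print(by_cumsum)
print(by_windows)
```

With the running sum, an NA never leaves the accumulated total, so every later window is NA even when it contains only valid numbers. Evaluating each window separately confines the NA to the windows that actually include it.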

Solution 2

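Use 'rollapply' with a function that drops the NA values ('na.rm=TRUE'). The 'partial=TRUE' option additionally evaluates the incomplete windows at the start of the series, so the first rows are computed from whatever values are available instead of being left as NA.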

> rollapply(z, width=3, FUN=function(x) mean(x, na.rm=TRUE), by=1, by.column=TRUE, partial=TRUE, fill=NA, align="right")

     a    b        c
1  0.0  NaN 1.000000
2  0.5 10.0 5.500000
3  1.0  9.5 4.333333
4  2.0  9.0 6.666667
5  3.0  8.0 4.666667
6  4.0  7.0 6.000000
7  5.0  6.0 7.000000
8  6.0  5.0 8.666667
9  7.0  4.0 8.333333
10 8.0  3.0 7.000000
11 9.0  2.0 5.000000

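The 'NaN' in the first row of column b appears because the only value in that partial window is NA, so after 'na.rm=TRUE' there is nothing left to average. Changing 'fill=NA' to 'fill=0' is not expected to remove it, since 'fill' pads positions where the function is not evaluated at all. If you want '0' there, replace the 'NaN' values in the result afterwards.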
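As a sketch of which argument does what in the call above, the example below runs the same rolling mean with and without 'partial=TRUE' on columns a and b of the question's data. The variable names (r_full, r_part, r_zero) are made up for illustration.

```r
library(zoo)

z <- zoo(cbind(a = 0:10, b = c(NA, 10:1)), 1:11)

# na.rm = TRUE alone: NAs inside a complete window are skipped,
# but the first two rows have no complete window and get the fill value.
r_full <- rollapplyr(z, 3, mean, na.rm = TRUE, fill = NA)

# partial = TRUE: the incomplete windows at the start are evaluated too.
# Row 1 of column b contains only NA, so its mean is NaN.
r_part <- rollapplyr(z, 3, mean, na.rm = TRUE, partial = TRUE)

# Replace NaN by 0 after the fact (one possible choice, not the only one).
r_zero <- r_part
coredata(r_zero)[is.nan(coredata(r_zero))] <- 0

print(r_full)
print(r_part)
print(r_zero)
```

Handling the NaN in a separate step keeps the rolling computation itself unchanged and makes the replacement value an explicit decision.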


Alex

Updated on June 14, 2022

Comments

Alex almost 2 years

I have a data set with a couple of NAs in it. I take a rolling mean and expect that whenever a window contains no NA, the rolling mean produces a number rather than NA. However, rollmeanr in zoo does not seem to do this. Example:

require(zoo)
z = zoo(cbind(a=0:10, b=c(NA,10:1), c=sample(1:11,11)), 1:11) 
rollmeanr(z, k=3, fill=NA)
    a  b        c
1  NA NA       NA
2  NA NA       NA
3   1 NA 3.333333
4   2 NA 4.666667
5   3 NA 4.000000
6   4 NA 6.333333
7   5 NA 7.000000
8   6 NA 9.333333
9   7 NA 8.333333
10  8 NA 8.666667
11  9 NA 5.666667

rollapply(z, width=3, FUN=mean, by=1, by.column=TRUE, fill=NA, align="right")
    a  b        c
1  NA NA       NA
2  NA NA       NA
3   1 NA 3.333333
4   2  9 4.666667
5   3  8 4.000000
6   4  7 6.333333
7   5  6 7.000000
8   6  5 9.333333
9   7  4 8.333333
10  8  3 8.666667
11  9  2 5.666667

I would expect these two calls to generate the same result. Please comment. Some session info:

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] zoo_1.7-10

loaded via a namespace (and not attached):
 [1] grid_3.0.1      lattice_0.20-15

dickoa almost 11 years

From the help file: The default method of ‘rollmean’ does not handle inputs that contain ‘NA’s. In such cases, use ‘rollapply’ instead.
Alex almost 11 years

Yes, I saw that. I assumed it only meant that you cannot skip over NAs, whereas rollapply allows you to pass na.rm=TRUE. Should it instead be read as: it breaks when there are NAs?

GSee almost 11 years

Look at zoo:::rollmean.zoo and note that na.rm is not passed anywhere.
Alex almost 11 years

Yeah, that's not what I was saying though. I thought na.rm=FALSE would be the default and that you can't modify it in rollmean, whereas you can modify it in rollapply. That's what I understood the help file to be saying. Obviously I was incorrect.
George Steblovsky almost 11 years

You could always use the 'filter' function. It has no problems with NAs and is very fast.
GSee almost 11 years

@GeorgeSteblovsky Yes, as.zoo(apply(z, 2, function(x) filter(x, rep(1/3, 3), sides=1))) is about 9 times faster in this case.
G. Grothendieck about 7 years

or equivalently: rollapplyr(z, 3, mean, na.rm = TRUE, by = 1, partial = TRUE, fill = NA)
Ken Williams over 6 years

@GeorgeSteblovsky while NAs are allowed using filter(), they pollute the output much more than they do in rollapply - try x <- c(5, 7, 10, NA, 3, 6, 2, NA, 1, 9); as.numeric(filter(x, rep(1/3, 3))); zoo::rollapply(x, 3, mean, na.rm=TRUE) and compare the output.
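The comparison suggested in this comment can be laid out as a short script. This is a sketch using the vector from the comment. The 'fill = NA' argument is an addition made here so that both results have the same length, and the variable names are illustrative.

```r
library(zoo)

x <- c(5, 7, 10, NA, 3, 6, 2, NA, 1, 9)

# Centered moving average with stats::filter():
# every window that touches an NA yields NA.
f <- as.numeric(stats::filter(x, rep(1/3, 3)))

# Centered rolling mean with rollapply(): NAs are dropped inside each window,
# so only the two edge positions without a full window remain NA.
r <- rollapply(x, 3, mean, na.rm = TRUE, fill = NA)

print(data.frame(x = x, filter = f, rollapply = r))
print(c(filter_NAs = sum(is.na(f)), rollapply_NAs = sum(is.na(r))))
```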
Nebulloyd almost 2 years

Is it possible to calculate the mean for cells only when the original value was NA? In other words can original values be kept while imputing averages within the given window only where the original values were NA? Similar to na.fill(x, 'extend') but with a limit to which it 'extends' being the window or width.
JKim almost 2 years

@Nebulloyd I think your question is about 'mean imputation'. statisticsglobe.com/mean-imputation-for-missing-data
Nebulloyd almost 2 years

@JKim Unfortunately no, it is not. The key difference is that the mean should only be calculated over a small 'window' in the column. The examples in your link fill all NAs of a column with the same mean value (the column mean). I am using a time series data set, so I expect the values directly before or after a gap to be more similar to the missing values than the column mean is. I also want streaks of NA longer than a certain length to remain NA.
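A minimal base R sketch of this kind of windowed imputation is shown below. The function impute_window_mean and its width parameter are hypothetical names, and the behaviour is only one possible reading of the request: original values are kept, each NA is replaced by the mean of the non-missing original values in a centered window, and an NA whose window holds no valid value stays NA.

```r
# Hypothetical helper: fill NAs with the mean of nearby original values.
impute_window_mean <- function(x, width = 3) {
  half <- width %/% 2
  n <- length(x)
  out <- x
  for (i in which(is.na(x))) {
    # centered window, truncated at the ends of the series
    idx <- max(1, i - half):min(n, i + half)
    vals <- x[idx]  # always the original values, never imputed ones
    if (any(!is.na(vals))) out[i] <- mean(vals, na.rm = TRUE)
  }
  out
}

x <- c(5, 7, 10, NA, 3, 6, NA, NA, NA, 9)
imputed <- impute_window_mean(x, width = 3)
print(imputed)
```

With width = 3, the isolated NA at position 4 becomes the mean of its two neighbours. In the streak at positions 7 to 9, only the edges are filled and the middle value stays NA. Leaving an entire long streak untouched would need an extra check on the run length, which this sketch does not include.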