Using rollmean when there are missing values (NA)


Solution 1

From ?rollmean

The default method of ‘rollmean’ does not handle inputs that contain ‘NA’s. In such cases, use ‘rollapply’ instead.
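One plausible explanation for the all-NA column in the question, offered as an assumption rather than a reading of zoo's source: a rolling mean built on a running sum carries a single NA into every later window, while a mean recomputed per window only loses the windows that actually contain the NA. The helper names `roll_mean_running` and `roll_mean_windowed` below are hypothetical and written in base R purely for illustration.

```r
# Hypothetical helper for illustration only; this is not zoo's source code.
# Rolling mean of width k computed from a running sum.
roll_mean_running <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 1, n > k)
  # First window sum, then the change from each window to the next
  # (value entering minus value leaving).
  steps <- c(sum(x[1:k]), x[(k + 1):n] - x[1:(n - k)])
  cumsum(steps) / k
}

# Hypothetical helper: recompute the mean of every window from scratch.
roll_mean_windowed <- function(x, k) {
  n <- length(x)
  sapply(k:n, function(i) mean(x[(i - k + 1):i]))
}

b <- c(NA, 10:1)  # column b from the question

running  <- roll_mean_running(b, 3)
windowed <- roll_mean_windowed(b, 3)

print(running)   # every value is NA: the NA never leaves the running sum
print(windowed)  # NA only for the one window that contains the NA
```

On input without NAs the two helpers agree, which is consistent with columns a and c matching in both outputs of the question.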

Solution 2

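Use rollapply with a function that passes na.rm=TRUE to mean, together with the 'partial=TRUE' option. na.rm=TRUE drops the NAs inside each window; partial=TRUE also evaluates the incomplete windows at the start of the series instead of leaving them as NA.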

> rollapply(z, width=3, FUN=function(x) mean(x, na.rm=TRUE), by=1, by.column=TRUE, partial=TRUE, fill=NA, align="right")

     a    b        c
1  0.0  NaN 1.000000
2  0.5 10.0 5.500000
3  1.0  9.5 4.333333
4  2.0  9.0 6.666667
5  3.0  8.0 4.666667
6  4.0  7.0 6.000000
7  5.0  6.0 7.000000
8  6.0  5.0 8.666667
9  7.0  4.0 8.333333
10 8.0  3.0 7.000000
11 9.0  2.0 5.000000

If you want to change 'NaN' in the first row to '0', modify 'fill=NA' to 'fill=0'.
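The two options used above do different jobs, and a small side-by-side run may make that clearer. This is a minimal sketch assuming the zoo package is installed; it uses only column b from the question, and the variable names `full_only` and `with_partial` are made up for the example.

```r
library(zoo)

b <- zoo(c(NA, 10:1), 1:11)  # column b from the question

# na.rm = TRUE only: the first two positions have no complete window of
# width 3, so they are filled with NA.
full_only <- rollapplyr(b, 3, mean, na.rm = TRUE, fill = NA)

# na.rm = TRUE plus partial = TRUE: the shorter windows at the start are
# evaluated as well. The first window holds only the NA, so its mean is NaN.
with_partial <- rollapplyr(b, 3, mean, na.rm = TRUE, partial = TRUE)

print(cbind(full_only, with_partial))
```

From position 3 onward both results are the same; they differ only in the first two rows, where no full window exists.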


Author: Alex

Updated on June 14, 2022

Comments

  • Alex
    Alex almost 2 years

    I have a data set with a couple of NAs in it. I take a rolling mean and expect that when there is no NA in the window, the rolling mean produces a number rather than NA. However, rollmeanr in zoo does not seem to do this. Example:

    require(zoo)
    z = zoo(cbind(a=0:10, b=c(NA,10:1), c=sample(1:11,11)), 1:11) 
    rollmeanr(z, k=3, fill=NA)
        a  b        c
    1  NA NA       NA
    2  NA NA       NA
    3   1 NA 3.333333
    4   2 NA 4.666667
    5   3 NA 4.000000
    6   4 NA 6.333333
    7   5 NA 7.000000
    8   6 NA 9.333333
    9   7 NA 8.333333
    10  8 NA 8.666667
    11  9 NA 5.666667
    
    rollapply(z, width=3, FUN=mean, by=1, by.column=TRUE, fill=NA, align="right")
        a  b        c
    1  NA NA       NA
    2  NA NA       NA
    3   1 NA 3.333333
    4   2  9 4.666667
    5   3  8 4.000000
    6   4  7 6.333333
    7   5  6 7.000000
    8   6  5 9.333333
    9   7  4 8.333333
    10  8  3 8.666667
    11  9  2 5.666667
    

    I would expect these two calls to generate the same result. Please comment. Some session info:

    sessionInfo()
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-unknown-linux-gnu (64-bit)
    
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
     [7] LC_PAPER=C                 LC_NAME=C                 
     [9] LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
    
    attached base packages:
     [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
     [1] zoo_1.7-10
    
    loaded via a namespace (and not attached):
     [1] grid_3.0.1      lattice_0.20-15
    
    • dickoa
      dickoa almost 11 years
      From the help file I have : The default method of ‘rollmean’ does not handle inputs that contain ‘NA’s. In such cases, use ‘rollapply’ instead.
    • Alex
      Alex almost 11 years
      Yes, I saw that. I assumed it just meant that rollmean would not let you skip over NAs, whereas rollapply lets you pass na.rm=TRUE. Should it instead be read as saying that rollmean breaks when there are NAs?
  • GSee
    GSee almost 11 years
    Look at zoo:::rollmean.zoo and note that na.rm is not passed anywhere.
  • Alex
    Alex almost 11 years
    Yeah, that's not what I was saying though. I thought na.rm=FALSE would be the default and that you can't modify it in rollmean, whereas you can in rollapply. That's what I understood the help file to be saying. Obviously I was incorrect.
  • George Steblovsky
    George Steblovsky almost 11 years
    You could always use the 'filter' function. It has no problems with NAs and is very fast.
  • GSee
    GSee almost 11 years
    @GeorgeSteblovsky Yes, as.zoo(apply(z, 2, function(x) filter(x, rep(1/3, 3), sides=1))) is about 9 times faster in this case.
  • G. Grothendieck
    G. Grothendieck about 7 years
    or equivalently: rollapplyr(z, 3, mean, na.rm = TRUE, by = 1, partial = TRUE, fill = NA)
  • Ken Williams
    Ken Williams over 6 years
    @GeorgeSteblovsky while NAs are allowed using filter(), they pollute the output much more than they do in rollapply - try x <- c(5, 7, 10, NA, 3, 6, 2, NA, 1, 9); as.numeric(filter(x, rep(1/3, 3))); zoo::rollapply(x, 3, mean, na.rm=TRUE) and compare the output.
  • Nebulloyd
    Nebulloyd almost 2 years
    Is it possible to calculate the mean for cells only when the original value was NA? In other words can original values be kept while imputing averages within the given window only where the original values were NA? Similar to na.fill(x, 'extend') but with a limit to which it 'extends' being the window or width.
  • JKim
    JKim almost 2 years
    @Nebulloyd I think your question is about 'mean imputation'. statisticsglobe.com/mean-imputation-for-missing-data
  • Nebulloyd
    Nebulloyd almost 2 years
    @JKim Unfortunately, no. The key difference is that the mean should only be calculated over a small 'window' in the column. The examples in your link fill all NAs of a column with the same mean value (the column mean). I am using a time series data set, so I expect the values directly before or after to be more similar to the missing values than the column mean is. I also want streaks of NA longer than a certain length to remain NA.
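One way the windowed imputation asked about in the last comments might be sketched in base R is shown below. It is an illustration under stated assumptions, not an answer given in the thread: the helper name `fill_na_window_mean` is hypothetical, the window is centered with an odd width, and an NA is left in place whenever its window contains no observed value, which is what keeps the middle of long NA streaks untouched.

```r
# Hypothetical helper: replace each NA by the mean of the observed values
# in a centered window around it; original values are never changed.
fill_na_window_mean <- function(x, width = 3) {
  stopifnot(width %% 2 == 1)       # assumption: odd width, centered window
  half <- (width - 1) %/% 2
  n <- length(x)
  out <- x
  for (i in which(is.na(x))) {
    # Window is clipped at the ends of the series.
    window <- x[max(1, i - half):min(n, i + half)]
    # Fill only when the window holds at least one observed value.
    if (any(!is.na(window))) {
      out[i] <- mean(window, na.rm = TRUE)
    }
  }
  out
}

x <- c(5, 7, NA, 9, NA, NA, NA, NA, NA, 4, 6)
filled <- fill_na_window_mean(x, width = 3)
print(filled)
```

With width 3, the isolated NA and the two ends of the five-NA streak are filled from their neighbours, while the three NAs in the middle of the streak stay NA. A wider window fills further into a streak, so the width acts as the limit on how far values are extended.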