Creating regular 15-minute time-series from irregular time-series


Solution 1

xts extends zoo, and zoo has extensive examples for this in its vignettes and documentation.
Here is a worked example. I think I have done this more elegantly in the past, but this is all I can come up with right now:

R> twohours <- ISOdatetime(2012,05,02,9,0,0) + (1:8)*15*60
R> twohours
[1] "2012-05-02 09:15:00 GMT" "2012-05-02 09:30:00 GMT" 
[3] "2012-05-02 09:45:00 GMT" "2012-05-02 10:00:00 GMT" 
[5] "2012-05-02 10:15:00 GMT" "2012-05-02 10:30:00 GMT" 
[7] "2012-05-02 10:45:00 GMT" "2012-05-02 11:00:00 GMT"
R> set.seed(42)
R> observation <- xts(1:10, order.by=twohours[1]+cumsum(runif(10)*60*10))
R> observation
                           [,1]
2012-05-02 09:24:08.883625    1
2012-05-02 09:33:31.128874    2
2012-05-02 09:36:22.812594    3
2012-05-02 09:44:41.081170    4
2012-05-02 09:51:06.128481    5
2012-05-02 09:56:17.586051    6
2012-05-02 10:03:39.539040    7
2012-05-02 10:05:00.338998    8
2012-05-02 10:11:34.534372    9
2012-05-02 10:18:37.573243   10

This gives a two-hour time grid and some random observations that leave some grid cells empty and others filled.

R> to.minutes15(observation)[,4]
                           observation.Close
2012-05-02 09:24:08.883625                 1
2012-05-02 09:44:41.081170                 4
2012-05-02 09:56:17.586051                 6
2012-05-02 10:11:34.534372                 9
2012-05-02 10:18:37.573243                10

That is a 15-minute aggregation, but it is not on our time grid.

R> twoh <- xts(rep(NA,8), order.by=twohours)
R> twoh
                    [,1]
2012-05-02 09:15:00   NA
2012-05-02 09:30:00   NA
2012-05-02 09:45:00   NA
2012-05-02 10:00:00   NA
2012-05-02 10:15:00   NA
2012-05-02 10:30:00   NA
2012-05-02 10:45:00   NA
2012-05-02 11:00:00   NA

R> merge(twoh, observation)
                           twoh observation
2012-05-02 09:15:00.000000   NA          NA
2012-05-02 09:24:08.883625   NA           1
2012-05-02 09:30:00.000000   NA          NA
2012-05-02 09:33:31.128874   NA           2
2012-05-02 09:36:22.812594   NA           3
2012-05-02 09:44:41.081170   NA           4
2012-05-02 09:45:00.000000   NA          NA
2012-05-02 09:51:06.128481   NA           5
2012-05-02 09:56:17.586051   NA           6
2012-05-02 10:00:00.000000   NA          NA
2012-05-02 10:03:39.539040   NA           7
2012-05-02 10:05:00.338998   NA           8
2012-05-02 10:11:34.534372   NA           9
2012-05-02 10:15:00.000000   NA          NA
2012-05-02 10:18:37.573243   NA          10
2012-05-02 10:30:00.000000   NA          NA
2012-05-02 10:45:00.000000   NA          NA
2012-05-02 11:00:00.000000   NA          NA

Those are the new xts object holding the grid and the merged object. Now use na.locf() to carry the observations forward:

R> na.locf(merge(twoh, observation)[,2])
                           observation
2012-05-02 09:15:00.000000          NA
2012-05-02 09:24:08.883625           1
2012-05-02 09:30:00.000000           1
2012-05-02 09:33:31.128874           2
2012-05-02 09:36:22.812594           3
2012-05-02 09:44:41.081170           4
2012-05-02 09:45:00.000000           4
2012-05-02 09:51:06.128481           5
2012-05-02 09:56:17.586051           6
2012-05-02 10:00:00.000000           6
2012-05-02 10:03:39.539040           7
2012-05-02 10:05:00.338998           8
2012-05-02 10:11:34.534372           9
2012-05-02 10:15:00.000000           9
2012-05-02 10:18:37.573243          10
2012-05-02 10:30:00.000000          10
2012-05-02 10:45:00.000000          10
2012-05-02 11:00:00.000000          10

And then we can merge again as an inner join on the time-grid xts twoh:

R> merge(twoh, na.locf(merge(twoh, observation)[,2]), join="inner")[,2]
                    observation
2012-05-02 09:15:00          NA
2012-05-02 09:30:00           1
2012-05-02 09:45:00           4
2012-05-02 10:00:00           6
2012-05-02 10:15:00           9
2012-05-02 10:30:00          10
2012-05-02 10:45:00          10
2012-05-02 11:00:00          10
R> 
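
As Joshua Ulrich notes in the comments, the intermediate twoh object is not strictly needed. The following is a minimal sketch of that more compact variant, rebuilding the same grid and observations as above; the name `regular` is introduced here only for illustration.

```r
library(xts)

# Same 15-minute grid and random observations as in the worked example
twohours <- ISOdatetime(2012, 5, 2, 9, 0, 0) + (1:8) * 15 * 60
set.seed(42)
observation <- xts(1:10, order.by = twohours[1] + cumsum(runif(10) * 60 * 10))

# Merge with an "empty" xts object that only carries the grid index,
# carry the last observation forward, then keep only the grid timestamps
regular <- na.locf(merge(xts(, twohours), observation))[twohours]
regular
```

This should leave one row per grid timestamp, with NA at 09:15 because no observation precedes that point.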

Solution 2

Here is a data.table solution. This can be done neatly using a rolling join:

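library(data.table)
library(xts)

# Regular 15-minute lookup grid
lu <- data.table(index = as.POSIXct("2012-05-02") + (0:7)*15*60)

# Irregular observations at random offsets after the first grid point
observation <- xts(1:10,
                   order.by = lu[1, index + cumsum(runif(10)*60*10)])

# Rolling join: each grid time picks up the most recent observation
observation.dt <- as.data.table(observation)
observation.dt[lu, on = "index", roll = TRUE]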
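
To make the effect of the rolling join easier to see, here is a small deterministic sketch. The tables `obs` and `grid`, the column `value`, and the timestamps are made up for illustration; only the `roll = TRUE` join itself comes from the solution above.

```r
library(data.table)

t0 <- as.POSIXct("2012-05-02 09:00:00", tz = "UTC")

# Hypothetical irregular observations at 09:05, 09:20 and 09:25
obs <- data.table(index = t0 + c(5, 20, 25) * 60, value = 1:3)

# Hypothetical regular grid at 09:00, 09:15 and 09:30
grid <- data.table(index = t0 + (0:2) * 15 * 60)

# For each grid time, roll = TRUE takes the most recent observation
# at or before that time; grid times with no earlier observation get NA
res <- obs[grid, on = "index", roll = TRUE]
res
```

Here 09:00 has no earlier observation and stays NA, 09:15 picks up the 09:05 value, and 09:30 picks up the 09:25 value. This is last-observation-carried-forward, the same behaviour as na.locf() in Solution 1; it does not sum multiple observations within an interval.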
Author by

akashwani

Updated on July 29, 2022

Comments

  • akashwani
    akashwani almost 2 years

    I have an irregular time-series (with DateTime and RainfallValue) in a csv file C:\SampleData.csv:

    
    DateTime,RainInches
    1/6/2000 11:59,0
    1/6/2000 23:59,0.01
    1/7/2000 11:59,0
    1/13/2000 23:59,0
    1/14/2000 0:00,0
    1/14/2000 23:59,0
    4/14/2000 3:07,0.01
    4/14/2000 3:12,0.03
    4/14/2000 3:19,0.01
    12/31/2001 22:44,0
    12/31/2001 22:59,0.07
    12/31/2001 23:14,0
    12/31/2001 23:29,0
    12/31/2001 23:44,0.01
    12/31/2001 23:59,0.01
    

    Note: The irregular time-steps could be 1 min, 15 min, 1 hour, etc. Also, there could be multiple observations in a desired 15-min interval.

    I am trying to create a regular 15-minute time-series from 2000-01-01 to 2001-12-31 that should look like:

    
    2000-01-01 00:15:00 0.00
    2000-01-01 00:30:00 0.00
    2000-01-01 00:45:00 0.00
    ...
    2001-12-31 23:30:00 0.01
    2001-12-31 23:45:00 0.01
    

    Note: The time-series is regular with 15-minute intervals, filling the missing data with 0. If there is more than one data point in a 15-minute interval, the values are summed.

    Here is my code:

    
    library(zoo)
    library(xts)
    
    filename = "C:\\SampleData.csv"
    ReadData <- read.zoo(filename, format = "%m/%d/%Y %H:%M", sep=",", tz="UTC", header=TRUE) # read .csv as a ZOO object
    RawData <- aggregate(ReadData, index(ReadData), sum) # Merge duplicate time stamps and SUM the corresponding data (CAUTION)
    RawDataSeries <- as.xts(RawData,order.by =index(RawData)) #convert to an XTS object
    
    RegularTimes <- seq(as.POSIXct("2000-01-01 00:00:00", tz = "UTC"), as.POSIXct("2001-12-31 23:45:00", tz = "UTC"), by = 60*15)
    BlankTimeSeries <- xts((rep(0,length(RegularTimes))),order.by = RegularTimes)
    
    MergedTimeSeries <- merge(RawDataSeries,BlankTimeSeries)
    TS_sum15min <- period.apply(MergedTimeSeries,endpoints(MergedTimeSeries, "minutes", 15), sum, na.rm = TRUE )
    
    TS_align15min <- align.time( TS_sum15min [endpoints(TS_sum15min , "minutes", 15)], n=60*15)
    

    Problem: The output time series TS_align15min: (a) has repeating blocks of time-stamps (b) starts (mysteriously) from 1999, as:

    1999-12-31 19:15:00    0
    1999-12-31 19:30:00    0
    1999-12-31 19:45:00    0
    1999-12-31 20:00:00    0
    1999-12-31 20:15:00    0
    1999-12-31 20:30:00    0
    

    What am I doing wrong?

    Thank you for any direction!

  • akashwani
    akashwani almost 12 years
    Thank you! It looks good. Let me convert my code to follow this and get back. I have also changed my original post to include reproducible code and sample data.
  • Joshua Ulrich
    Joshua Ulrich almost 12 years
    Regarding elegance: you don't need the twoh object. You can merge observation with an "empty" xts object (xts(,twohours)), use na.locf on that, then subset with twohours. Or, in one line: na.locf(merge(xts(,twohours),observation))[twohours].
  • Dirk Eddelbuettel
    Dirk Eddelbuettel almost 12 years
    I did the subsetting that way too (using index(twoh)), but ended up with errors which stumped me. Good to see I was on the right trac...
  • akashwani
    akashwani almost 12 years
    @DirkEddelbuettel In the second instruction from the bottom, na.locf(merge(twoh, observation)[,2]), I want to fill with 0 if both the parent columns have NA. I don't want to repeat the observation from the previous time-step. It is a rainfall time-series.
  • Dirk Eddelbuettel
    Dirk Eddelbuettel almost 12 years
    That was merely an example. You can use any aggregation function you want via zoo / xts, provided you map your irregular observed data to the regular grid, and I (and Joshua) showed you how to do the latter.
  • akashwani
    akashwani almost 12 years
    @DirkEddelbuettel Please read my question once more, I made some edits. I need to aggregate the data into 15-minute blocks when the given data is irregularly sampled (even at hourly intervals). An hourly observation goes to the 15-minute interval it falls into. And if there is more than one observation in a 15-minute interval, then we need to add them up. If there are no observations, fill with 0 (corresponding to no rainfall). Thank you!
  • akashwani
    akashwani almost 12 years
    @DirkEddelbuettel Thanks, I'll explore aggregate. As in my code above, I used period.apply() but it's giving me wrong timestamps. aggregate.zoo() is similar, but I'll try it to see if I have the same issue. Thank you both!
  • Dirk Eddelbuettel
    Dirk Eddelbuettel almost 12 years
    Step one: Provide the time grid. Step two: Map your data to the time grid. Step three: Run the aggregation you want (sum, mean, max, last, ...) on the time grid.
  • akashwani
    akashwani almost 12 years
    I can't map my data (to a regular time-grid) before aggregation, since there are multiple data values within the desired 15-minute interval. I need to aggregate first and then align.time(?) the aggregated time-series. Am I right?
  • akashwani
    akashwani almost 12 years
    Thank you! rowSums on merged xts object, followed by period.sum, followed by align.time did the trick. Thanks again for answering my first question on stackoverflow.
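
The three steps described in the comments (provide the grid, map the data to the grid, aggregate) can be sketched for the rainfall case as follows. This is not the asker's final code, which used rowSums, period.sum and align.time; it is an alternative sketch using plain arithmetic on the timestamps. It assumes each grid timestamp labels the end of its 15-minute interval, as the desired output starting at 00:15 suggests, and the sample values and all variable names are hypothetical.

```r
library(xts)

# Hypothetical sample shaped like the question's rainfall data
obs_time <- as.POSIXct(c("2000-04-14 03:07", "2000-04-14 03:12",
                         "2000-04-14 03:19", "2000-04-14 03:50",
                         "2000-04-14 03:59"), tz = "UTC")
obs_rain <- c(0.01, 0.03, 0.01, 0.02, 0.05)

# Step 1: the time grid, labelled by the END of each interval (assumption)
step  <- 15 * 60
start <- as.POSIXct("2000-04-14 03:00", tz = "UTC")
grid  <- start + (1:4) * step   # 03:15, 03:30, 03:45, 04:00

# Step 2: map each observation to interval k, covering
# (start + (k-1)*step, start + k*step]
slot <- ceiling((as.numeric(obs_time) - as.numeric(start)) / step)
keep <- slot >= 1 & slot <= length(grid)

# Step 3: sum within each interval; intervals without data stay 0
total <- numeric(length(grid))
sums  <- tapply(obs_rain[keep], slot[keep], sum)
total[as.integer(names(sums))] <- sums

regular <- xts(total, order.by = grid)
regular
```

Because the zero fill comes from the pre-allocated `total` vector, no carry-forward is involved, which matches the requirement that intervals without rainfall report 0 instead of repeating the previous value.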