Creating regular 15-minute time-series from irregular time-series
Solution 1
xts extends zoo, and zoo has extensive examples for this in its vignettes and documentation.
Here is a worked example. I think I have done that more elegantly in the past, but this is all I am coming up with now:
R> twohours <- ISOdatetime(2012,05,02,9,0,0) + seq(0:7)*15*60
R> twohours
[1] "2012-05-02 09:15:00 GMT" "2012-05-02 09:30:00 GMT"
[3] "2012-05-02 09:45:00 GMT" "2012-05-02 10:00:00 GMT"
[5] "2012-05-02 10:15:00 GMT" "2012-05-02 10:30:00 GMT"
[7] "2012-05-02 10:45:00 GMT" "2012-05-02 11:00:00 GMT"
R> set.seed(42)
R> observation <- xts(1:10, order.by=twohours[1]+cumsum(runif(10)*60*10))
R> observation
[,1]
2012-05-02 09:24:08.883625 1
2012-05-02 09:33:31.128874 2
2012-05-02 09:36:22.812594 3
2012-05-02 09:44:41.081170 4
2012-05-02 09:51:06.128481 5
2012-05-02 09:56:17.586051 6
2012-05-02 10:03:39.539040 7
2012-05-02 10:05:00.338998 8
2012-05-02 10:11:34.534372 9
2012-05-02 10:18:37.573243 10
That gives a two-hour time grid and some random observations, leaving some grid cells empty and some filled.
R> to.minutes15(observation)[,4]
observation.Close
2012-05-02 09:24:08.883625 1
2012-05-02 09:44:41.081170 4
2012-05-02 09:56:17.586051 6
2012-05-02 10:11:34.534372 9
2012-05-02 10:18:37.573243 10
That is a 15-minute aggregation, but it is not on our time grid.
R> twoh <- xts(rep(NA,8), order.by=twohours)
R> twoh
[,1]
2012-05-02 09:15:00 NA
2012-05-02 09:30:00 NA
2012-05-02 09:45:00 NA
2012-05-02 10:00:00 NA
2012-05-02 10:15:00 NA
2012-05-02 10:30:00 NA
2012-05-02 10:45:00 NA
2012-05-02 11:00:00 NA
R> merge(twoh, observation)
twoh observation
2012-05-02 09:15:00.000000 NA NA
2012-05-02 09:24:08.883625 NA 1
2012-05-02 09:30:00.000000 NA NA
2012-05-02 09:33:31.128874 NA 2
2012-05-02 09:36:22.812594 NA 3
2012-05-02 09:44:41.081170 NA 4
2012-05-02 09:45:00.000000 NA NA
2012-05-02 09:51:06.128481 NA 5
2012-05-02 09:56:17.586051 NA 6
2012-05-02 10:00:00.000000 NA NA
2012-05-02 10:03:39.539040 NA 7
2012-05-02 10:05:00.338998 NA 8
2012-05-02 10:11:34.534372 NA 9
2012-05-02 10:15:00.000000 NA NA
2012-05-02 10:18:37.573243 NA 10
2012-05-02 10:30:00.000000 NA NA
2012-05-02 10:45:00.000000 NA NA
2012-05-02 11:00:00.000000 NA NA
New xts object, and merged object. Now use na.locf() to carry the observations forward:
R> na.locf(merge(twoh, observation)[,2])
observation
2012-05-02 09:15:00.000000 NA
2012-05-02 09:24:08.883625 1
2012-05-02 09:30:00.000000 1
2012-05-02 09:33:31.128874 2
2012-05-02 09:36:22.812594 3
2012-05-02 09:44:41.081170 4
2012-05-02 09:45:00.000000 4
2012-05-02 09:51:06.128481 5
2012-05-02 09:56:17.586051 6
2012-05-02 10:00:00.000000 6
2012-05-02 10:03:39.539040 7
2012-05-02 10:05:00.338998 8
2012-05-02 10:11:34.534372 9
2012-05-02 10:15:00.000000 9
2012-05-02 10:18:37.573243 10
2012-05-02 10:30:00.000000 10
2012-05-02 10:45:00.000000 10
2012-05-02 11:00:00.000000 10
And then we can merge again as an inner join on the time-grid xts twoh:
R> merge(twoh, na.locf(merge(twoh, observation)[,2]), join="inner")[,2]
observation
2012-05-02 09:15:00 NA
2012-05-02 09:30:00 1
2012-05-02 09:45:00 4
2012-05-02 10:00:00 6
2012-05-02 10:15:00 9
2012-05-02 10:30:00 10
2012-05-02 10:45:00 10
2012-05-02 11:00:00 10
R>
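The steps above can be condensed along the lines Joshua Ulrich suggests in the comments: merge with an "empty" xts object, carry observations forward, then subset on the grid. The following is a minimal sketch of that idea, not a definitive implementation. It assumes tz = "UTC" for reproducible timestamps and uses (1:8) in place of seq(0:7), which yields the same 09:15 to 11:00 grid; the name regular is introduced here for illustration only.

```r
library(xts)

# Regular 15-minute grid, 09:15 to 11:00 (tz = "UTC" is an assumption)
twohours <- ISOdatetime(2012, 5, 2, 9, 0, 0, tz = "UTC") + (1:8) * 15 * 60

# Irregular observations at random offsets after the first grid point
set.seed(42)
observation <- xts(1:10, order.by = twohours[1] + cumsum(runif(10) * 60 * 10))

# Merge with an empty xts on the grid, carry the last observation forward,
# then keep only the grid timestamps
regular <- na.locf(merge(xts(, twohours), observation))[twohours]
regular
```

The first grid point stays NA because no observation precedes it, which matches the result of the longer version above.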
Solution 2
Here is a data.table solution; this can be done neatly using a rolling join:
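library(data.table)
library(xts)
# Regular 15-minute lookup grid
lu <- data.table(index = as.POSIXct("2012-05-02") + (0:7) * 15 * 60)
# Irregular observations at random offsets from the first grid point
observation <- xts(1:10,
                   order.by = lu[1, index + cumsum(runif(10) * 60 * 10)])
observation.dt <- as.data.table(observation)
# Rolling join: each grid time takes the most recent observation at or before it
observation.dt[lu, on = "index", roll = TRUE]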
akashwani
Updated on July 29, 2022
Comments
-
akashwani almost 2 years
I have an irregular time-series (with DateTime and RainfallValue) in a csv file C:\SampleData.csv:
DateTime,RainInches
1/6/2000 11:59,0
1/6/2000 23:59,0.01
1/7/2000 11:59,0
1/13/2000 23:59,0
1/14/2000 0:00,0
1/14/2000 23:59,0
4/14/2000 3:07,0.01
4/14/2000 3:12,0.03
4/14/2000 3:19,0.01
12/31/2001 22:44,0
12/31/2001 22:59,0.07
12/31/2001 23:14,0
12/31/2001 23:29,0
12/31/2001 23:44,0.01
12/31/2001 23:59,0.01
Note: The irregular time-steps could be 1 min, 15 min, 1 hour, etc. Also, there could be multiple observations in a desired 15-min interval.
I am trying to create a regular 15-minute time-series from 2000-01-01 to 2001-12-31 that should look like:
2000-01-01 00:15:00 0.00
2000-01-01 00:30:00 0.00
2000-01-01 00:45:00 0.00
...
2001-12-31 23:30:00 0.01
2001-12-31 23:45:00 0.01
Note: The time-series is regular with 15-minute intervals, with missing data filled with 0. If there is more than one data point in a 15-minute interval, they are summed.
Here is my code:
library(zoo)
library(xts)
filename = "C:\\SampleData.csv"
ReadData <- read.zoo(filename, format = "%m/%d/%Y %H:%M", sep=",", tz="UTC", header=TRUE) # read .csv as a ZOO object
RawData <- aggregate(ReadData, index(ReadData), sum) # Merge duplicate time stamps and SUM the corresponding data (CAUTION)
RawDataSeries <- as.xts(RawData,order.by =index(RawData)) #convert to an XTS object
RegularTimes <- seq(as.POSIXct("2000-01-01 00:00:00", tz = "UTC"), as.POSIXct("2001-12-31 23:45:00", tz = "UTC"), by = 60*15)
BlankTimeSeries <- xts((rep(0,length(RegularTimes))),order.by = RegularTimes)
MergedTimeSeries <- merge(RawDataSeries,BlankTimeSeries)
TS_sum15min <- period.apply(MergedTimeSeries,endpoints(MergedTimeSeries, "minutes", 15), sum, na.rm = TRUE )
TS_align15min <- align.time( TS_sum15min [endpoints(TS_sum15min , "minutes", 15)], n=60*15)
Problem: The output time series TS_align15min: (a) has repeating blocks of time-stamps (b) starts (mysteriously) from 1999, as:
1999-12-31 19:15:00 0
1999-12-31 19:30:00 0
1999-12-31 19:45:00 0
1999-12-31 20:00:00 0
1999-12-31 20:15:00 0
1999-12-31 20:30:00 0
What am I doing wrong?
Thank you for any direction!
-
akashwani almost 12 years
Thank you! It looks good. Let me convert my code to follow this and get back. I have also changed my original post to include reproducible code and sample data.
-
Joshua Ulrich almost 12 years
Regarding elegance: you don't need the twoh object. You can merge observation with an "empty" xts object (xts(,twohours)), use na.locf on that, then subset with twohours. Or, in one line: na.locf(merge(xts(,twohours),observation))[twohours].
-
Dirk Eddelbuettel almost 12 years
I did the subsetting that way too (using index(twoh)), but ended up with errors which stumped me. Good to see I was on the right trac...
-
akashwani almost 12 years
@DirkEddelbuettel In the second instruction from the bottom, na.locf(merge(twoh, observation)[,2]), I want to fill with 0 if both the parent columns have NA. I don't want to repeat the observation from the previous time-step. It is a rainfall time-series.
-
Dirk Eddelbuettel almost 12 years
That was merely an example. You can use any aggregation function you want via zoo / xts, provided you map your irregular observed data to the regular grid, and I (and Joshua) showed you how to do the latter.
-
akashwani almost 12 years
@DirkEddelbuettel Please read my question once more, I made some edits. I need to aggregate the data into 15-minute blocks when the given data is randomly sampled (even with hourly intervals). An hourly observation goes to the 15-minute interval it falls into. And if there is more than one observation in a 15-minute interval, then we need to add them up. If there are no observations, fill with 0 (corresponding to no rainfall). Thank you!
-
akashwani almost 12 years
@DirkEddelbuettel Thanks, I'll explore aggregate. As in my code above, I used period.apply() but it's giving me wrong timestamps. aggregate.zoo() is similar, but I'll try it to see if I have the same issue. Thank you both!
-
Dirk Eddelbuettel almost 12 years
Step one: Provide the time grid. Step two: Map your data to the time grid. Step three: Run the aggregation you want (sum, mean, max, last, ...) on the time grid.
-
akashwani almost 12 years
I can't map my data to a regular time grid before aggregation, since there are multiple data values within the desired 15-minute interval. I need to aggregate first and then align.time(?) the aggregated time-series. Am I right?
-
akashwani almost 12 years
Thank you! rowSums on merged xts object, followed by period.sum, followed by align.time did the trick. Thanks again for answering my first question on stackoverflow.
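
For the rainfall case discussed in this thread (sum within each 15-minute interval, fill empty intervals with 0), the grid / map / aggregate recipe can be sketched as below. This is one possible way to do it under stated assumptions, not necessarily the exact rowSums / period.sum / align.time sequence described in the last comment. The sample values and the names obs, step, bin_end, binned, grid and regular are hypothetical, and each observation is assumed to be labelled with the end of the 15-minute interval it falls into.

```r
library(xts)

# Hypothetical irregular rainfall observations (UTC)
obs_times <- as.POSIXct(c("2000-04-14 03:07", "2000-04-14 03:12",
                          "2000-04-14 03:19", "2000-04-14 04:40"), tz = "UTC")
obs <- xts(c(0.01, 0.03, 0.01, 0.02), order.by = obs_times)

step <- 15 * 60  # interval length in seconds

# Step 1: the regular time grid
grid <- seq(as.POSIXct("2000-04-14 03:00", tz = "UTC"),
            as.POSIXct("2000-04-14 05:00", tz = "UTC"), by = step)

# Step 2: map each observation to the end of its 15-minute interval
bin_end <- ceiling(as.numeric(index(obs)) / step) * step

# Step 3: sum the observations that share an interval
sums <- tapply(as.numeric(coredata(obs)), bin_end, sum)
bin_times <- as.POSIXct(sort(unique(bin_end)), origin = "1970-01-01", tz = "UTC")
binned <- xts(as.numeric(sums), order.by = bin_times)

# Put the sums on the grid and fill intervals without data with 0
regular <- merge(xts(, grid), binned)
regular[is.na(regular)] <- 0
regular <- regular[grid]
regular
```

Because the binning is done with plain arithmetic on the timestamps before anything is merged, multiple observations in one interval are summed first and the grid only ever sees one value per interval.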