How to subset data.frame by weeks and then sum?
Solution 1
This solution is influenced by @Andrie and @Chase.
# load plyr
library(plyr)
# format weeks as per requirement (replace "00" with "52" and adjust corresponding year)
tmp <- list()
tmp$y <- format(df$date, format="%Y")
tmp$w <- format(df$date, format="%U")
tmp$y[tmp$w=="00"] <- as.character(as.numeric(tmp$y[tmp$w=="00"]) - 1)
tmp$w[tmp$w=="00"] <- "52"
df$week <- paste(tmp$y, tmp$w, sep = "-")
# get summary
df2 <- ddply(df, .(week), summarize, income=sum(income))
# include week ending date
tmp$week.ending <- lapply(df2$week, function(x) rev(df[df$week==x, "date"])[[1]])
df2$week.ending <- sapply(tmp$week.ending, as.character)
# week income week.ending
# 1 2010-52 487 2011-01-01
# 2 2011-01 387 2011-01-08
# 3 2011-02 443 2011-01-15
Solution 2
First use format
to convert your dates to week numbers, then plyr::ddply()
to calculate the summaries:
library(plyr)
df$week <- format(df$date, format="%Y-%U")
ddply(df, .(week), summarize, income=sum(income))
week income
1 2011-52 413
2 2012-01 435
3 2012-02 379
For more information on format.date
, see ?strptime
, particular the bit that defines %U
as the week number.
EDIT:
Given the modified data and requirement, one way is to divide the date by 7 to get a numeric number indicating the week. (Or more precisely, divide by the number of seconds in a week to get the number of weeks since the epoch, which is 1970-01-01 by default.
In code:
df$week <- as.Date("1970-01-01")+7*trunc(as.numeric(df$date)/(3600*24*7))
library(plyr)
ddply(df, .(week), summarize, income=sum(income))
week income
1 2010-12-23 298
2 2010-12-30 392
3 2011-01-06 294
4 2011-01-13 152
I have not checked that the week boundaries are on Sunday. You will have to check this, and insert an appropriate offset into the formula.
Solution 3
This is now simple using dplyr. Also I would suggest using cut(breaks = "week")
rather than format()
to cut the dates into weeks.
library(dplyr)
df %>% group_by(week = cut(date, "week")) %>% mutate(weekly_income = sum(income))
Solution 4
I Googled "group week days into weeks R" and came across this SO question. You mention you have multiple years, so I think we need to keep up with both the week number and also the year, so I modified the answers there as so format(date, format = "%U%y")
In use it looks like this:
library(plyr) #for aggregating
df <- transform(df, weeknum = format(date, format = "%y%U"))
ddply(df, "weeknum", summarize, suminc = sum(income))
#----
weeknum suminc
1 1152 413
2 1201 435
3 1202 379
See ?strptime
for all the format abbreviations.
Solution 5
Try rollapply
from the zoo
package:
rollapply(df$income, width=7, FUN = sum, by = 7)
# [1] 487 387 443
Or, use period.sum
from the xts
package:
period.sum(xts(df$income, order.by=df$date), which(df$wday %in% 7))
# [,1]
# 2011-01-01 487
# 2011-01-08 387
# 2011-01-15 443
Or, to get the output in the format you want:
data.frame(income = period.sum(xts(df$income, order.by=df$date),
which(df$wday %in% 7)),
week = df$week[which(df$wday %in% 7)])
# income week
# 2011-01-01 487 2011-00
# 2011-01-08 387 2011-01
# 2011-01-15 443 2011-02
Note that the first week shows as 2011-00
because that's how it is entered in your data. You could also use week = df$week[which(df$wday %in% 1)]
which would match your output.
Tony Breyal
Updated on June 23, 2022Comments
-
Tony Breyal almost 2 years
Let's say I have several years worth of data which look like the following
# load date package and set random seed library(lubridate) set.seed(42) # create data.frame of dates and income date <- seq(dmy("26-12-2010"), dmy("15-01-2011"), by = "days") df <- data.frame(date = date, wday = wday(date), wday.name = wday(date, label = TRUE, abbr = TRUE), income = round(runif(21, 0, 100)), week = format(date, format="%Y-%U"), stringsAsFactors = FALSE) # date wday wday.name income week # 1 2010-12-26 1 Sun 91 2010-52 # 2 2010-12-27 2 Mon 94 2010-52 # 3 2010-12-28 3 Tues 29 2010-52 # 4 2010-12-29 4 Wed 83 2010-52 # 5 2010-12-30 5 Thurs 64 2010-52 # 6 2010-12-31 6 Fri 52 2010-52 # 7 2011-01-01 7 Sat 74 2011-00 # 8 2011-01-02 1 Sun 13 2011-01 # 9 2011-01-03 2 Mon 66 2011-01 # 10 2011-01-04 3 Tues 71 2011-01 # 11 2011-01-05 4 Wed 46 2011-01 # 12 2011-01-06 5 Thurs 72 2011-01 # 13 2011-01-07 6 Fri 93 2011-01 # 14 2011-01-08 7 Sat 26 2011-01 # 15 2011-01-09 1 Sun 46 2011-02 # 16 2011-01-10 2 Mon 94 2011-02 # 17 2011-01-11 3 Tues 98 2011-02 # 18 2011-01-12 4 Wed 12 2011-02 # 19 2011-01-13 5 Thurs 47 2011-02 # 20 2011-01-14 6 Fri 56 2011-02 # 21 2011-01-15 7 Sat 90 2011-02
I would like to sum 'income' for each week (Sunday thru Saturday). Currently I do the following:
Weekending 2011-01-01 = sum(df$income[1:7]) = 487 Weekending 2011-01-08 = sum(df$income[8:14]) = 387 Weekending 2011-01-15 = sum(df$income[15:21]) = 443
However I would like a more robust approach which will automatically sum by week. I can't work out how to automatically subset the data into weeks. Any help would be much appreciated.
-
Tony Breyal almost 12 yearsAlmost there, please see edit in my question for an issue I spotted with this approach :)
-
Tony Breyal almost 12 yearsAlmost there, please see edit in my question for an issue I spotted with this approach :)
-
Chase almost 12 years@TonyBreyal - looks like Andrie beat me to the punch and has a more robust answer. Also, Dwin's answer in the question I linked above is along these lines as well. Cheers.
-
Tony Breyal almost 12 yearsthank you kindly. I came up with my own solution based around further reading of ?format and plyr, as you suggested. I changed the data.frame in my question to better reflect what I was after and I think that is why your results are not quite what I was looking for (after applying your code to the new data) but using your answer I was able to get the solution I needed (I have added it above). Thank you very much, I couldn't have done it without reading your approach :)
-
Tony Breyal almost 12 yearsthank you kindly for the link. I saw it before but I couldn't work out how it would apply to my requirement of allowing a week to cross the year boundary. I have the solution now though and appreciate your help :)