Python Pandas DataFrame resample daily data to week by Mon-Sun weekly definition?
Solution 1
In case anyone else was not aware, it turns out that the weekly Anchored Offsets are based on the end date. So, just resampling 'W' (which is the same as 'W-SUN') is by default a Monday to Sunday sample. The date listed is the end date. See this old bug report wherein neither the documentation nor the API got updated.
Given that you specified label='left'
in the resample parameters, you must have realized that fact. It's also why using 'W-MON' does not have the desired effect. What is confusing is that the left bound is not actually in the interval.
So, to display the start date for the period instead of the end date, you may add a day to the index. That would mean you would do:
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
For completeness, here is your original data with another day (a Sunday) added on the beginning to show the grouping really is Monday to Sunday:
import pandas as pd
import numpy as np
dates = pd.date_range('20141228',periods=15, name='Day')
df = pd.DataFrame({'Sum1': [10000, 1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
'Sum2': [10000, 91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
'Sum3': [10000, 365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
'Sum4': [10000, 5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]},index=dates);
print(df)
df_resampled = df.resample('W', how='sum', label='left')
df_resampled.index = df_resampled.index - pd.DateOffset(days=1)
print(df_resampled)
This outputs:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-28 10000 10000 10000.00 10000
2014-12-29 1667 91 365.95 5
2014-12-30 1229 75 398.97 5
2014-12-31 1360 75 285.12 1
2015-01-01 9232 254 992.17 5
2015-01-02 8866 239 1116.57 8
2015-01-03 4083 108 512.11 8
2015-01-04 3671 99 504.47 2
2015-01-05 10085 259 1190.96 10
2015-01-06 10005 395 1753.60 12
2015-01-07 8730 355 1646.25 16
2015-01-08 10056 332 1344.05 16
2015-01-09 10176 386 1582.67 6
2015-01-10 3792 96 560.95 6
2015-01-11 3518 111 736.44 3
Sum1 Sum2 Sum3 Sum4
Day
2014-12-22 10000 10000 10000.00 10000
2014-12-29 30108 941 4175.36 34
2015-01-05 56362 1934 8814.92 69
I believe that is what you wanted for Question 1.
Update
There is now a loffset
argument to resample()
that allows you to shift the label offset. So, instead of modifying the index, you simple add the loffset
argument like so:
df.resample('W', how='sum', label='left', loffset=pd.DateOffset(days=1))
Also of note how=sum
is now deprecated in favor of using .sum()
on the Resampler object that .resample()
returns. So, the fully updated call would be:
df_resampled = df.resample('W', label='left', loffset=pd.DateOffset(days=1)).sum()
Update 1.1.0
The handy loffset
argument is deprecated as of version 1.1.0. The documentation indicates the shifting should be done after the resample. In this particular case, I believe that means this is the correct code (untested):
from pandas.tseries.frequencies import to_offset
df_resampled = df.resample('W', label='left').sum()
df_resampled.index = df_resampled.index + to_offset(pd.DateOffset(days=1))
Solution 2
This might help.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1, 1000, (100, 4)), columns='Sum1 Sum2 Sum3 Sum4'.split(), index=pd.date_range('2014-12-29', periods=100, freq='D'))
def func(group):
return pd.Series({'Sum1': group.Sum1.sum(), 'Sum2': group.Sum2.sum(),
'Sum3': group.Sum3.sum(), 'Sum4': group.Sum4.sum(), 'Day': group.index[1], 'Period': '{0} - {1}'.format(group.index[0].date(), group.index[-1].date())})
df.groupby(lambda idx: idx.week).apply(func)
Out[386]:
Day Period Sum1 Sum2 Sum3 Sum4
1 2014-12-30 2014-12-29 - 2015-01-04 3559 3692 3648 4086
2 2015-01-06 2015-01-05 - 2015-01-11 2990 3658 3348 3304
3 2015-01-13 2015-01-12 - 2015-01-18 3168 3720 3518 3273
4 2015-01-20 2015-01-19 - 2015-01-25 2275 4968 4095 2366
5 2015-01-27 2015-01-26 - 2015-02-01 4146 2167 3888 4576
.. ... ... ... ... ... ...
11 2015-03-10 2015-03-09 - 2015-03-15 4035 3518 2588 2714
12 2015-03-17 2015-03-16 - 2015-03-22 3399 3901 3430 2143
13 2015-03-24 2015-03-23 - 2015-03-29 3227 3308 3185 3814
14 2015-03-31 2015-03-30 - 2015-04-05 4278 3369 3623 4167
15 2015-04-07 2015-04-06 - 2015-04-07 1466 632 1136 1392
[15 rows x 6 columns]
Solution 3
Great question.
df_resampled = df.resample('W-MON', label='left', closed='left').sum()
The parameter closed
could work for your question.
Jarad
I am many things. Programmer - Python primarily, trying to learn Javascript more Data scientist - machine learning with Python (Scikit-learn, Tensorflow, Keras, PyTorch, etc.) Entrepreneur - Build paid search software, creator of an index card sleeve (very useful next to your desk while you code), online course creator, paid advertising consultant and mentor, and so on.
Updated on June 08, 2022Comments
-
Jarad almost 2 years
import pandas as pd import numpy as np dates = pd.date_range('20141229',periods=14, name='Day') df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518], 'Sum2': [91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111], 'Sum3': [365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44], 'Sum4': [5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]},index=dates); print(df)
The
df
produced looks like this:Sum1 Sum2 Sum3 Sum4 Day 2014-12-29 1667 91 365.95 5 2014-12-30 1229 75 398.97 5 2014-12-31 1360 75 285.12 1 2015-01-01 9232 254 992.17 5 2015-01-02 8866 239 1116.57 8 2015-01-03 4083 108 512.11 8 2015-01-04 3671 99 504.47 2 2015-01-05 10085 259 1190.96 10 2015-01-06 10005 395 1753.60 12 2015-01-07 8730 355 1646.25 16 2015-01-08 10056 332 1344.05 16 2015-01-09 10176 386 1582.67 6 2015-01-10 3792 96 560.95 6 2015-01-11 3518 111 736.44 3
Let's say I resample the
Dataframe
to try and sum the daily data into weekly rows:df_resampled = df.resample('W', how='sum', label='left'); print(df_resampled)
This produces the following:
Sum1 Sum2 Sum3 Sum4 Day 2014-12-28 30108 941 4175.36 34 2015-01-04 56362 1934 8814.92 69
Question 1: my definition of a week is Mon - Sun. Since my data starts on
2014-12-29
(a Monday), I want myDay
label
to also start on that day. How would I make theDay
index
label
be the date of every Monday instead of every Sunday?Desired Output:
Sum1 Sum2 Sum3 Sum4 Day 2014-12-29 30108 941 4175.36 34 2015-01-05 56362 1934 8814.92 69
What have I tried regarding Question 1?
I changed
'W'
to'W-MON'
but it produced 3 rows by counting2014-12-29
in2014-12-22
row which is not what I want:Sum1 Sum2 Sum3 Sum4 Day 2014-12-22 1667 91 365.95 5 2014-12-29 38526 1109 5000.37 39 2015-01-05 46277 1675 7623.96 59
Question 2: how would I format the
Day
index
label to look like a range? Ex:Sum1 Sum2 Sum3 Sum4 Day 2014-12-29 - 2015-01-04 30108 941 4175.36 34 2015-01-05 - 2015-01-11 56362 1934 8814.92 69
-
xiaotong xu about 2 yearsGreat job. And
df_resampled = df.resample('W-MON', label='left', closed='left').sum()
also works from Monday to Sunday. -
Wtower about 2 yearsKudos . This is the correct approach after v1.1.0.