python split a pandas data frame by week or month and group the data based on these sp
10,017
Solution 1
Perhaps group by CostCentre first, then use Series/DataFrame resample()
?
In [72]: centers = {}
In [73]: for center, idx in df.groupby("CostCentre").groups.iteritems():
....: timediff = df.ix[idx].set_index("Date")['TimeDifference']
....: centers[center] = timediff.resample("W", how=sum)
In [77]: pd.concat(centers, names=['CostCentre'])
Out[77]:
CostCentre Date
0 2012-09-09 0
2012-09-16 89522
2012-09-23 6
2012-09-30 161
2073 2012-09-09 141208
2012-09-16 113024
2012-09-23 169599
2012-09-30 170780
6078 2012-09-09 171481
2012-09-16 160871
2012-09-23 153976
2012-09-30 122972
Additional details:
When parse_dates
is True
for the pd.read_* functions, index_col
must also be set.
In [28]: df = pd.read_clipboard(sep=' +', parse_dates=True, index_col=0,
....: dayfirst=True)
In [30]: df.head()
Out[30]:
CostCentre TimeDifference
DateOccurred
2012-09-03 2073 28138
2012-09-03 6078 34844
2012-09-03 8273 31215
2012-09-03 8367 28160
2012-09-03 8959 32037
Since resample() requires a TimeSeries-indexed frame/series, setting the index during creation eliminates the need to set the index for each group individually. GroupBy objects also have an apply method, which is basically syntactic sugar around the "combine" step done with pd.concat() above.
In [37]: x = df.groupby("CostCentre").apply(lambda df:
....: df['TimeDifference'].resample("W", how=sum))
In [38]: x.head(12)
Out[38]:
CostCentre DateOccurred
0 2012-09-09 0
2012-09-16 89522
2012-09-23 6
2012-09-30 161
2073 2012-09-09 141208
2012-09-16 113024
2012-09-23 169599
2012-09-30 170780
6078 2012-09-09 171481
2012-09-16 160871
2012-09-23 153976
2012-09-30 122972
Solution 2
Here's a way to take your input (as text) and group it the way you want. The key is to use a dictionary for each grouping (date, then centre).
import collections
import datetime
import functools
def delta_totals_by_date_and_centre(in_file):
# Use a defaultdict instead of a normal dict so that missing values are
# automatically created. by_date is a mapping (dict) from a tuple of (year, week)
# to another mapping (dict) from centre to total delta time.
by_date = collections.defaultdict(functools.partial(collections.defaultdict, int))
# For each line in the input...
for line in in_file:
# Parse the three fields of each line into date, int ,int.
date, centre, delta = line.split()
date = datetime.datetime.strptime(date, "%d/%m/%Y").date()
centre = int(centre)
delta = int(delta)
# Determine the year and week of the year.
year, week, weekday = date.isocalendar()
year_and_week = year, week
# Add the time delta.
by_date[year_and_week][centre] += delta
# Yield each result, in order.
for year_and_week, by_centre in sorted(by_date.items()):
for centre, delta in sorted(by_centre.items()):
yield year_and_week, centre, delta
For your sample input, it produces this output (where the first column is year-week_of_the_year
).
2012-36 0 0
2012-36 2073 141208
2012-36 6078 171481
2012-36 7042 27129
2012-36 7569 124600
2012-36 8239 82153
2012-36 8273 154517
2012-36 8367 113339
2012-36 8959 82770
2012-36 9292 128089
2012-36 9532 137491
2012-36 9705 146321
2012-36 10085 151483
2012-36 10220 87496
2012-36 14573 186
2012-37 0 89522
2012-37 2073 113024
2012-37 6078 160871
2012-37 7042 35063
2012-37 7097 30866
2012-37 8239 61744
2012-37 8273 153898
2012-37 8367 93564
2012-37 8959 116727
2012-37 9292 132628
2012-37 9532 121462
2012-37 9705 139992
2012-37 10085 111229
2012-37 10220 91245
2012-38 0 6
2012-38 2073 169599
2012-38 6078 153976
2012-38 7097 34909
2012-38 7569 152958
2012-38 8239 122693
2012-38 8273 119536
2012-38 8367 116157
2012-38 8959 75579
2012-38 9292 128340
2012-38 9532 163278
2012-38 9705 95205
2012-38 10085 94284
2012-38 10220 92318
2012-38 14573 468
2012-39 0 161
2012-39 2073 170780
2012-39 6078 122972
2012-39 7042 34953
2012-39 7097 63475
2012-39 7569 92371
2012-39 8239 194048
2012-39 8273 123332
2012-39 8367 115365
2012-39 8959 104609
2012-39 9292 131369
2012-39 9532 143933
2012-39 9705 123107
2012-39 10085 129276
2012-39 10220 124681
Author by
George Thompson
Updated on June 05, 2022Comments
-
George Thompson almost 2 years
DateOccurred CostCentre TimeDifference 03/09/2012 2073 28138 03/09/2012 6078 34844 03/09/2012 8273 31215 03/09/2012 8367 28160 03/09/2012 8959 32037 03/09/2012 9292 30118 03/09/2012 9532 34200 03/09/2012 9705 27240 03/09/2012 10085 31431 03/09/2012 10220 22555 04/09/2012 6078 41126 04/09/2012 7569 31101 04/09/2012 8273 30994 04/09/2012 8959 30064 04/09/2012 9532 34655 04/09/2012 9705 26475 04/09/2012 10085 31443 04/09/2012 10220 33970 05/09/2012 2073 28221 05/09/2012 6078 27894 05/09/2012 7569 29012 05/09/2012 8239 42208 05/09/2012 8273 31128 05/09/2012 8367 27993 05/09/2012 8959 20669 05/09/2012 9292 33070 05/09/2012 9532 8189 05/09/2012 9705 27540 05/09/2012 10085 28798 05/09/2012 10220 23164 06/09/2012 2073 28350 06/09/2012 6078 35648 06/09/2012 7042 27129 06/09/2012 7569 31546 06/09/2012 8239 39945 06/09/2012 8273 31107 06/09/2012 8367 27795 06/09/2012 9292 32974 06/09/2012 9532 30320 06/09/2012 9705 37462 06/09/2012 10085 31703 06/09/2012 10220 7807 06/09/2012 14573 186 07/09/2012 0 0 07/09/2012 0 0 07/09/2012 2073 28036 07/09/2012 6078 31969 07/09/2012 7569 32941 07/09/2012 8273 30073 07/09/2012 8367 29391 07/09/2012 9292 31927 07/09/2012 9532 30127 07/09/2012 9705 27604 07/09/2012 10085 28108 08/09/2012 2073 28463 10/09/2012 6078 31266 10/09/2012 8239 16390 10/09/2012 8273 31140 10/09/2012 8959 30858 10/09/2012 9532 30794 10/09/2012 9705 28752 11/09/2012 0 0 11/09/2012 0 0 11/09/2012 0 0 11/09/2012 0 0 11/09/2012 0 0 11/09/2012 2073 28159 11/09/2012 6078 36835 11/09/2012 8239 45354 11/09/2012 8273 30922 11/09/2012 8367 31382 11/09/2012 8959 29670 11/09/2012 9292 33582 11/09/2012 9705 29394 11/09/2012 10085 17140 12/09/2012 2073 28283 12/09/2012 6078 31139 12/09/2012 7042 35063 12/09/2012 8273 31075 12/09/2012 8367 29795 12/09/2012 9292 33496 12/09/2012 9532 31669 12/09/2012 9705 26166 12/09/2012 10085 29889 12/09/2012 10220 35656 13/09/2012 2073 28144 13/09/2012 6078 30544 13/09/2012 7097 30866 13/09/2012 8273 30772 13/09/2012 8367 32387 13/09/2012 8959 29307 13/09/2012 9292 32348 13/09/2012 9532 28137 13/09/2012 9705 28823 13/09/2012 10085 31543 13/09/2012 10220 28293 14/09/2012 0 12433 14/09/2012 0 12434 14/09/2012 0 12434 14/09/2012 0 12434 14/09/2012 0 12434 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 12433 14/09/2012 0 0 14/09/2012 0 12433 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 1720 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 384 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 0 14/09/2012 0 383 14/09/2012 2073 28438 14/09/2012 6078 27255 14/09/2012 8273 29989 14/09/2012 8959 26892 14/09/2012 9292 33202 14/09/2012 9532 30862 14/09/2012 9705 26857 14/09/2012 10085 32657 14/09/2012 10220 27296 15/09/2012 6078 3832 17/09/2012 6078 30004 17/09/2012 7569 30390 17/09/2012 8239 41421 17/09/2012 8273 26337 17/09/2012 8367 31631 17/09/2012 8959 17989 17/09/2012 9292 35703 17/09/2012 9532 36542 17/09/2012 9705 27488 17/09/2012 10085 30849 17/09/2012 10220 32575 18/09/2012 2073 28293 18/09/2012 6078 27450 18/09/2012 7569 30323 18/09/2012 8239 38481 18/09/2012 8273 31154 18/09/2012 8367 27944 18/09/2012 8959 28196 18/09/2012 9292 30844 18/09/2012 9532 33128 18/09/2012 9705 32100 19/09/2012 2073 28227 19/09/2012 6078 32243 19/09/2012 7569 29041 19/09/2012 8239 42791 19/09/2012 8273 30966 19/09/2012 8367 26420 19/09/2012 8959 29394 19/09/2012 9292 14865 19/09/2012 9532 23618 19/09/2012 10085 31614 19/09/2012 10220 8686 20/09/2012 2073 28260 20/09/2012 6078 30446 20/09/2012 7097 34909 20/09/2012 7569 30869 20/09/2012 8273 31079 20/09/2012 8367 30162 20/09/2012 9292 13104 20/09/2012 9532 36614 20/09/2012 9705 35617 20/09/2012 10085 31821 20/09/2012 10220 30055 20/09/2012 14573 468 21/09/2012 0 0 21/09/2012 0 0 21/09/2012 0 0 21/09/2012 0 0 21/09/2012 0 0 21/09/2012 0 0 21/09/2012 0 0 21/09/2012 0 0 21/09/2012 0 0 21/09/2012 0 3 21/09/2012 0 0 21/09/2012 0 0 21/09/2012 0 3 21/09/2012 2073 28308 21/09/2012 6078 33833 21/09/2012 7569 32335 21/09/2012 9292 33824 21/09/2012 9532 33376 21/09/2012 10220 21002 22/09/2012 2073 28402 23/09/2012 2073 28109 24/09/2012 2073 28431 24/09/2012 6078 30027 24/09/2012 7097 31914 24/09/2012 8239 35617 24/09/2012 8273 30670 24/09/2012 8367 29084 24/09/2012 8959 31023 24/09/2012 9292 34394 24/09/2012 9532 31255 24/09/2012 9705 18758 24/09/2012 10085 29290 24/09/2012 10220 33230 25/09/2012 2073 28506 25/09/2012 6078 32043 25/09/2012 7042 34953 25/09/2012 7569 30898 25/09/2012 8239 41297 25/09/2012 8273 31012 25/09/2012 8367 29645 25/09/2012 8959 29904 25/09/2012 9532 37875 25/09/2012 9705 13280 25/09/2012 10085 35023 25/09/2012 10220 31359 26/09/2012 2073 28388 26/09/2012 6078 29765 26/09/2012 7097 31561 26/09/2012 7569 29151 26/09/2012 8239 40369 26/09/2012 8367 28174 26/09/2012 8959 26554 26/09/2012 9292 32104 26/09/2012 9532 33194 26/09/2012 9705 30377 26/09/2012 10085 31503 26/09/2012 10220 28310 27/09/2012 0 0 27/09/2012 0 0 27/09/2012 0 0 27/09/2012 0 0 27/09/2012 0 0 27/09/2012 0 0 27/09/2012 0 0 27/09/2012 0 0 27/09/2012 2073 28491 27/09/2012 6078 31137 27/09/2012 8239 38403 27/09/2012 8273 31117 27/09/2012 8367 28462 27/09/2012 9292 32387 27/09/2012 9532 23023 27/09/2012 9705 32790 27/09/2012 10085 33460 27/09/2012 10220 31782 28/09/2012 0 161 28/09/2012 2073 28381 28/09/2012 7569 32322 28/09/2012 8239 38362 28/09/2012 8273 30533 28/09/2012 8959 17128 28/09/2012 9292 32484 28/09/2012 9532 18586 28/09/2012 9705 27902 29/09/2012 2073 28583
- Above is a sample of a dataframe which has a million records
- How can I slice or group it by Week or Month and sum seconds column by cost centre.?*
- I have read/tried 30 of the articles on this site which appear by doing a search for
List item pandas, python, groupby, split, dataframe, week with out success. - I am using python 2.7 and pandas 0.9.
- I've read the Time Series / Date functionality section in the pandas 0.9 tutorial but couldn't make anything work with a dataframe. I would like to use the features in there such as Business week
Expected Output
DateOccurred CostCentre TimeDifference 2012-03-11 0 500000 2012-03-11 2073 570000 2012-03-18 0 650000 2012-03-18 2073 425000 2012-03-25 0 378000 2012-04-25 2073 480000