python split a pandas data frame by week or month and group the data based on these sp

10,017

Solution 1

Perhaps group by CostCentre first, then use Series/DataFrame resample()?

In [72]: centers = {}

In [73]: for center, idx in df.groupby("CostCentre").groups.iteritems():
   ....:     timediff = df.ix[idx].set_index("Date")['TimeDifference']
   ....:     centers[center] = timediff.resample("W", how=sum)

In [77]: pd.concat(centers, names=['CostCentre'])
Out[77]: 
CostCentre  Date      
0           2012-09-09         0
            2012-09-16     89522
            2012-09-23         6
            2012-09-30       161
2073        2012-09-09    141208
            2012-09-16    113024
            2012-09-23    169599
            2012-09-30    170780
6078        2012-09-09    171481
            2012-09-16    160871
            2012-09-23    153976
            2012-09-30    122972

Additional details:

When parse_dates is True for the pd.read_* functions, index_col must also be set.

In [28]: df = pd.read_clipboard(sep=' +', parse_dates=True, index_col=0,
   ....:                        dayfirst=True)

In [30]: df.head()
Out[30]: 
              CostCentre  TimeDifference
DateOccurred                            
2012-09-03          2073           28138
2012-09-03          6078           34844
2012-09-03          8273           31215
2012-09-03          8367           28160
2012-09-03          8959           32037

Since resample() requires a TimeSeries-indexed frame/series, setting the index during creation eliminates the need to set the index for each group individually. GroupBy objects also have an apply method, which is basically syntactic sugar around the "combine" step done with pd.concat() above.

In [37]: x = df.groupby("CostCentre").apply(lambda df: 
   ....:         df['TimeDifference'].resample("W", how=sum))

In [38]: x.head(12)
Out[38]: 
CostCentre  DateOccurred
0           2012-09-09           0
            2012-09-16       89522
            2012-09-23           6
            2012-09-30         161
2073        2012-09-09      141208
            2012-09-16      113024
            2012-09-23      169599
            2012-09-30      170780
6078        2012-09-09      171481
            2012-09-16      160871
            2012-09-23      153976
            2012-09-30      122972

Solution 2

Here's a way to take your input (as text) and group it the way you want. The key is to use a dictionary for each grouping (date, then centre).

import collections
import datetime
import functools

def delta_totals_by_date_and_centre(in_file):
    # Use a defaultdict instead of a normal dict so that missing values are
    # automatically created. by_date is a mapping (dict) from a tuple of (year, week)
    # to another mapping (dict) from centre to total delta time.
    by_date = collections.defaultdict(functools.partial(collections.defaultdict, int))

    # For each line in the input...
    for line in in_file:
        # Parse the three fields of each line into date, int ,int.
        date, centre, delta = line.split()
        date = datetime.datetime.strptime(date, "%d/%m/%Y").date()
        centre = int(centre)
        delta = int(delta)

        # Determine the year and week of the year.
        year, week, weekday = date.isocalendar()
        year_and_week = year, week

        # Add the time delta.
        by_date[year_and_week][centre] += delta

    # Yield each result, in order.
    for year_and_week, by_centre in sorted(by_date.items()):
        for centre, delta in sorted(by_centre.items()):
            yield year_and_week, centre, delta

For your sample input, it produces this output (where the first column is year-week_of_the_year).

2012-36     0      0
2012-36  2073 141208
2012-36  6078 171481
2012-36  7042  27129
2012-36  7569 124600
2012-36  8239  82153
2012-36  8273 154517
2012-36  8367 113339
2012-36  8959  82770
2012-36  9292 128089
2012-36  9532 137491
2012-36  9705 146321
2012-36 10085 151483
2012-36 10220  87496
2012-36 14573    186
2012-37     0  89522
2012-37  2073 113024
2012-37  6078 160871
2012-37  7042  35063
2012-37  7097  30866
2012-37  8239  61744
2012-37  8273 153898
2012-37  8367  93564
2012-37  8959 116727
2012-37  9292 132628
2012-37  9532 121462
2012-37  9705 139992
2012-37 10085 111229
2012-37 10220  91245
2012-38     0      6
2012-38  2073 169599
2012-38  6078 153976
2012-38  7097  34909
2012-38  7569 152958
2012-38  8239 122693
2012-38  8273 119536
2012-38  8367 116157
2012-38  8959  75579
2012-38  9292 128340
2012-38  9532 163278
2012-38  9705  95205
2012-38 10085  94284
2012-38 10220  92318
2012-38 14573    468
2012-39     0    161
2012-39  2073 170780
2012-39  6078 122972
2012-39  7042  34953
2012-39  7097  63475
2012-39  7569  92371
2012-39  8239 194048
2012-39  8273 123332
2012-39  8367 115365
2012-39  8959 104609
2012-39  9292 131369
2012-39  9532 143933
2012-39  9705 123107
2012-39 10085 129276
2012-39 10220 124681
Share:
10,017
George Thompson
Author by

George Thompson

Updated on June 05, 2022

Comments

  • George Thompson
    George Thompson almost 2 years
    DateOccurred    CostCentre  TimeDifference
    03/09/2012  2073    28138
    03/09/2012  6078    34844
    03/09/2012  8273    31215
    03/09/2012  8367    28160
    03/09/2012  8959    32037
    03/09/2012  9292    30118
    03/09/2012  9532    34200
    03/09/2012  9705    27240
    03/09/2012  10085   31431
    03/09/2012  10220   22555
    04/09/2012  6078    41126
    04/09/2012  7569    31101
    04/09/2012  8273    30994
    04/09/2012  8959    30064
    04/09/2012  9532    34655
    04/09/2012  9705    26475
    04/09/2012  10085   31443
    04/09/2012  10220   33970
    05/09/2012  2073    28221
    05/09/2012  6078    27894
    05/09/2012  7569    29012
    05/09/2012  8239    42208
    05/09/2012  8273    31128
    05/09/2012  8367    27993
    05/09/2012  8959    20669
    05/09/2012  9292    33070
    05/09/2012  9532    8189
    05/09/2012  9705    27540
    05/09/2012  10085   28798
    05/09/2012  10220   23164
    06/09/2012  2073    28350
    06/09/2012  6078    35648
    06/09/2012  7042    27129
    06/09/2012  7569    31546
    06/09/2012  8239    39945
    06/09/2012  8273    31107
    06/09/2012  8367    27795
    06/09/2012  9292    32974
    06/09/2012  9532    30320
    06/09/2012  9705    37462
    06/09/2012  10085   31703
    06/09/2012  10220   7807
    06/09/2012  14573   186
    07/09/2012  0   0
    07/09/2012  0   0
    07/09/2012  2073    28036
    07/09/2012  6078    31969
    07/09/2012  7569    32941
    07/09/2012  8273    30073
    07/09/2012  8367    29391
    07/09/2012  9292    31927
    07/09/2012  9532    30127
    07/09/2012  9705    27604
    07/09/2012  10085   28108
    08/09/2012  2073    28463
    10/09/2012  6078    31266
    10/09/2012  8239    16390
    10/09/2012  8273    31140
    10/09/2012  8959    30858
    10/09/2012  9532    30794
    10/09/2012  9705    28752
    11/09/2012  0   0
    11/09/2012  0   0
    11/09/2012  0   0
    11/09/2012  0   0
    11/09/2012  0   0
    11/09/2012  2073    28159
    11/09/2012  6078    36835
    11/09/2012  8239    45354
    11/09/2012  8273    30922
    11/09/2012  8367    31382
    11/09/2012  8959    29670
    11/09/2012  9292    33582
    11/09/2012  9705    29394
    11/09/2012  10085   17140
    12/09/2012  2073    28283
    12/09/2012  6078    31139
    12/09/2012  7042    35063
    12/09/2012  8273    31075
    12/09/2012  8367    29795
    12/09/2012  9292    33496
    12/09/2012  9532    31669
    12/09/2012  9705    26166
    12/09/2012  10085   29889
    12/09/2012  10220   35656
    13/09/2012  2073    28144
    13/09/2012  6078    30544
    13/09/2012  7097    30866
    13/09/2012  8273    30772
    13/09/2012  8367    32387
    13/09/2012  8959    29307
    13/09/2012  9292    32348
    13/09/2012  9532    28137
    13/09/2012  9705    28823
    13/09/2012  10085   31543
    13/09/2012  10220   28293
    14/09/2012  0   12433
    14/09/2012  0   12434
    14/09/2012  0   12434
    14/09/2012  0   12434
    14/09/2012  0   12434
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   12433
    14/09/2012  0   0
    14/09/2012  0   12433
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   1720
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   384
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   0
    14/09/2012  0   383
    14/09/2012  2073    28438
    14/09/2012  6078    27255
    14/09/2012  8273    29989
    14/09/2012  8959    26892
    14/09/2012  9292    33202
    14/09/2012  9532    30862
    14/09/2012  9705    26857
    14/09/2012  10085   32657
    14/09/2012  10220   27296
    15/09/2012  6078    3832
    17/09/2012  6078    30004
    17/09/2012  7569    30390
    17/09/2012  8239    41421
    17/09/2012  8273    26337
    17/09/2012  8367    31631
    17/09/2012  8959    17989
    17/09/2012  9292    35703
    17/09/2012  9532    36542
    17/09/2012  9705    27488
    17/09/2012  10085   30849
    17/09/2012  10220   32575
    18/09/2012  2073    28293
    18/09/2012  6078    27450
    18/09/2012  7569    30323
    18/09/2012  8239    38481
    18/09/2012  8273    31154
    18/09/2012  8367    27944
    18/09/2012  8959    28196
    18/09/2012  9292    30844
    18/09/2012  9532    33128
    18/09/2012  9705    32100
    19/09/2012  2073    28227
    19/09/2012  6078    32243
    19/09/2012  7569    29041
    19/09/2012  8239    42791
    19/09/2012  8273    30966
    19/09/2012  8367    26420
    19/09/2012  8959    29394
    19/09/2012  9292    14865
    19/09/2012  9532    23618
    19/09/2012  10085   31614
    19/09/2012  10220   8686
    20/09/2012  2073    28260
    20/09/2012  6078    30446
    20/09/2012  7097    34909
    20/09/2012  7569    30869
    20/09/2012  8273    31079
    20/09/2012  8367    30162
    20/09/2012  9292    13104
    20/09/2012  9532    36614
    20/09/2012  9705    35617
    20/09/2012  10085   31821
    20/09/2012  10220   30055
    20/09/2012  14573   468
    21/09/2012  0   0
    21/09/2012  0   0
    21/09/2012  0   0
    21/09/2012  0   0
    21/09/2012  0   0
    21/09/2012  0   0
    21/09/2012  0   0
    21/09/2012  0   0
    21/09/2012  0   0
    21/09/2012  0   3
    21/09/2012  0   0
    21/09/2012  0   0
    21/09/2012  0   3
    21/09/2012  2073    28308
    21/09/2012  6078    33833
    21/09/2012  7569    32335
    21/09/2012  9292    33824
    21/09/2012  9532    33376
    21/09/2012  10220   21002
    22/09/2012  2073    28402
    23/09/2012  2073    28109
    24/09/2012  2073    28431
    24/09/2012  6078    30027
    24/09/2012  7097    31914
    24/09/2012  8239    35617
    24/09/2012  8273    30670
    24/09/2012  8367    29084
    24/09/2012  8959    31023
    24/09/2012  9292    34394
    24/09/2012  9532    31255
    24/09/2012  9705    18758
    24/09/2012  10085   29290
    24/09/2012  10220   33230
    25/09/2012  2073    28506
    25/09/2012  6078    32043
    25/09/2012  7042    34953
    25/09/2012  7569    30898
    25/09/2012  8239    41297
    25/09/2012  8273    31012
    25/09/2012  8367    29645
    25/09/2012  8959    29904
    25/09/2012  9532    37875
    25/09/2012  9705    13280
    25/09/2012  10085   35023
    25/09/2012  10220   31359
    26/09/2012  2073    28388
    26/09/2012  6078    29765
    26/09/2012  7097    31561
    26/09/2012  7569    29151
    26/09/2012  8239    40369
    26/09/2012  8367    28174
    26/09/2012  8959    26554
    26/09/2012  9292    32104
    26/09/2012  9532    33194
    26/09/2012  9705    30377
    26/09/2012  10085   31503
    26/09/2012  10220   28310
    27/09/2012  0   0
    27/09/2012  0   0
    27/09/2012  0   0
    27/09/2012  0   0
    27/09/2012  0   0
    27/09/2012  0   0
    27/09/2012  0   0
    27/09/2012  0   0
    27/09/2012  2073    28491
    27/09/2012  6078    31137
    27/09/2012  8239    38403
    27/09/2012  8273    31117
    27/09/2012  8367    28462
    27/09/2012  9292    32387
    27/09/2012  9532    23023
    27/09/2012  9705    32790
    27/09/2012  10085   33460
    27/09/2012  10220   31782
    28/09/2012  0   161
    28/09/2012  2073    28381
    28/09/2012  7569    32322
    28/09/2012  8239    38362
    28/09/2012  8273    30533
    28/09/2012  8959    17128
    28/09/2012  9292    32484
    28/09/2012  9532    18586
    28/09/2012  9705    27902
    29/09/2012  2073    28583
    
    1. Above is a sample of a dataframe which has a million records
    2. How can I slice or group it by Week or Month and sum seconds column by cost centre.?*
    3. I have read/tried 30 of the articles on this site which appear by doing a search for
      List item pandas, python, groupby, split, dataframe, week with out success.
    4. I am using python 2.7 and pandas 0.9.
    5. I've read the Time Series / Date functionality section in the pandas 0.9 tutorial but couldn't make anything work with a dataframe. I would like to use the features in there such as Business week

    Expected Output

    DateOccurred CostCentre TimeDifference
    2012-03-11            0         500000
    2012-03-11         2073         570000
    2012-03-18            0         650000
    2012-03-18         2073         425000 
    2012-03-25            0         378000
    2012-04-25         2073         480000