Date ranges in Pandas


Solution 1

freq='M' is the month-end frequency alias (see the offset aliases in the pandas documentation). But you can use .shift to move the resulting dates by any number of days (or by any other frequency, for that matter):

pd.date_range(start, end, freq='M').shift(15, freq='D')
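A minimal sketch of this approach, using the question's start and end dates. It passes pd.offsets.MonthEnd() instead of the 'M' string because pandas 2.2+ deprecates 'M' in favor of 'ME'; the offset object behaves the same across versions. Adding 15 days to the last day of any month lands on the 15th of the following month.

```python
import pandas as pd

start = pd.Timestamp("2012-01-15")
end = pd.Timestamp("2012-09-20")

# Month-end anchors inside [start, end]; MonthEnd() is the offset behind 'M' / 'ME'
month_ends = pd.date_range(start, end, freq=pd.offsets.MonthEnd())

# Last day of month + 15 days = the 15th of the next month
fifteenths = month_ends.shift(15, freq="D")
print(fifteenths.strftime("%Y-%m-%d").tolist())
```

Note that the first result is 2012-02-15, not 2012-01-15: the first month end after the start date is 2012-01-31, so the start month itself is skipped.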

Solution 2

Try:

pd.date_range(start, end, freq=pd.DateOffset(months=1))
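A short sketch with the question's dates: pd.DateOffset(months=1) steps one calendar month at a time and keeps the start's day of month, so the start date itself is included.

```python
import pandas as pd

start = pd.Timestamp("2012-01-15")
end = pd.Timestamp("2012-09-20")

# Each step adds one calendar month, preserving the day of month (the 15th)
monthly = pd.date_range(start, end, freq=pd.DateOffset(months=1))
print(monthly.strftime("%Y-%m-%d").tolist())
```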

Solution 3

There actually is no "day of month" frequency (e.g. "DOMXX" like "DOM09"), but I don't see any reason not to add one.

http://github.com/pydata/pandas/issues/2289

I don't have a simple workaround for you at the moment, because resample requires a known frequency rule. I think it should also be augmented to accept any date range as arbitrary bin edges. Just a matter of time and hacking...
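One possible workaround, sketched here as an assumption rather than taken from this answer, is to skip resample and bin explicitly: build the edges with pd.date_range and pd.DateOffset(months=1), append the end date as a closing edge, assign rows to half-open bins with pd.cut, then group and sum. Using the question's sample data, this reproduces the output the question asks for (105, 0, 60, 0).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {"total": [50, 55, 60, 80]},
    index=pd.to_datetime(["2012-01-10 00:01", "2012-01-15 01:01",
                          "2012-03-11 00:01", "2012-04-28 00:01"]),
)

# Bin edges on the 9th of each month; appending `end` closes the last period
start, end = pd.Timestamp("2012-01-09"), pd.Timestamp("2012-04-15")
edges = pd.date_range(start, end, freq=pd.DateOffset(months=1))
edges = edges.append(pd.DatetimeIndex([end]))

# right=False gives half-open bins [edge_i, edge_i+1); rows past `end` become NaN
bins = pd.cut(df.index, bins=edges, right=False, labels=edges[:-1])

# observed=False keeps empty bins, which sum to 0; NaN (out-of-range) rows are dropped
sums = df["total"].groupby(bins, observed=False).sum()
print(sums)
```

The closing edge at `end` is a design choice for this sketch: it truncates the final period so that data after the end date is not counted, matching the question's expectations.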



Author: knite

Updated on July 25, 2022

Comments

  • knite, almost 2 years ago

    After fighting with NumPy and dateutil for days, I recently discovered the amazing Pandas library. I've been poring through the documentation and source code, but I can't figure out how to get date_range() to generate indices at the right breakpoints.

    from datetime import date
    import pandas as pd
    
    start = date(2012, 1, 15)
    end = date(2012, 9, 20)
    # 'M' is month-end, instead I need same-day-of-month
    pd.date_range(start, end, freq='M')
    

    What I want:

    2012-01-15
    2012-02-15
    2012-03-15
    ...
    2012-09-15
    

    What I get:

    2012-01-31
    2012-02-29
    2012-03-31
    ...
    2012-08-31
    

    I need month-sized chunks that account for the variable number of days in a month. This is possible with dateutil.rrule:

    from dateutil.rrule import rrule, MONTHLY
    rrule(freq=MONTHLY, dtstart=start, bymonthday=(start.day, -1), bysetpos=1)
    

    Ugly and illegible, but it works. How can I do this with pandas? I've played with both date_range() and period_range(), so far with no luck.

    My actual goal is to use groupby, crosstab and/or resample to calculate values for each period based on sums/means/etc of individual entries within the period. In other words, I want to transform data from:

                    total
    2012-01-10 00:01    50
    2012-01-15 01:01    55
    2012-03-11 00:01    60
    2012-04-28 00:01    80
    
    #Hypothetical usage
    dataframe.resample('total', how='sum', freq='M', start='2012-01-09', end='2012-04-15') 
    

    to

                    total
    2012-01-09          105 # Values summed
    2012-02-09          0   # Missing from dataframe
    2012-03-09          60
    2012-04-09          0   # Data past end date, not counted
    

    Given that Pandas originated as a financial analysis tool, I'm virtually certain that there's a simple and fast way to do this. Help appreciated!

  • knite, over 11 years ago
    Thanks, this may be the trick I need to create a solution based on the rrule hack. However, this doesn't help with resampling on a range, as resample will still use bins aligned to the beginning of the month AFAIK.
  • knite, about 8 years ago
    This question just hit 10K views. Perhaps it's time to revisit this functionality?
  • A. Dev, over 7 years ago
    If you are going to shift by a consistent number of days, it makes more sense to use month start 'MS': pd.date_range(start, end, freq='MS').shift(14, freq='D') lands on the 15th of each month, since a month start shifted by n days falls on day n + 1.
  • calcium3000, over 6 years ago
    For 'freq=...' one could also use pd.DateOffset(months=1)
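The dateutil.rrule expression from the question can be wrapped so that it returns a pandas DatetimeIndex. This is a sketch: `month_anchored_range` is a hypothetical helper name, and `until=` is added to bound the rule. For each month, bymonthday=(start.day, -1) collects the start's day and the last day of the month, and bysetpos=1 keeps the earlier one, so a start on the 31st clamps to shorter months' last days. Stepping a DateOffset repeatedly, as in Solution 2, may instead carry a clipped day (such as Feb 29) forward into later months.

```python
from datetime import datetime

import pandas as pd
from dateutil.rrule import MONTHLY, rrule


def month_anchored_range(start, end):
    """Hypothetical helper: same day of month as `start`, clamped to month end."""
    rule = rrule(
        freq=MONTHLY,
        dtstart=start,
        until=end,                   # inclusive upper bound
        bymonthday=(start.day, -1),  # candidates: start's day and the last day
        bysetpos=1,                  # keep the earlier candidate in each month
    )
    return pd.DatetimeIndex(list(rule))


idx = month_anchored_range(datetime(2012, 1, 31), datetime(2012, 5, 20))
print(idx.strftime("%Y-%m-%d").tolist())
# → ['2012-01-31', '2012-02-29', '2012-03-31', '2012-04-30']
```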