Date ranges in Pandas
Solution 1
freq='M'
is for month-end frequencies (see here). But you can use .shift
to shift it by any number of days (or any frequency for that matter):
pd.date_range(start, end, freq='M').shift(15, freq=pd.datetools.day)
Solution 2
try
date_range(start, end, freq=pd.DateOffset(months=1))
Solution 3
There actually is no "day of month" frequency (e.g. "DOMXX" like "DOM09"), but I don't see any reason not to add one.
http://github.com/pydata/pandas/issues/2289
I don't have a simple workaround for you at the moment because resample
requires passing a known frequency rule. I think it should be augmented to be able to take any date range to be used as arbitrary bin edges, also. Just a matter of time and hacking...
Related videos on Youtube
knite
Updated on July 25, 2022Comments
-
knite almost 2 years
After fighting with NumPy and dateutil for days, I recently discovered the amazing Pandas library. I've been poring through the documentation and source code, but I can't figure out how to get
date_range()
to generate indices at the right breakpoints.from datetime import date import pandas as pd start = date('2012-01-15') end = date('2012-09-20') # 'M' is month-end, instead I need same-day-of-month date_range(start, end, freq='M')
What I want:
2012-01-15 2012-02-15 2012-03-15 ... 2012-09-15
What I get:
2012-01-31 2012-02-29 2012-03-31 ... 2012-08-31
I need month-sized chunks that account for the variable number of days in a month. This is possible with dateutil.rrule:
rrule(freq=MONTHLY, dtstart=start, bymonthday=(start.day, -1), bysetpos=1)
Ugly and illegible, but it works. How can do I this with pandas? I've played with both
date_range()
andperiod_range()
, so far with no luck.My actual goal is to use
groupby
,crosstab
and/orresample
to calculate values for each period based on sums/means/etc of individual entries within the period. In other words, I want to transform data from:total 2012-01-10 00:01 50 2012-01-15 01:01 55 2012-03-11 00:01 60 2012-04-28 00:01 80 #Hypothetical usage dataframe.resample('total', how='sum', freq='M', start='2012-01-09', end='2012-04-15')
to
total 2012-01-09 105 # Values summed 2012-02-09 0 # Missing from dataframe 2012-03-09 60 2012-04-09 0 # Data past end date, not counted
Given that Pandas originated as a financial analysis tool, I'm virtually certain that there's a simple and fast way to do this. Help appreciated!
-
knite over 11 yearsThanks, this may be the trick I need to create a solution based on the rrule hack. However, this doesn't help with resampling on a range, as resample will still use bins aligned to the beginning of the month AFAIK.
-
knite about 8 yearsThis question just hit 10K views. Perhaps it's time to revisit this functionality?
-
A. Dev over 7 yearsIf you are going to shift by a consistent number of days it makes more sense to use month start 'MS':
pd.date_range(start, end, freq='MS').shift(15, freq=pd.datetools.day)
-
calcium3000 over 6 yearsFor 'freq=...' one could also use pd.DateOffset(months=1)