Convert daily pandas stock data to monthly data using first trade day of the month
Solution 1
Instead of M
you can pass MS
as the resample rule:
df =pd.DataFrame( range(72), index = pd.date_range('1/1/2011', periods=72, freq='D'))
#df.resample('MS', how = 'mean') # pandas <0.18
df.resample('MS').mean() # pandas >= 0.18
Updated to use the first business day of the month respecting US Federal Holidays:
df =pd.DataFrame( range(200), index = pd.date_range('12/1/2012', periods=200, freq='D'))
from pandas.tseries.offsets import CustomBusinessMonthBegin
from pandas.tseries.holiday import USFederalHolidayCalendar
bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())
df.resample(bmth_us).mean()
if you want custom starts of the month using the min month found in the data try this. (It isn't pretty, but it should work).
month_index =df.index.to_period('M')
min_day_in_month_index = pd.to_datetime(df.set_index(new_index, append=True).reset_index(level=0).groupby(level=0)['level_0'].min())
custom_month_starts =CustomBusinessMonthBegin(calendar = min_day_in_month_index)
Pass custom_start_months
to the fist parameter of resample
Solution 2
Thank you J Bradley, your solution worked perfectly. I did have to upgrade my version of pandas from their official website though as the version installed via pip did not have CustomBusinessMonthBegin in pandas.tseries.offsets. My final code was:
#----- imports -----
import pandas as pd
from pandas.tseries.offsets import CustomBusinessMonthBegin
import pandas.io.data as web
#----- get sample data -----
df = web.get_data_yahoo('SPY', '2012-12-01', '2013-12-31')
#----- build custom calendar -----
month_index =df.index.to_period('M')
min_day_in_month_index = pd.to_datetime(df.set_index(month_index, append=True).reset_index(level=0).groupby(level=0)['Open'].min())
custom_month_starts = CustomBusinessMonthBegin(calendar = min_day_in_month_index)
#----- convert daily data to monthly data -----
ohlc_dict = {'Open':'first','High':'max','Low':'min','Close': 'last','Volume': 'sum','Adj Close': 'last'}
mthly_ohlcva = df.resample(custom_month_starts, how=ohlc_dict)
This yielded the following:
>>> mthly_ohlcva
Volume Adj Close High Low Close Open
Date
2012-12-03 2889875900 136.92 145.58 139.54 142.41 142.80
2013-01-01 2587140200 143.92 150.94 144.73 149.70 145.11
2013-02-01 2581459300 145.76 153.28 148.73 151.61 150.65
2013-03-01 2330972300 151.30 156.85 150.41 156.67 151.09
2013-04-01 2907035000 154.20 159.72 153.55 159.68 156.59
2013-05-01 2781596000 157.84 169.07 158.10 163.45 159.33
2013-06-03 3533321800 155.74 165.99 155.73 160.42 163.83
2013-07-01 2330904500 163.78 169.86 160.22 168.71 161.26
2013-08-01 2283131700 158.87 170.97 163.05 163.65 169.99
2013-09-02 2226749600 163.90 173.60 163.70 168.01 165.23
2013-10-01 2901739000 171.49 177.51 164.53 175.79 168.14
2013-11-01 1930952900 176.57 181.75 174.76 181.00 176.02
2013-12-02 2232775900 181.15 184.69 177.32 184.69 181.09
user2766344
Updated on June 06, 2022Comments
-
user2766344 almost 2 years
I have a set of calculated OHLCVA daily securities data in a pandas dataframe like this:
>>> type(data_dy) <class 'pandas.core.frame.DataFrame'> >>> data_dy Open High Low Close Volume Adj Close Date 2012-12-28 140.64 141.42 139.87 140.03 148806700 134.63 2012-12-31 139.66 142.56 139.54 142.41 243935200 136.92 2013-01-02 145.11 146.15 144.73 146.06 192059000 140.43 2013-01-03 145.99 146.37 145.34 145.73 144761800 140.11 2013-01-04 145.97 146.61 145.67 146.37 116817700 140.72 [5 rows x 6 columns]
I'm using the following dictionary and the pandas resample function to convert the dataframe to monthly data:
>>> ohlc_dict = {'Open':'first','High':'max','Low':'min','Close': 'last','Volume': 'sum','Adj Close': 'last'} >>> data_dy.resample('M', how=ohlc_dict, closed='right', label='right') Volume Adj Close High Low Close Open Date 2012-12-31 392741900 136.92 142.56 139.54 142.41 140.64 2013-01-31 453638500 140.72 146.61 144.73 146.37 145.11 [2 rows x 6 columns]
This does the calculations correctly, but I'd like to use the Yahoo! date convention for monthly data of using the first trading day of the period rather than the last calendar day of the period that pandas uses.
So I'd like the answer set to be:
Volume Adj Close High Low Close Open Date 2012-12-28 392741900 136.92 142.56 139.54 142.41 140.64 2013-01-02 453638500 140.72 146.61 144.73 146.37 145.11
I could do this by converting the daily data to a python list, process the data and return the data to a dataframe, but how do can this be done with pandas?
-
user2766344 over 9 yearsThank you, I wasn't aware of this option. However, this results in the month start date, not the oldest date of the month in the data. See here: >>> data_dy.resample('MS', how=ohlc_dict) Volume Adj Close High Low Close Open Date 2012-12-01 392741900 136.92 142.56 139.54 142.41 140.64 2013-01-01 453638500 140.72 146.61 144.73 146.37 145.11 [2 rows x 6 columns]
-
user2766344 over 9 yearsI also found this list of example resample options here: stackoverflow.com/questions/17001389/….
-
user2766344 over 9 yearsTrying the 'BMS' option yields: >>> data_dy.resample('BMS', how=ohlc_dict) Volume Adj Close High Low Close Open Date 2012-12-03 392741900 136.92 142.56 139.54 142.41 140.64 2013-01-01 453638500 140.72 146.61 144.73 146.37 145.11 Still not quite what I was looking for.
-
JAB over 9 yearsI am not clear on what
2012-12-28
represents? The first day of which period? -
JAB over 9 yearsThe original question seems to ask for the first trading day of the month. Is this correct? Your comment makes me think otherwise.
-
user2766344 over 9 yearsI'm sorry if that confused things. In my example I only listed truncated data, but in the truncated data set it means the first trading day on December 2012 (obviously not a real situation). I have calculated data for non-us markets, so the first trading day of the month varies between markets, depending on local holidays. It would be most flexible if the date returned is the date from the first data row in each month and use the dictionary rules for the other columns.
-
JAB over 9 yearsIf you create a timeseries index of the first dates in the data by month and pass them to the
calendar
parameter inCustomBusinessMonthBegin
this should work. -
user2766344 over 9 years@J_Bradley Thank you for your patience and expertise. I will try this solution.
-
JAB over 9 years@user2766344 you are welcome. Update to create an timeseries index using the minimum date in the index. There is likely a more elegant way to do this, but I am not aware of it.
-
TravelTrader over 4 years
AttributeError: 'Index' object has no attribute 'to_period'
- any ideas how to solve that? -
TravelTrader over 4 yearsSounded great but even with
df.index = pd.to_datetime(df.index, format='%Y-%m-%d', errors='ignore')
there's an error:TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
. Maybe someone has a solution?