Pandas OHLC aggregation on OHLC data
Solution 1
This is similar to the answer you linked, but it a little cleaner, and faster, because it uses the optimized aggregations, rather than lambdas.
Note that the resample(...).agg(...)
syntax requires pandas version 0.18.0
.
In [101]: df.resample('1H').agg({'openbid': 'first',
'highbid': 'max',
'lowbid': 'min',
'closebid': 'last'})
Out[101]:
lowbid highbid closebid openbid
ctime
2015-09-30 23:00:00 1.11687 1.11712 1.11708 1.117
Solution 2
You need to use an OrderedDict to keep row order in the newer versions of pandas, like so:
import pandas as pd
from collections import OrderedDict
df['ctime'] = pd.to_datetime(df['ctime'], unit='s')
df = df.set_index('ctime')
df = df.resample('5Min').agg(
OrderedDict([
('open', 'first'),
('high', 'max'),
('low', 'min'),
('close', 'last'),
('volume', 'sum'),
])
)
Solution 3
Given a dataframe with price and amount columns
def agg_ohlcv(x):
arr = x['price'].values
names = {
'low': min(arr) if len(arr) > 0 else np.nan,
'high': max(arr) if len(arr) > 0 else np.nan,
'open': arr[0] if len(arr) > 0 else np.nan,
'close': arr[-1] if len(arr) > 0 else np.nan,
'volume': sum(x['amount'].values) if len(x['amount'].values) > 0 else 0,
}
return pd.Series(names)
df = df.resample('1min').apply(agg_ohlcv)
df = df.ffill()
Solution 4
Converstion from OHLC to OHLC for me worked like this:
df.resample('1H').agg({
'openbid':'first',
'highbid':'max',
'lowbid':'min',
'closebid':'last'
})
Related videos on Youtube
Comments
-
user3439187 almost 2 years
I understand that OHLC re-sampling of time series data in Pandas, using one column of data, will work perfectly, for example on the following dataframe:
>>df ctime openbid 1443654000 1.11700 1443654060 1.11700 ... df['ctime'] = pd.to_datetime(df['ctime'], unit='s') df = df.set_index('ctime') df.resample('1H', how='ohlc', axis=0, fill_method='bfill') >>> open high low close ctime 2015-09-30 23:00:00 1.11700 1.11700 1.11687 1.11697 2015-09-30 24:00:00 1.11700 1.11712 1.11697 1.11697 ...
But what do I do if the data is already in an OHLC format? From what I can gather the OHLC method of the API calculates an OHLC slice for every column, hence if my data is in the format:
ctime openbid highbid lowbid closebid 0 1443654000 1.11700 1.11700 1.11687 1.11697 1 1443654060 1.11700 1.11712 1.11697 1.11697 2 1443654120 1.11701 1.11708 1.11699 1.11708
When I try to re-sample I get an OHLC for each of the columns, like so:
openbid highbid \ open high low close open high ctime 2015-09-30 23:00:00 1.11700 1.11700 1.11700 1.11700 1.11700 1.11712 2015-09-30 23:01:00 1.11701 1.11701 1.11701 1.11701 1.11708 1.11708 ... lowbid \ low close open high low close ctime 2015-09-30 23:00:00 1.11700 1.11712 1.11687 1.11697 1.11687 1.11697 2015-09-30 23:01:00 1.11708 1.11708 1.11699 1.11699 1.11699 1.11699 ... closebid open high low close ctime 2015-09-30 23:00:00 1.11697 1.11697 1.11697 1.11697 2015-09-30 23:01:00 1.11708 1.11708 1.11708 1.11708
Is there a quick(ish) workaround for this that someone is willing to share please, without me having to get knee-deep in pandas manual?
Thanks.
ps, there is this answer - Converting OHLC stock data into a different timeframe with python and pandas - but it was 4 years ago, so I am hoping there has been some progress.
-
user3439187 about 8 yearsYeah I will go with that one; it means updating pandas, but the way my function was going means that is the preferable option. Thanks.
-
user3439187 about 8 yearsAny idea if I could add the
fill_method='bfill'
method into that solution to deal with the NAN's? -
user3439187 about 8 yearsForget that last question, the method has changed to .bfill()
-
Lionel almost 7 yearsHere's an alternative if you run into error with ctime index above: df = df.set_index('datetime')
-
gibbz00 over 5 yearsI had to use these parameters to match it with my charting platform
df.resample('1H', closed = 'right',label = 'right').agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'})
-
not2qubit over 3 yearsPlease explain. What's up with the Volume len()?