Pandas OHLC aggregation on OHLC data

40,078

Solution 1

This is similar to the answer you linked, but it a little cleaner, and faster, because it uses the optimized aggregations, rather than lambdas.

Note that the resample(...).agg(...) syntax requires pandas version 0.18.0.

In [101]: df.resample('1H').agg({'openbid': 'first', 
                                 'highbid': 'max', 
                                 'lowbid': 'min', 
                                 'closebid': 'last'})
Out[101]: 
                      lowbid  highbid  closebid  openbid
ctime                                                   
2015-09-30 23:00:00  1.11687  1.11712   1.11708    1.117

Solution 2

You need to use an OrderedDict to keep row order in the newer versions of pandas, like so:

import pandas as pd
from collections import OrderedDict

df['ctime'] = pd.to_datetime(df['ctime'], unit='s')
df = df.set_index('ctime')
df = df.resample('5Min').agg(
    OrderedDict([
        ('open', 'first'),
        ('high', 'max'),
        ('low', 'min'),
        ('close', 'last'),
        ('volume', 'sum'),
    ])
)

Solution 3

Given a dataframe with price and amount columns

def agg_ohlcv(x):
    arr = x['price'].values
    names = {
        'low': min(arr) if len(arr) > 0 else np.nan,
        'high': max(arr) if len(arr) > 0 else np.nan,
        'open': arr[0] if len(arr) > 0 else np.nan,
        'close': arr[-1] if len(arr) > 0 else np.nan,
        'volume': sum(x['amount'].values) if len(x['amount'].values) > 0 else 0,
    }
    return pd.Series(names)

df = df.resample('1min').apply(agg_ohlcv)
df = df.ffill()

Solution 4

Converstion from OHLC to OHLC for me worked like this:

df.resample('1H').agg({
    'openbid':'first',
    'highbid':'max',
    'lowbid':'min',
    'closebid':'last'
})
Share:
40,078

Related videos on Youtube

user3439187
Author by

user3439187

This is intentionally left blank.

Updated on July 09, 2022

Comments

  • user3439187
    user3439187 almost 2 years

    I understand that OHLC re-sampling of time series data in Pandas, using one column of data, will work perfectly, for example on the following dataframe:

    >>df
    ctime       openbid
    1443654000  1.11700
    1443654060  1.11700
    ...
    
    df['ctime']  = pd.to_datetime(df['ctime'], unit='s')
    df           = df.set_index('ctime')
    df.resample('1H',  how='ohlc', axis=0, fill_method='bfill')
    
    
    >>>
                         open     high     low       close
    ctime                                                   
    2015-09-30 23:00:00  1.11700  1.11700  1.11687   1.11697
    2015-09-30 24:00:00  1.11700  1.11712  1.11697   1.11697
    ...
    

    But what do I do if the data is already in an OHLC format? From what I can gather the OHLC method of the API calculates an OHLC slice for every column, hence if my data is in the format:

                 ctime  openbid  highbid   lowbid  closebid
    0       1443654000  1.11700  1.11700  1.11687   1.11697
    1       1443654060  1.11700  1.11712  1.11697   1.11697
    2       1443654120  1.11701  1.11708  1.11699   1.11708
    

    When I try to re-sample I get an OHLC for each of the columns, like so:

                         openbid                             highbid           \
                            open     high      low    close     open     high   
    ctime                                                                       
    2015-09-30 23:00:00  1.11700  1.11700  1.11700  1.11700  1.11700  1.11712   
    2015-09-30 23:01:00  1.11701  1.11701  1.11701  1.11701  1.11708  1.11708 
    ...
                                            lowbid                             \
                             low    close     open     high      low    close   
    ctime                                                                       
    2015-09-30 23:00:00  1.11700  1.11712  1.11687  1.11697  1.11687  1.11697   
    2015-09-30 23:01:00  1.11708  1.11708  1.11699  1.11699  1.11699  1.11699  
    ...
    
                        closebid                             
                            open     high      low    close  
    ctime                                                    
    2015-09-30 23:00:00  1.11697  1.11697  1.11697  1.11697  
    2015-09-30 23:01:00  1.11708  1.11708  1.11708  1.11708  
    

    Is there a quick(ish) workaround for this that someone is willing to share please, without me having to get knee-deep in pandas manual?

    Thanks.

    ps, there is this answer - Converting OHLC stock data into a different timeframe with python and pandas - but it was 4 years ago, so I am hoping there has been some progress.

  • user3439187
    user3439187 about 8 years
    Yeah I will go with that one; it means updating pandas, but the way my function was going means that is the preferable option. Thanks.
  • user3439187
    user3439187 about 8 years
    Any idea if I could add the fill_method='bfill' method into that solution to deal with the NAN's?
  • user3439187
    user3439187 about 8 years
    Forget that last question, the method has changed to .bfill()
  • Lionel
    Lionel almost 7 years
    Here's an alternative if you run into error with ctime index above: df = df.set_index('datetime')
  • gibbz00
    gibbz00 over 5 years
    I had to use these parameters to match it with my charting platform df.resample('1H', closed = 'right',label = 'right').agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'})
  • not2qubit
    not2qubit over 3 years
    Please explain. What's up with the Volume len()?