How to calculate VWAP (volume-weighted average price) using groupby and apply?


Solution 1

Option 0
plain vanilla approach

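def vwap(df):
    # Cumulative VWAP within one group (here: one trading day)
    q = df.quantity.values
    p = df.price.values
    return df.assign(vwap=(p * q).cumsum() / q.cumsum())

# Group by calendar date so the cumulative sums restart each day
df = df.groupby(df.index.date, group_keys=False).apply(vwap)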
df

                     price  quantity       vwap
time                                           
2016-06-08 09:00:22  32.30    1960.0  32.300000
2016-06-08 09:00:22  32.30     142.0  32.300000
2016-06-08 09:00:22  32.30    3857.0  32.300000
2016-06-08 09:00:22  32.30    1000.0  32.300000
2016-06-08 09:00:22  32.35     991.0  32.306233
2016-06-08 09:00:22  32.30     447.0  32.305901

Option 1
Throwing in a little eval

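# Add a price * quantity column, take per-day cumulative sums of every
# column, then divide cumulative notional by cumulative quantity
df = df.assign(
    vwap=df.eval(
        'wgtd = price * quantity', inplace=False
    ).groupby(df.index.date).cumsum().eval('wgtd / quantity')
)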
df

                     price  quantity       vwap
time                                           
2016-06-08 09:00:22  32.30    1960.0  32.300000
2016-06-08 09:00:22  32.30     142.0  32.300000
2016-06-08 09:00:22  32.30    3857.0  32.300000
2016-06-08 09:00:22  32.30    1000.0  32.300000
2016-06-08 09:00:22  32.35     991.0  32.306233
2016-06-08 09:00:22  32.30     447.0  32.305901
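
Since the question stresses speed, here is a minimal sketch of a variant that skips both apply and eval and relies only on grouped cumulative sums. This is a swapped-in alternative, not one of the options above. The six rows for 2016-06-08 come from the question; the two rows for 2016-06-09 and the names trades, day and notional are made up for illustration, to show the VWAP restarting on a new day.

```python
import pandas as pd

# Six rows from the question plus two hypothetical rows on a second day
idx = pd.to_datetime(
    ["2016-06-08 09:00:22"] * 6 + ["2016-06-09 09:00:05"] * 2
).rename("time")
trades = pd.DataFrame(
    {
        "price": [32.30, 32.30, 32.30, 32.30, 32.35, 32.30, 33.00, 33.10],
        "quantity": [1960.0, 142.0, 3857.0, 1000.0, 991.0, 447.0, 100.0, 300.0],
    },
    index=idx,
)

# One grouping key per row: the timestamp truncated to midnight
day = trades.index.normalize()

# Per-day running totals of traded value and traded quantity
notional = trades["price"] * trades["quantity"]
cum_notional = notional.groupby(day).cumsum()
cum_quantity = trades["quantity"].groupby(day).cumsum()

# to_numpy() sidesteps index alignment, since the timestamps are not unique
trades["vwap"] = (cum_notional / cum_quantity).to_numpy()
print(trades.round(6))
```

The first six values should match the output shown above, and the seventh row starts a fresh running average for the second day. Whether this is actually faster than Option 0 or Option 1 on real data is worth timing rather than assuming.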

Solution 2

I used this method before, but it is not quite accurate if you want to limit the window period. Instead, I found that the TA Python library works really well: https://technical-analysis-library-in-python.readthedocs.io/en/latest/index.html

from ta.volume import VolumeWeightedAveragePrice

# ...
def vwap(dataframe, label='vwap', window=3, fillna=True):
    dataframe[label] = VolumeWeightedAveragePrice(
        high=dataframe['high'],
        low=dataframe['low'],
        close=dataframe['close'],
        volume=dataframe['volume'],
        window=window,
        fillna=fillna,
    ).volume_weighted_average_price()
    return dataframe
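
For readers who want to see what a windowed VWAP involves without installing the library, here is a rough pandas-only sketch. It assumes the indicator is built from the typical price, (high + low + close) / 3, weighted by volume over a rolling window; check the library's documentation before relying on that. The function name rolling_vwap and the sample bars are hypothetical, and min_periods=1 is my stand-in for the fillna=True behaviour, not a confirmed equivalent.

```python
import pandas as pd

def rolling_vwap(bars: pd.DataFrame, window: int = 3) -> pd.Series:
    """Windowed VWAP from OHLC bars (assumption: typical-price weighting)."""
    typical = (bars["high"] + bars["low"] + bars["close"]) / 3.0
    # Rolling sums of traded value and of volume over the last `window` bars
    pv = (typical * bars["volume"]).rolling(window, min_periods=1).sum()
    vol = bars["volume"].rolling(window, min_periods=1).sum()
    return pv / vol

# Made-up bars, for illustration only
bars = pd.DataFrame(
    {
        "high": [10.0, 11.0, 12.0, 13.0],
        "low": [8.0, 9.0, 10.0, 11.0],
        "close": [9.0, 10.0, 11.0, 12.0],
        "volume": [100.0, 200.0, 100.0, 100.0],
    }
)
bars["vwap"] = rolling_vwap(bars, window=2)
print(bars)
```

Unlike the cumulative versions in Solution 1, each value here depends only on the most recent window bars, which is the "limited window period" the answer refers to.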
Author by

cJc

Updated on July 12, 2022

Comments

  • cJc
    cJc almost 2 years

    I have read multiple posts similar to my question, but I still can't figure it out. I have a pandas df that looks like the following (for multiple days):

    Out[1]: 
                         price  quantity
    time                                
    2016-06-08 09:00:22  32.30    1960.0
    2016-06-08 09:00:22  32.30     142.0
    2016-06-08 09:00:22  32.30    3857.0
    2016-06-08 09:00:22  32.30    1000.0
    2016-06-08 09:00:22  32.35     991.0
    2016-06-08 09:00:22  32.30     447.0
    ...
    

    To calculate the vwap I could do:

    df['vwap'] = (np.cumsum(df.quantity * df.price) / np.cumsum(df.quantity))
    

    However, I would like to start over every day (groupby), but I can't figure out how to make it work with a (lambda?) function.

    df['vwap_day'] = df.groupby(df.index.date)['vwap'].apply(lambda ...
    

    Speed is of the essence. I would appreciate any help :)