Calculating Percentile in Python Pandas Dataframe

df.close.apply(lambda x: stats.percentileofscore(df.close.sort_values(),x))

or

df.close.rank(pct=True)

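Output of df.close.rank(pct=True) (the percentileofscore version returns the same values on a 0-100 scale):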

0    1.00
1    0.75
2    0.25
3    0.50
Name: close, dtype: float64
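
To connect this back to the question, here is a minimal sketch that stores the result in a new 'percentile' column, using the sample data from the question. The second column name, 'percentile_100', is only an illustrative assumption for readers who prefer a 0-100 scale.

```python
import pandas as pd

# Sample data copied from the question
data = {
    'symbol': 'FB',
    'date': ['2012-05-18', '2012-05-21', '2012-05-22', '2012-05-23'],
    'close': [38.23, 34.03, 31.00, 32.00],
}
df = pd.DataFrame(data)

# rank(pct=True) returns each value's rank divided by the number of rows (0-1 scale)
df['percentile'] = df['close'].rank(pct=True)

# Optional: the same values on a 0-100 scale (hypothetical column name)
df['percentile_100'] = df['percentile'] * 100

print(df)
```

No loop is needed: the assignment fills the whole column in one step.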
Author by

mattblack

Updated on June 14, 2022

Comments

  • mattblack, almost 2 years ago

    I'm trying to calculate the percentile of each number within a dataframe and add it to a new column called 'percentile'.

    This is my attempt:

    import pandas as pd
    from scipy import stats
    
    data = {'symbol':'FB','date':['2012-05-18','2012-05-21','2012-05-22','2012-05-23'],'close':[38.23,34.03,31.00,32.00]}
    
    df = pd.DataFrame(data)
    
    close = df['close']
    
    for i in df:
        df['percentile'] = stats.percentileofscore(close,df['close'])
    

    The column is not filled with percentiles; every row ends up as NaN. This should be fairly easy, but I'm not sure where I'm going wrong.

    Thanks in advance for the help.

  • mattblack, almost 7 years ago
    very simple answer, thanks @scott-boston
  • Brad Solomon, almost 7 years ago
    Use .rank -- should be significantly faster
  • Mate Hegedus, over 3 years ago
    .rank is 100% what you should use. That lambda function, while correct, will be MUCH slower.
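
The speed claim in the comments above can be checked with a rough sketch like the one below. The random data, the series length of 2000, and the variable names are assumptions made for illustration, and the timings will vary by machine and library version, so treat the numbers as indicative only.

```python
import time

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical test data: 2000 random prices, including some ties
rng = np.random.default_rng(0)
close = pd.Series(rng.integers(0, 500, size=2000).astype(float), name='close')

# Approach 1: one percentileofscore call per row (0-100 scale)
start = time.perf_counter()
slow = close.apply(lambda x: stats.percentileofscore(close, x))
slow_seconds = time.perf_counter() - start

# Approach 2: vectorized rank, scaled from 0-1 to 0-100 for comparison
start = time.perf_counter()
fast = close.rank(pct=True) * 100
fast_seconds = time.perf_counter() - start

# Both approaches should agree, including on tied values
print(np.allclose(slow, fast))
print(f"apply: {slow_seconds:.4f}s, rank: {fast_seconds:.4f}s")
```

The apply version scans the whole column once per row, so its cost grows roughly quadratically with the number of rows, while rank sorts the column once.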