Calculating Percentile in Python Pandas Dataframe
df.close.apply(lambda x: stats.percentileofscore(df.close, x))
or
df.close.rank(pct=True)
Output of rank(pct=True) (percentileofscore returns the same values on a 0-100 scale):
0 1.00
1 0.75
2 0.25
3 0.50
Name: close, dtype: float64
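As a minimal, self-contained sketch of the second approach, the snippet below rebuilds the DataFrame from the question and stores the result in a new 'percentile' column. It assumes the default tie handling of rank (method='average') is acceptable for your data.

```python
import pandas as pd

# Data taken from the question; the scalar 'FB' is repeated for every row.
data = {
    'symbol': 'FB',
    'date': ['2012-05-18', '2012-05-21', '2012-05-22', '2012-05-23'],
    'close': [38.23, 34.03, 31.00, 32.00],
}
df = pd.DataFrame(data)

# rank(pct=True) divides each value's rank by the number of valid rows,
# so the result is a fraction between 0 and 1.
df['percentile'] = df['close'].rank(pct=True)

print(df)
```

Multiply the column by 100 if you prefer percentages on the 0-100 scale that percentileofscore uses.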
Author: mattblack
Updated on June 14, 2022
Comments
-
mattblack almost 2 years
I'm trying to calculate the percentile of each number within a dataframe and add it to a new column called 'percentile'.
This is my attempt:
import pandas as pd
from scipy import stats

data = {'symbol':'FB','date':['2012-05-18','2012-05-21','2012-05-22','2012-05-23'],'close':[38.23,34.03,31.00,32.00]}
df = pd.DataFrame(data)
close = df['close']
for i in df:
    df['percentile'] = stats.percentileofscore(close,df['close'])
The column is not being filled and results in 'NaN'. This should be fairly easy, but I'm not sure where I'm going wrong.
Thanks in advance for the help.
-
Max Power almost 7 years
No need for looping through 'for i in df'. See this answer: stackoverflow.com/a/44607827/1870832
-
danche almost 7 years
You should learn about broadcasting in pandas.
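To illustrate what a broadcast-based calculation could look like, here is a hedged sketch using NumPy broadcasting on the 'close' column from the question. It is not taken from the linked material: it compares every value against every other value at once and takes the fraction that is less than or equal to each one. This needs memory proportional to n squared, so it only suits small columns, and with ties it matches rank(pct=True, method='max') rather than the default average.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'close': [38.23, 34.03, 31.00, 32.00]})
values = df['close'].to_numpy()

# Broadcasting a (1, n) row against an (n, 1) column yields an (n, n) matrix:
# less_equal[i, j] is True when values[j] <= values[i].
less_equal = values[None, :] <= values[:, None]

# The mean of each row is the fraction of values <= values[i].
df['percentile'] = less_equal.mean(axis=1)

print(df)
```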
-
mattblack almost 7 years
Very simple answer, thanks @scott-boston
-
Brad Solomon almost 7 years
Use .rank; it should be significantly faster.
-
Mate Hegedus over 3 years
.rank is 100% what you should use. That lambda function, while correct, will be MUCH slower.
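The claim that both approaches give the same answer can be checked with a small sketch like the one below. The random series (including its seed and size) is made up for illustration, and the comparison assumes the default settings on both sides: kind='rank' for percentileofscore and method='average' for rank.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical test data: 200 values drawn from 50 integers, so ties are present.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 50, size=200).astype(float))

# Row-by-row approach: one percentileofscore call per element (0-100 scale).
via_apply = s.apply(lambda x: stats.percentileofscore(s, x))

# Vectorized approach: a single rank call, rescaled from 0-1 to 0-100.
via_rank = s.rank(pct=True) * 100

print(np.allclose(via_apply, via_rank))
```

The apply version rescans the whole column for every row, so its cost grows roughly quadratically with the number of rows, while rank only needs one sort.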