Calculating percentile for specific groups
Solution 1
I think the loop should compute the percentile on each group's price column, not on the price column of the whole DataFrame df:
import pandas as pd
import numpy as np
np.random.seed(1)
df = pd.DataFrame(np.random.randint(10, size=(5,3)))
df.columns = ['Product Id','group','price']
print(df)
   Product Id  group  price
0           5      8      9
1           5      0      0
2           1      7      6
3           9      2      4
4           5      2      4
# Wrong: df['price'] is the whole column, so every iteration prints the same value
for group, price in df.groupby(['group']):
    print(np.percentile(df['price'], 60))
4.8
4.8
4.8
4.8
# Right: price is the sub-DataFrame for the current group
for group, price in df.groupby(['group']):
    print(np.percentile(price['price'], 60))
0.0
4.0
6.0
9.0
Another solution with np.percentile, where the output is a Series:
print(df.groupby(['group'])['price'].apply(lambda x: np.percentile(x, 60)))
group
0 0.0
2 4.0
7 6.0
8 9.0
Name: price, dtype: float64
Solution with DataFrameGroupBy.quantile:
print(df.groupby(['group'])['price'].quantile(.6))
group
0 0.0
2 4.0
7 6.0
8 9.0
Name: price, dtype: float64
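As a rough check that the np.percentile and quantile approaches agree, the sketch below computes both on the same 5-row data (typed out by hand here rather than regenerated from the seed). The names via_numpy and via_pandas are only illustrative. Note that np.percentile takes a value from 0 to 100 while quantile takes a fraction from 0 to 1; with default settings both interpolate linearly.

```python
import numpy as np
import pandas as pd

# Same values as the 5-row example above, written out explicitly.
df = pd.DataFrame({
    'Product Id': [5, 5, 1, 9, 5],
    'group': [8, 0, 7, 2, 2],
    'price': [9, 0, 6, 4, 4],
})

# 60th percentile per group, computed two ways.
via_numpy = df.groupby('group')['price'].apply(lambda x: np.percentile(x, 60))
via_pandas = df.groupby('group')['price'].quantile(0.6)

# True when both approaches give the same per-group values.
same = np.allclose(via_numpy.values, via_pandas.values)
print(same)
```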
EDIT (in response to the comment):
If you need a new column, use transform (see the pandas docs):
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.randint(10,size=(20,3)))
>>> df.columns = ['Product Id','group','price']
>>> df
    Product Id  group  price
0            5      8      9
1            5      0      0
2            1      7      6
3            9      2      4
4            5      2      4
5            2      4      7
6            7      9      1
7            7      0      6
8            9      9      7
9            6      9      1
10           0      1      8
11           8      3      9
12           8      7      3
13           6      5      1
14           9      3      4
15           8      1      4
16           0      3      9
17           2      0      4
18           9      2      7
19           7      9      8
>>> df['percentil'] = df.groupby(['group'])['price'].transform(lambda x: x.quantile(.6))
>>> df
    Product Id  group  price  percentil
0            5      8      9        9.0
1            5      0      0        4.4
2            1      7      6        4.8
3            9      2      4        4.6
4            5      2      4        4.6
5            2      4      7        7.0
6            7      9      1        5.8
7            7      0      6        4.4
8            9      9      7        5.8
9            6      9      1        5.8
10           0      1      8        6.4
11           8      3      9        9.0
12           8      7      3        4.8
13           6      5      1        1.0
14           9      3      4        9.0
15           8      1      4        6.4
16           0      3      9        9.0
17           2      0      4        4.4
18           9      2      7        4.6
19           7      9      8        5.8
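The per-group column can then feed the high/low categorization step mentioned in the comments. The following is only a sketch under assumptions: it compares price against the per-group percentil column instead of the single threshold k and the retailprice column named in the comment, and it borrows the price_point column name from that comment.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(10, size=(20, 3)),
                  columns=['Product Id', 'group', 'price'])

# Per-group 60th percentile, broadcast back to every row of its group.
df['percentil'] = df.groupby('group')['price'].transform(lambda x: x.quantile(.6))

# Assumption: a product is 'high' when its price reaches its own group's
# 60th percentile, otherwise 'low'.
df['price_point'] = np.where(df['price'] >= df['percentil'], 'high', 'low')
print(df.head())
```

Because transform returns a result aligned with the original index, the comparison works row by row without an explicit loop over the groups.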
Solution 2
You could try pandas quantile:
df[['group', 'price']].groupby('group').quantile(.6)
From the docs: "Return values at the given quantile over requested axis, a la numpy.percentile."
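The call above returns one row per group rather than one value per original row. If a per-row column is needed, one possible way (a sketch, not necessarily what this answer intended) is to look up each row's group in that result with Series.map. The name q60 is purely illustrative, and the data is the 5-row example from Solution 1 typed out by hand.

```python
import pandas as pd

df = pd.DataFrame({
    'Product Id': [5, 5, 1, 9, 5],
    'group': [8, 0, 7, 2, 2],
    'price': [9, 0, 6, 4, 4],
})

# One row per group, indexed by group, holding the 60th percentile of price.
q60 = df[['group', 'price']].groupby('group').quantile(.6)

# Map each row's group label to that group's percentile.
df['percentile'] = df['group'].map(q60['price'])
print(df)
```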
Author: Anu
Updated on June 04, 2022
Comments
-
Anu almost 2 years
I have 3 columns: Product Id, Price, and Group (values A, B, C, D). I want to get the price percentile for each group, and I am running the following code:
for group, price in df.groupby(['group']):
    df['percentile'] = np.percentile(df['price'],60)
The percentile column has only one value, 3.44, for every group. The expected values for each group were 2.12, 3.43, 3.65, 4.76, 8.99.
What is going wrong here? Please let me know.
-
Anu about 8 years
Not sure if this meets my purpose. I do not need to print the output. I want to create a column "percentile" in the same DataFrame df holding the 60th percentile for each group, so that df has 4 columns: product id, price, group, and percentile. In the next step I want to create another column from this new "percentile" column so that I can categorize the Product Ids in each "group" by "price". My next line is:
df['price_point'] = np.where(df['retailprice'] >= k,'high','low')