How to create a Decile and Quintile columns to rank another variable based on size using Python, Pandas?

40,380

The functionality you're looking for is in pandas.qcut http://pandas.pydata.org/pandas-docs/stable/generated/pandas.qcut.html

In [51]: import numpy as np

In [52]: import pandas as pd

In [53]: investment_df = pd.DataFrame(np.arange(10), columns=['investment'])

In [54]: investment_df['decile'] = pd.qcut(investment_df['investment'], 10, labels=False)

In [55]: investment_df['quintile'] = pd.qcut(investment_df['investment'], 5, labels=False)

In [56]: investment_df
Out[56]: 
   investment  decile  quintile
0           0       0         0
1           1       1         0
2           2       2         1
3           3       3         1
4           4       4         2
5           5       5         2
6           6       6         3
7           7       7         3
8           8       8         4
9           9       9         4   

It's nonstandard to label the largest percentile with the smallest number but you can do this by

In [60]: investment_df['quintile'] = pd.qcut(investment_df['investment'], 5, labels=np.arange(5, 0, -1))

In [61]: investment_df['decile'] = pd.qcut(investment_df['investment'], 10, labels=np.arange(10, 0, -1))

In [62]: investment_df
Out[62]: 
   investment decile quintile
0           0     10        5
1           1      9        5
2           2      8        4
3           3      7        4
4           4      6        3
5           5      5        3
6           6      4        2
7           7      3        2
8           8      2        1
9           9      1        1
Share:
40,380
finstats
Author by

finstats

Updated on July 09, 2022

Comments

  • finstats
    finstats almost 2 years

    I have a data frame with a column containing Investment which represents the amount invested by a trader. I would like to create 2 new columns in the data frame; one giving a decile rank and the other a quintile rank based on the Investment size. I want 1 to represent the decile with the largest Investments and 10 representing the smallest. Smilarly, I want 1 to represent the quintile with the largest investments and 5 representing the smallest.

    I am new to Pandas, so is there a way that I can easily do this? Thanks!

  • finstats
    finstats over 9 years
    Thanks @Dan the latter was what i was looking for and it worked well! I will read up more on the qcut tool, it's really handy! Thanks again :)