Pandas number rows within group in increasing order

python-3.x pandas pandas-groupby rank python

61,747

Solution 1

Use groupby/cumcount:

In [25]: df['C'] = df.groupby(['A','B']).cumcount()+1; df
Out[25]: 
   A  B  C
0  A  a  1
1  A  a  2
2  A  b  1
3  B  a  1
4  B  a  2
5  B  a  3
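To reproduce this from scratch, here is a minimal self-contained sketch. It assumes the six-row frame from the question; `cumcount()` numbers rows from 0 in their order of appearance within each group, so adding 1 gives 1-based numbering.

```python
import pandas as pd

# The frame from the question: two grouping columns, no value column.
df = pd.DataFrame({
    "A": ["A", "A", "A", "B", "B", "B"],
    "B": ["a", "a", "b", "a", "a", "a"],
})

# cumcount() is 0-based within each (A, B) group, in order of appearance.
df["C"] = df.groupby(["A", "B"]).cumcount() + 1
print(df)
```

Because the numbering follows row order rather than any column's values, it should work the same whether the grouping columns hold strings or numbers.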

Solution 2

Use the groupby.rank function. Here is a working example:

df = pd.DataFrame({'C1':['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})
df

C1 C2
a  1
a  2
a  3
b  4
b  5

df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)
df

C1 C2 RANK
a  1  1.0
a  2  2.0
a  3  3.0
b  4  1.0
b  5  2.0
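One detail worth knowing: `rank()` returns floating-point values, so the new column has dtype float64. The sketch below reuses the frame above and casts the result to integers; the cast is only safe here on the assumption that `C2` has no missing values, since a NaN rank cannot be converted with `astype(int)`.

```python
import pandas as pd

df = pd.DataFrame({"C1": ["a", "a", "a", "b", "b"], "C2": [1, 2, 3, 4, 5]})

# rank() yields float64; method="first" breaks ties by order of appearance.
ranks = df.groupby("C1")["C2"].rank(method="first", ascending=True)

# Assumption: no NaN in C2, so every rank is a whole number and the cast is safe.
df["RANK"] = ranks.astype(int)
print(df)
```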


Dance Party2

Updated on March 25, 2022

Comments

Dance Party2 about 2 years

Given the following data frame:

import pandas as pd
import numpy as np
df=pd.DataFrame({'A':['A','A','A','B','B','B'],
                'B':['a','a','b','a','a','a'],
                })
df

    A   B
0   A   a 
1   A   a 
2   A   b 
3   B   a 
4   B   a 
5   B   a

I'd like to create column 'C', which numbers the rows within each group in columns A and B like this:

    A   B   C
0   A   a   1
1   A   a   2
2   A   b   1
3   B   a   1
4   B   a   2
5   B   a   3

I've tried this so far:

df['C']=df.groupby(['A','B'])['B'].transform('rank')

...but it doesn't work!

paulperry over 4 years

The rank() function, as answered by @Gokulakrishnan, is really better at handling the case where the grouped column values are numeric.

Steve Jorgensen over 4 years

Follow-up rhetorical question: why does it have to be so hard to find solutions like this by reading the Pandas docs? It takes forever to figure out how to do the simplest things sometimes.

Kocas over 2 years

I think this is the correct approach. rank() assumes the data is ordered, which may or may not be the case.

mathtick over 2 years

Yes, do not use rank unless you actually want order-statistic labels per group.
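To make the distinction in these comments concrete, here is a small sketch on data that is not sorted within each group. The column names `key`, `value`, `row_number`, and `value_rank` are made up for illustration. `cumcount()` numbers rows by position, while `rank()` numbers them by the size of the value, so the two only agree when the values already appear in increasing order.

```python
import pandas as pd

# Hypothetical data: values are deliberately out of order within each key.
df = pd.DataFrame({
    "key":   ["a", "a", "a", "b", "b"],
    "value": [30, 10, 20, 5, 1],
})

# Position within the group, regardless of what the values are.
df["row_number"] = df.groupby("key").cumcount() + 1

# Rank of each value within its group (smallest value gets 1).
df["value_rank"] = df.groupby("key")["value"].rank(method="first").astype(int)
print(df)
```

If row numbering is what you want, `cumcount()` is the more direct choice; `rank()` fits when the label should reflect where each value falls in the group's sorted order.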