Pandas number rows within group in increasing order

61,747

Solution 1

Use groupby/cumcount:

In [25]: df['C'] = df.groupby(['A','B']).cumcount()+1; df
Out[25]: 
   A  B  C
0  A  a  1
1  A  a  2
2  A  b  1
3  B  a  1
4  B  a  2
5  B  a  3
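
For a self-contained version of the above, here is a minimal sketch that rebuilds the data frame from the question and applies the same call. The names `df`, `A`, `B` and `C` come from the question; nothing else is assumed.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    'A': ['A', 'A', 'A', 'B', 'B', 'B'],
    'B': ['a', 'a', 'b', 'a', 'a', 'a'],
})

# cumcount() numbers the rows of each (A, B) group starting at 0,
# in the order the rows appear in the frame; add 1 for 1-based numbering.
df['C'] = df.groupby(['A', 'B']).cumcount() + 1
print(df)
```

Because cumcount only looks at row position within the group, it does not need a value column to sort on.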

Solution 2

Use the groupby.rank function. Here is a working example.

df = pd.DataFrame({'C1':['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})
df

C1 C2
a  1
a  2
a  3
b  4
b  5

df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)
df

C1 C2 RANK
a  1  1.0
a  2  2.0
a  3  3.0
b  4  1.0
b  5  2.0
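
One detail worth knowing: rank returns a float column. If integer labels are wanted, a cast can be added, as in the sketch below. It assumes `C2` has no missing values; with NaN present the cast would fail.

```python
import pandas as pd

df = pd.DataFrame({'C1': ['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})

# rank() yields floats; astype(int) is safe here only because C2 has no NaN.
df['RANK'] = (
    df.groupby('C1')['C2']
      .rank(method='first', ascending=True)
      .astype(int)
)
print(df)
```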

Author by

Dance Party2

Updated on March 25, 2022

Comments

  • Dance Party2 about 2 years

    Given the following data frame:

    import pandas as pd
    import numpy as np
    df=pd.DataFrame({'A':['A','A','A','B','B','B'],
                    'B':['a','a','b','a','a','a'],
                    })
    df
    
        A   B
    0   A   a 
    1   A   a 
    2   A   b 
    3   B   a 
    4   B   a 
    5   B   a
    

    I'd like to create column 'C', which numbers the rows within each group in columns A and B like this:

        A   B   C
    0   A   a   1
    1   A   a   2
    2   A   b   1
    3   B   a   1
    4   B   a   2
    5   B   a   3
    

    I've tried this so far:

    df['C']=df.groupby(['A','B'])['B'].transform('rank')
    

    ...but it doesn't work!

  • paulperry over 4 years
    The rank() function, as answered by @Gokulakrishnan, is really better at handling the case where the grouped column values are numeric.
  • Steve Jorgensen over 4 years
    Follow-up rhetorical question: why does it have to be so hard to find solutions like this by reading the Pandas docs? It takes forever to figure out how to do the simplest things sometimes.
  • Kocas over 2 years
    I think this is the correct approach. rank() assumes the data is ordered, which may or may not be the case.
  • mathtick over 2 years
    Yes, do not use rank unless you actually want order-statistic labels per group.
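
To make the distinction raised in these comments concrete, here is a small sketch on data whose values are not sorted within each group. The frame and the column names `g`, `v`, `by_position` and `by_value` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: values are deliberately out of order within each group.
df = pd.DataFrame({'g': ['x', 'x', 'x', 'y', 'y'], 'v': [30, 10, 20, 5, 1]})

# cumcount numbers rows by their position within the group.
df['by_position'] = df.groupby('g').cumcount() + 1

# rank numbers rows by the size of v within the group (ties broken by position).
df['by_value'] = df.groupby('g')['v'].rank(method='first').astype(int)

print(df)
```

The two columns agree only when `v` is already sorted in ascending order inside every group, which is why cumcount is the safer choice for plain row numbering.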