Pandas number rows within group in increasing order
Solution 1
Use groupby/cumcount:
In [25]: df['C'] = df.groupby(['A','B']).cumcount()+1; df
Out[25]:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3
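Here is a self-contained version of Solution 1, using the DataFrame from the question. cumcount() numbers the rows of each group 0, 1, 2, ... in the order they appear, so adding 1 makes the numbering start at 1.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'A': ['A', 'A', 'A', 'B', 'B', 'B'],
    'B': ['a', 'a', 'b', 'a', 'a', 'a'],
})

# cumcount() gives each row's position within its (A, B) group, starting at 0;
# +1 shifts the numbering so it starts at 1.
df['C'] = df.groupby(['A', 'B']).cumcount() + 1
print(df)
```

cumcount() depends only on row order, not on any column's values, and the rows of a group do not need to be contiguous.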
Solution 2
Use the groupby.rank function. Here is a working example:
import pandas as pd

df = pd.DataFrame({'C1': ['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})
df
  C1  C2
0  a   1
1  a   2
2  a   3
3  b   4
4  b   5
df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)
df
  C1  C2  RANK
0  a   1   1.0
1  a   2   2.0
2  a   3   3.0
3  b   4   1.0
4  b   5   2.0
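rank() returns floating-point values (1.0, 2.0, ...), even with method="first". If integer labels are wanted, the result can be cast with astype(int). This sketch assumes C2 has no missing values, because NaN ranks cannot be cast to int. It is safe otherwise, since method="first" breaks ties by order of appearance and never produces fractional ranks. The RANK_DESC column is a hypothetical addition showing ascending=False, which numbers each group starting from its largest value.

```python
import pandas as pd

df = pd.DataFrame({'C1': ['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})

# method="first" gives distinct whole-number ranks, so casting to int is lossless
# (assuming C2 contains no NaN values).
df['RANK'] = df.groupby('C1')['C2'].rank(method='first', ascending=True).astype(int)

# Hypothetical extra column: ascending=False ranks the largest C2 in each group as 1.
df['RANK_DESC'] = df.groupby('C1')['C2'].rank(method='first', ascending=False).astype(int)
print(df)
```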
Author: Dance Party2
Updated on March 25, 2022

Comments
- Dance Party2 (about 2 years ago):
Given the following data frame:
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'B': ['a', 'a', 'b', 'a', 'a', 'a']})
df
   A  B
0  A  a
1  A  a
2  A  b
3  B  a
4  B  a
5  B  a
I'd like to create a column 'C' that numbers the rows within each group defined by columns A and B, like this:
   A  B  C
0  A  a  1
1  A  a  2
2  A  b  1
3  B  a  1
4  B  a  2
5  B  a  3
I've tried this so far:
df['C']=df.groupby(['A','B'])['B'].transform('rank')
...but it doesn't work!
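A likely reason the attempt fails: within each (A, B) group, every row has the same value in column B. Ranking B therefore produces ties, and the default method='average' gives all tied rows the same rank instead of 1, 2, 3. One workaround is to rank something that is distinct for every row, such as its position. The sketch below is an illustration, not the asker's or an answerer's code. The helper name pos is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'B': ['a', 'a', 'b', 'a', 'a', 'a']})

# Hypothetical helper: each row's position (0, 1, 2, ...), distinct for every row.
pos = pd.Series(range(len(df)), index=df.index)

# Group the positions by columns A and B and rank them within each group.
# Positions never tie, so the ranks run 1, 2, 3, ... (returned as floats, hence astype).
df['C'] = pos.groupby([df['A'], df['B']]).rank().astype(int)
print(df)
```

This produces the same numbering as cumcount() + 1, which is the simpler option.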
- paulperry (over 4 years ago): The rank() function, as answered by @Gokulakrishnan, is better at handling the case where the grouped column values are numeric.
- Steve Jorgensen (over 4 years ago): Follow-up rhetorical question: why does it have to be so hard to find solutions like this by reading the Pandas docs? It sometimes takes forever to figure out how to do the simplest things.
- Kocas (over 2 years ago): I think this is the correct approach. rank() assumes the data is ordered, which may or may not be the case.
- mathtick (over 2 years ago): Yes, do not use rank unless you actually want order-statistic labels within each group.
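In the comments, the choice between the two solutions depends on whether rows should be numbered by position or by value. The sketch below uses made-up data in which group 'a' is deliberately not sorted by C2. The column names by_position and by_value are hypothetical. cumcount numbers rows in order of appearance, while rank numbers them by their C2 value.

```python
import pandas as pd

# Made-up data: rows are NOT sorted by C2 within each group.
df = pd.DataFrame({'C1': ['a', 'a', 'a', 'b', 'b'], 'C2': [3, 1, 2, 5, 4]})

# cumcount: position of the row within its group, in order of appearance.
df['by_position'] = df.groupby('C1').cumcount() + 1

# rank: position of the row's C2 value within its group, smallest first.
df['by_value'] = df.groupby('C1')['C2'].rank(method='first').astype(int)
print(df)
```

The two columns agree only when each group's rows already appear in ascending C2 order, as in the Solution 2 example.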