Pandas DENSE RANK
Solution 1
Use pd.Series.rank with method='dense':
df['Rank'] = df.Year.rank(method='dense').astype(int)
df
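For anyone who wants to run this end to end, here is a minimal sketch. The sample frame is rebuilt from the data in the question; the column names and values come from there, everything else is just scaffolding.

```python
import pandas as pd

# Sample frame reconstructed from the question's data
df = pd.DataFrame({"Year": [2012, 2013, 2013, 2014],
                   "Value": [10, 20, 25, 30]})

# Dense rank: tied years share a rank and no rank numbers are skipped
df["Rank"] = df["Year"].rank(method="dense").astype(int)
print(df)
```

rank returns floats, which is why the astype(int) cast is there; the cast will fail if Year contains missing values.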
Solution 2
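The fastest solution in the timings below is factorize. It numbers values in order of first appearance, so it matches a dense rank only when the values are already sorted (pass sort=True otherwise):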
df['Rank'] = pd.factorize(df.Year)[0] + 1
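To see why sortedness matters, here is a small sketch on unsorted data. The frame below is hypothetical (it is not from the question); it only illustrates that plain factorize codes follow order of appearance, while sort=True or rank follow the values themselves.

```python
import pandas as pd

# Hypothetical unsorted data, not taken from the question
unsorted = pd.DataFrame({"Year": [2014, 2012, 2013, 2013]})

# Default factorize: codes follow order of first appearance
by_appearance = pd.factorize(unsorted["Year"])[0] + 1

# sort=True: codes follow the sorted unique values, like a dense rank
by_value = pd.factorize(unsorted["Year"], sort=True)[0] + 1

# Reference result from rank
dense = unsorted["Year"].rank(method="dense").astype(int)

print(by_appearance.tolist())  # [1, 2, 3, 3]
print(by_value.tolist())       # [3, 1, 2, 2]
print(dense.tolist())          # [3, 1, 2, 2]
```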
Timings:
#len(df)=40k
df = pd.concat([df]*10000).reset_index(drop=True)
In [13]: %timeit df['Rank'] = df.Year.rank(method='dense').astype(int)
1000 loops, best of 3: 1.55 ms per loop
In [14]: %timeit df['Rank1'] = df.Year.astype('category').cat.codes + 1
1000 loops, best of 3: 1.22 ms per loop
In [15]: %timeit df['Rank2'] = pd.factorize(df.Year)[0] + 1
1000 loops, best of 3: 737 µs per loop
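The numbers above come from IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The frame is rebuilt from the question's data, the names big and candidates are made up for this sketch, and the absolute timings will differ by machine and pandas version.

```python
import timeit
import pandas as pd

base = pd.DataFrame({"Year": [2012, 2013, 2013, 2014],
                     "Value": [10, 20, 25, 30]})
big = pd.concat([base] * 10000).reset_index(drop=True)  # 40,000 rows

# The three approaches from the answers, wrapped so timeit can call them
candidates = {
    "rank": lambda: big["Year"].rank(method="dense").astype(int),
    "cat.codes": lambda: big["Year"].astype("category").cat.codes + 1,
    "factorize": lambda: pd.factorize(big["Year"])[0] + 1,
}

for name, fn in candidates.items():
    # Best of 3 repeats, 10 calls each, reported per call
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name:>10}: {seconds * 1e3:.3f} ms per call")
```

Note that this test frame repeats already-sorted data, so the first appearances are still in ascending order and all three approaches agree here.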
Solution 3
You can convert the year to a categorical and then take its codes, adding one because the codes are zero-indexed and your example starts at one.
df['Rank'] = df.Year.astype('category').cat.codes + 1
>>> df
   Year  Value  Rank
0  2012     10     1
1  2013     20     2
2  2013     25     2
3  2014     30     3
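One thing worth checking before relying on the codes is how missing values behave. The sketch below uses a hypothetical series (not from the question): a missing value gets code -1, so after adding one it shows up as rank 0 rather than as NaN.

```python
import pandas as pd

# Hypothetical series with a missing year, not taken from the question
years = pd.Series([2013, None, 2012])

# Categories are the sorted non-missing values; missing entries get code -1
ranks = years.astype("category").cat.codes + 1
print(ranks.tolist())  # [2, 0, 1]
```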
Author: Keithx
Updated on July 15, 2022
Comments
-
Keithx almost 2 years
I'm working with a pandas DataFrame that looks like this:
Year  Value
2012     10
2013     20
2013     25
2014     30
I want an equivalent of the SQL DENSE_RANK() OVER (ORDER BY year) function, to add a column like this:
Year  Value  Rank
2012     10     1
2013     20     2
2013     25     2
2014     30     3
How can it be done in pandas?
Thanks!
-
Oliver W. about 7 years
Note that you will want to use sort=True in the call to factorize, which will impact your timings as well (in my randomly generated 3M-row numerical df, method 1, i.e. using the rank method, turns out to be the fastest). The reason you assumed it works is that the array's non-duplicate elements were already sorted.
-
jezrael about 7 years
Yes, but it depends on whether the data are sorted. In the sample they are sorted, so it is not necessary.
-
Oliver W. about 7 years
Indeed, and that's what I said. Because it's sorted, factorize will be faster. In general, data is not sorted, so factorize and rank will return different answers. I added the comment as a warning to future readers who would blindly adopt solutions without checking the conditions under which they're assumed to work.
-
jezrael about 7 years
@OliverW. - Thank you.
-
jezrael about 6 years
@piRSquared - Thanks, it happens. Your solution was upvoted by me ;)