Calculate Percentile Rank using NTILE?

19,413

Solution 1

NTILE is absolutely NOT the same as percentile rank. NTILE simply divides up a set of data evenly by the number provided (as noted by RoyiNamir above). If you chart the results of both functions, NTILE will be a perfectly linear line from 1-to-n, whereas percentile rank will [usually] have some curves to it depending on your data.

Percentile rank is much more complicated than simply dividing it up by N. It then takes each row's number and figures out where in the distribution it lies, interpolating when necessary (which is very CPU intensive). I have an Excel sheet of 525,000 rows and it dominates my 8-core machine's CPU at 100% for 15-20 minutes just to figure out the PERCENTRANK function for a single column.

Solution 2

One way to think of this is, "the percentage of Students with Scores below this one."

Here is one way to get that type of percentile in SQL Server, using RANK():

select *
    , (rank() over (order by Score) - 1.0) / (select count(*) from @temp) * 100 as PercentileRank
from @temp

Note that this will always be less than 100% unless you round up, and you will always get 0% for the lowest value(s). This does not necessarily put the median value at 50%, nor will it interpolate like some percentile calculations do.

Feel free to round or cast the whole expression (e.g. cast(... as decimal(4,2))) for good looking reports, or even replace - 1.0 with - 1e to force floating point calculation.

NTILE() isn't really what you're looking for in this case because it essentially divides the row numbers of an ordered set into groups rather than the values. It will assign a different percentile to two instances of the same value if those instances happen to straddle a crossover point. You'd have to then additionally group by that value and grab the max or min percentile of the group to use NTILE() in the same way as we're doing with RANK().

Solution 3

Is there a typo?

select NTILE(100) OVER (order by Score) PercentileRank 
from @temp

And your script looks good. If you think something wrong there, could you clarify what excactly?

Solution 4

There is an issue with your code as NTILE distribution is not uniform. If you have 213 students, the top most 13 groups would have 3 students and the latter 87 would have 2 students each. This is not what you would ideally want in a percentile distribution.

You might want to use RANK/ROWNUM and then divide to get the %ile group.

Share:
19,413
O.O
Author by

O.O

nothing special here.

Updated on June 26, 2022

Comments

  • O.O
    O.O about 2 years

    Need to calculate the percentile rank (1st - 99th percentile) for each student with a score for a single test.

    I'm a little confused by the msdn definition of NTILE, because it does not explicitly mention percentile rank. I need some sort of assurance that NTILE is the correct keyword to use for calculating percentile rank.

    declare @temp table
    (
      StudentId int,
      Score int
    )
    insert into @temp
    select 1, 20
    union
    select 2, 25
    .....
    
    select NTILE(100) OVER (order by Score) PercentileRank
    from @temp
    

    It looks correct to me, but is this the correct way to calculate percentile rank?

  • O.O
    O.O over 12 years
    The documentation never explicitly mentioned a relationship between ntile and percentile rank, so I'm a little on edge that I'm just getting a false positive. Need some sort of assurance this is the correct way.