How can I rank observations within groups in Stata?

29,594

Solution 1

I'd say this question is posed the wrong way round for best understanding. The aim is to group observations, those with the lowest value all being assigned a grade 1, the next lowest being all assigned 2 and so forth. This isn't ranking in most senses that I have seen discussed, but Stata's egen, rank() does get you part of the way.

But the direct way, which was mentioned in the Statalist thread cited elewhere in this thread (start here) is simpler in spirit than any solution quoted:

bysort group_id (var_to_rank): gen desired_rank = sum(var_to_rank != var_to_rank[_n-1]) 

Once data are sorted on var_to_rank then when values differ from previous values at the start of each block of distinct values a value of 1 is the result of var_to_rank != var_to_rank[_n-1]; otherwise 0 is the result. Summing those 1s and 0s cumulatively gives the desired variable. The prefix command bysort does the sorting required and ensures that this is all done separately within the groups defined by group_id. No need for egen at all (a command that many people who only use Stata occasionally often find bizarre).

Declaration of interest: The Statalist thread cited shows that when asked a similar question I too did not think of this solution in one.

Solution 2

The following works for me:

bysort group_id: egen desired_rank=rank(var_to_rank)

enter image description here

Solution 3

Stumbled upon such solution on the Statalist:

bysort group_id (var_to_rank) : gen rank = var_to_rank != var_to_rank[_n-1]
by group_id : replace rank = sum(rank)

Seems to sort out this issue.

Solution 4

@radek: you surely got it sorted out in the meantime ... but this would have been an easy (though not very elegant) solution:

bysort group_id:   egen desired_rank_HELP =rank(var_to_rank), field
egen desired_rank      =group(grup_id desired_rank_HELP)
drop desired_rank_HELP
Share:
29,594
radek
Author by

radek

Working with data, mainly in R, mainly on (spatial) population and health issues. Currently @ School of Earth and Environmental Sciences, The University of Queensland. Publications: Google Scholar

Updated on June 08, 2020

Comments

  • radek
    radek about 4 years

    I have some data in Stata which look like the first two columns of:

    group_id   var_to_rank  desired_rank
    ____________________________________
    
    1           10          1
    1           20          2
    1           30          3
    1           40          4
    2           10          1
    2           20          2
    2           20          2
    2           30          3
    

    I'd like to create a rank of each observation within group (group_id) according to one variable (var_to_rank). Usually, for this purpose I used:

    gen id = _n
    

    However some of my observations (group_id = 2 in my small example) have the same values of ranking variable and this approach doesn't work.

    I have also tried using:

    egen rank
    

    command with different options, but cannot make my rank variables make to look like desired_rank.

    Could you point me to a solution to this problem?

  • radek
    radek about 13 years
    Thanks chl. I tried it as well. Would it be possible however to get 1, 2, 3 ranks instead of 1, 2.5, 4?
  • chl
    chl about 13 years
    @radek Sure: bysort group_id: egen desired_rank=rank(var_to_rank), unique will give 1 2 3 4 for group 2, and replacing unique by track will give you 1 2 2 4.
  • radek
    radek about 13 years
    As you said 'unique' option gives me '1,2,3,4' and my goal is to have '1,2,2,3'.
  • chl
    chl about 13 years
    @radek Does egen desired_rank=group(var_to_rank) produce what you're looking for? (but I would not call this ranking.)
  • radek
    radek about 13 years
    It does indeed. However, it works for the whole dataset only since I cannot use it with bysort :/
  • radek
    radek over 11 years
    Thanks. Didn't think about using egen group this way.
  • radek
    radek over 11 years
    Thanks a lot. Excellent oneliner. I too wasn't sure about the exact title of the question, but decided to go with 'rank' against of 'group' since the order was important here. Feel free to adapt the question and/or title if you have better suggestion.
  • Nick Cox
    Nick Cox almost 11 years
    Sure, that works for the simple numerical example given, but it is manifestly not a solution in general, so this misses the point.
  • Nick Cox
    Nick Cox almost 11 years
    (In fact, for the example given, the int() is redundant.)
  • Nick Cox
    Nick Cox over 10 years
    Sure, that works for the simple numerical example given, but it is not a solution in general, so this misses the point. Change the first value to 5, for example, and then the first rank for the second group will no longer be 1. Problem is ranking within groups.