R data frame rank by groups (group by rank) with package dplyr

16,013

I had a similar issue. My solution was to sort by the group and by the variable(s) to be ranked, then call row_number() after group_by().

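# Sample dataset: two alternating groups with random integer values
df <- data.frame(group = rep(c("GROUP 1", "GROUP 2"), 10),
                 value = as.integer(rnorm(20, mean = 1000, sd = 500)))
library(dplyr)
# Show the first ten rows as a plain data frame
print.data.frame(df[1:10, ])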
     group value
1  GROUP 1  1273
2  GROUP 2  1261
3  GROUP 1  1189
4  GROUP 2  1390
5  GROUP 1  1942
6  GROUP 2  1111
7  GROUP 1   530
8  GROUP 2   893
9  GROUP 1   997
10 GROUP 2   237

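# Sort by group, then by value in descending order,
# so that row_number() within each group gives the rank
sorted <- df %>%
  arrange(group, desc(value)) %>%
  group_by(group) %>%
  mutate(rank = row_number())
print.data.frame(sorted)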

     group value rank
1  GROUP 1  1942    1
2  GROUP 1  1368    2
3  GROUP 1  1273    3
4  GROUP 1  1249    4
5  GROUP 1  1189    5
6  GROUP 1   997    6
7  GROUP 1   562    7
8  GROUP 1   535    8
9  GROUP 1   530    9
10 GROUP 1     1   10
11 GROUP 2  1472    1
12 GROUP 2  1390    2
13 GROUP 2  1281    3
14 GROUP 2  1261    4
15 GROUP 2  1111    5
16 GROUP 2   893    6
17 GROUP 2   774    7
18 GROUP 2   669    8
19 GROUP 2   631    9
20 GROUP 2   237   10
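
Note that row_number() gives tied values different ranks, in whatever order the sort leaves them. If ties should share a rank, one option is to rank inside the grouped mutate() without sorting first. The following is a minimal sketch under that assumption, not part of the answer above; the data frame and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical data with a tie inside GROUP 1 (values invented for illustration)
df <- data.frame(group = c("GROUP 1", "GROUP 1", "GROUP 1", "GROUP 2", "GROUP 2"),
                 value = c(500L, 900L, 900L, 300L, 700L))

ranked <- df %>%
  group_by(group) %>%
  # dense_rank() gives equal values the same rank and leaves no gaps;
  # desc() makes the largest value rank 1 within each group
  mutate(rank = dense_rank(desc(value))) %>%
  ungroup()

print.data.frame(ranked)
# rank column: 2 1 1 2 1
```

min_rank() could be used instead of dense_rank() if gaps after ties are wanted (1, 1, 3 rather than 1, 1, 2).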

Author: user3628777

Updated on June 17, 2022

Comments

  • user3628777 almost 2 years

    I have a data frame 'test' that looks like this:

        session_id  seller_feedback_score
    1   1           282470
    2   1           275258
    3   1           275258
    4   1           275258
    5   1           37831
    6   1           282470
    7   1           26
    8   1           138351
    9   1           321350
    10  1           841
    11  1           138351
    12  1           17263
    13  1           282470
    14  1           396900
    15  1           282470
    16  1           282470
    17  1           321350
    18  1           321350
    19  1           321350
    20  1           0
    21  1           1596
    22  7           282505
    23  7           275283
    24  7           275283
    25  7           275283
    26  7           37834
    27  7           282505
    28  7           26
    29  7           138359
    30  7           321360
    

    and code (using the dplyr package) that should rank 'seller_feedback_score' within each session_id group:

    test <- test %>%
      group_by(session_id) %>%
      mutate(seller_feedback_score_rank = dense_rank(-seller_feedback_score))
    

    However, R actually ranks the entire data frame as a whole, ignoring the session_id groups:

        session_id  seller_feedback_score  seller_feedback_score_rank_2
    1   1           282470                 5
    2   1           275258                 7
    3   1           275258                 7
    4   1           275258                 7
    5   1           37831                  11
    6   1           282470                 5
    7   1           26                     15
    8   1           138351                 9
    9   1           321350                 3
    10  1           841                    14
    11  1           138351                 9
    12  1           17263                  12
    13  1           282470                 5
    14  1           396900                 1
    15  1           282470                 5
    16  1           282470                 5
    17  1           321350                 3
    18  1           321350                 3
    19  1           321350                 3
    20  1           0                      16
    21  1           1596                   13
    22  7           282505                 4
    23  7           275283                 6
    24  7           275283                 6
    25  7           275283                 6
    26  7           37834                  10
    27  7           282505                 4
    28  7           26                     15
    29  7           138359                 8
    30  7           321360                 2
    

    I checked this by counting the unique 'seller_feedback_score_rank' values; unsurprisingly, the count equals the highest rank value. I'd appreciate it if someone could reproduce this and help. Thanks.

    Link to my original question: R group by and aggregate - return relative rank within groups using plyr

  • Rhodo over 7 years
    ..."with package dplyr"
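
A grouped mutate() that ranks the whole data frame usually means the grouping was ignored. One known way this can happen is plyr being attached after dplyr, so that plyr's mutate() masks dplyr's. Whether that is the cause in the question above is an assumption, since the question does not show which packages were loaded. A minimal sketch that calls the dplyr verbs by namespace, using a few rows taken from the 'test' data frame (the variable names `ranked` and `check` are made up for illustration):

```r
library(dplyr)

# A few rows copied from the 'test' data frame in the question
test <- data.frame(
  session_id = c(1, 1, 1, 1, 7, 7, 7, 7),
  seller_feedback_score = c(282470, 275258, 275258, 396900,
                            282505, 275283, 26, 321360)
)

ranked <- test %>%
  group_by(session_id) %>%
  # Explicit namespace: assumes another package's mutate() may be masking dplyr's
  dplyr::mutate(seller_feedback_score_rank = dense_rank(desc(seller_feedback_score))) %>%
  ungroup()

# Sanity check: if ranking respected the groups, every session has a rank 1
check <- ranked %>%
  group_by(session_id) %>%
  dplyr::summarise(best_rank = min(seller_feedback_score_rank))

print.data.frame(ranked)
print.data.frame(check)
```

If `best_rank` is 1 for every session_id, the ranks were computed per group; a value above 1 for some session would point to the grouping being dropped somewhere in the pipeline.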