R data frame rank by groups (group by rank) with package dplyr
I had a similar issue. My solution was to sort by the group and by the variable(s) to be ranked, and then use row_number() after group_by().
# Sample dataset (rnorm() draws random values, so your numbers will differ from the output below)
library(dplyr)
df <- data.frame(group = rep(c("GROUP 1", "GROUP 2"), 10),
                 value = as.integer(rnorm(20, mean = 1000, sd = 500)))
print(df[1:10, ])
group value
1 GROUP 1 1273
2 GROUP 2 1261
3 GROUP 1 1189
4 GROUP 2 1390
5 GROUP 1 1942
6 GROUP 2 1111
7 GROUP 1 530
8 GROUP 2 893
9 GROUP 1 997
10 GROUP 2 237
sorted <- df %>%
  arrange(group, desc(value)) %>%  # sort by group, then by value (largest first)
  group_by(group) %>%
  mutate(rank = row_number())      # 1, 2, 3, ... restarting in each group
print.data.frame(sorted)
group value rank
1 GROUP 1 1942 1
2 GROUP 1 1368 2
3 GROUP 1 1273 3
4 GROUP 1 1249 4
5 GROUP 1 1189 5
6 GROUP 1 997 6
7 GROUP 1 562 7
8 GROUP 1 535 8
9 GROUP 1 530 9
10 GROUP 1 1 10
11 GROUP 2 1472 1
12 GROUP 2 1390 2
13 GROUP 2 1281 3
14 GROUP 2 1261 4
15 GROUP 2 1111 5
16 GROUP 2 893 6
17 GROUP 2 774 7
18 GROUP 2 669 8
19 GROUP 2 631 9
20 GROUP 2 237 10
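As a variant of the approach above (a sketch, not the answer's own code), the explicit arrange() step can be skipped by passing the ranking variable straight to row_number(); min_rank() and dense_rank() do the same job but treat ties differently. The small data frame df_small uses hypothetical fixed values instead of the random df, so the result is reproducible.

```r
library(dplyr)

# Hypothetical fixed values (not the random df above) so the output is reproducible
df_small <- data.frame(
  group = c("GROUP 1", "GROUP 2", "GROUP 1", "GROUP 2", "GROUP 1", "GROUP 2", "GROUP 2"),
  value = c(1273L, 1261L, 1942L, 1390L, 530L, 1261L, 893L)
)

ranked <- df_small %>%
  group_by(group) %>%
  mutate(
    rank_row   = row_number(desc(value)),  # ties broken by original row order
    rank_min   = min_rank(desc(value)),    # ties share the lowest rank; the next rank skips
    rank_dense = dense_rank(desc(value))   # ties share a rank; no gaps
  ) %>%
  ungroup()

print(as.data.frame(ranked))
```

In GROUP 2 the two 1261 values get ranks 2 and 3 from row_number(), 2 and 2 from min_rank() (with 893 ranked 4), and 2 and 2 from dense_rank() (with 893 ranked 3). Because nothing is sorted, the rows stay in their original order.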
Author: user3628777 (updated on June 17, 2022)
user3628777 almost 2 years
I have a data frame 'test' that looks like this:
session_id seller_feedback_score
1 1 282470
2 1 275258
3 1 275258
4 1 275258
5 1 37831
6 1 282470
7 1 26
8 1 138351
9 1 321350
10 1 841
11 1 138351
12 1 17263
13 1 282470
14 1 396900
15 1 282470
16 1 282470
17 1 321350
18 1 321350
19 1 321350
20 1 0
21 1 1596
22 7 282505
23 7 275283
24 7 275283
25 7 275283
26 7 37834
27 7 282505
28 7 26
29 7 138359
30 7 321360
and code (using the dplyr package) that should rank 'seller_feedback_score' within each session_id group:
test <- test %>% group_by(session_id) %>% mutate(seller_feedback_score_rank = dense_rank(-seller_feedback_score))
However, what actually happens is that R ranks the entire data frame at once, ignoring the groups (session_id values):
session_id seller_feedback_score seller_feedback_score_rank_2
1 1 282470 5
2 1 275258 7
3 1 275258 7
4 1 275258 7
5 1 37831 11
6 1 282470 5
7 1 26 15
8 1 138351 9
9 1 321350 3
10 1 841 14
11 1 138351 9
12 1 17263 12
13 1 282470 5
14 1 396900 1
15 1 282470 5
16 1 282470 5
17 1 321350 3
18 1 321350 3
19 1 321350 3
20 1 0 16
21 1 1596 13
22 7 282505 4
23 7 275283 6
24 7 275283 6
25 7 275283 6
26 7 37834 10
27 7 282505 4
28 7 26 15
29 7 138359 8
30 7 321360 2
I checked this by counting the unique 'seller_feedback_score_rank' values; unsurprisingly, the count equals the highest rank value, as it would if the whole column were ranked in one go. I'd appreciate it if someone could reproduce this and help. Thanks.
Link to my original question: R group by and aggregate - return relative rank within groups using plyr
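One common cause of exactly this symptom, offered here as an assumption since the question does not show which packages are loaded (the linked question does use plyr): if plyr is attached after dplyr, plyr::mutate() masks dplyr::mutate() and ignores the grouping added by group_by(), so dense_rank() sees the whole column. A minimal sketch of the fix is to load plyr before dplyr, or to call dplyr::mutate() explicitly, shown below on a subset of the 'test' data. The summary at the end is a per-group version of the unique-rank check described in the question.

```r
library(dplyr)

# Subset of the question's 'test' data (session_id 1 and 7)
test <- data.frame(
  session_id            = c(1, 1, 1, 1, 7, 7, 7, 7),
  seller_feedback_score = c(282470, 275258, 275258, 26, 282505, 275283, 275283, 26)
)

# dplyr::mutate() is called explicitly so a masking plyr::mutate() cannot be picked up;
# dense_rank() is then evaluated separately within each session_id
test <- test %>%
  group_by(session_id) %>%
  dplyr::mutate(seller_feedback_score_rank = dense_rank(-seller_feedback_score)) %>%
  ungroup()

# Per-group version of the check from the question: within each session,
# the number of distinct ranks should equal the highest rank
check <- test %>%
  group_by(session_id) %>%
  dplyr::summarise(
    n_ranks  = n_distinct(seller_feedback_score_rank),
    max_rank = max(seller_feedback_score_rank)
  )

print(as.data.frame(check))
```

Both sessions get ranks 1, 2, 2, 3 here; ranked without grouping, the same eight scores would instead receive ranks 1 to 5 across the whole column.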
Rhodo over 7 years
"...with package dplyr"