Sort data frame column by factor
Solution 1
order takes multiple arguments, and it does just what you want:
with(score, score[order(sex, y, x),])
## x y sex
## 3 SUSAN 6.636370 F
## 5 EMMA 6.873445 F
## 9 VIOLET 8.539329 F
## 6 LEONARD 6.082038 M
## 2 TOM 7.812380 M
## 8 MATT 8.248374 M
## 4 LARRY 8.424665 M
## 7 TIM 8.754023 M
## 1 MARK 8.956372 M
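To see how the later arguments to order act as tie-breakers, here is a minimal sketch on a small hand-made data frame. The name toy and its values are made up for illustration (they are not the asker's random data), and the tie between TIM and TOM is deliberate. Rows are sorted by sex first, then by y, and x only matters where sex and y are both tied:

```r
# Hypothetical toy data with a deliberate tie in y (TIM and TOM both have 8.4)
toy <- data.frame(
  x   = c("TOM", "SUSAN", "TIM", "EMMA", "MARK"),
  y   = c(8.4, 7.1, 8.4, 6.5, 9.0),
  sex = factor(c("M", "F", "M", "F", "M")),
  stringsAsFactors = FALSE
)

# Sort by sex, then y, then x (x breaks the TIM/TOM tie alphabetically)
sorted <- toy[order(toy$sex, toy$y, toy$x), ]
print(sorted)

# Same grouping, but highest y first within each sex
sorted_desc <- toy[order(toy$sex, -toy$y, toy$x), ]
print(sorted_desc)
```

Negating a numeric key is a common way to get descending order for that key only; it does not work on character columns.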
Solution 2
Here is a summary of all methods mentioned in other answers/comments (to serve future searchers). I've added a data.table way of sorting.
# Base R
do.call(rbind, by(score, score$sex, function(x) x[order(x$y),]))
with(score, score[order(sex, y, x),])
score[order(score$sex,score$x),]
# Using plyr
library(plyr)
arrange(score, sex,y)
ddply(score, c('sex', 'y'))
# Using `data.table`
library("data.table")
score_dt <- setDT(score)
# setting a key sorts the data.table
setkey(score_dt,sex,x)
print(score_dt)
Here is another question that deals with the same issue.
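One detail worth keeping in mind with the base R calls above: when the sort key is a factor, order follows the order of the factor's levels, which defaults to alphabetical (F before M). The sketch below, on made-up data (toy and its values are hypothetical), declares the levels explicitly so that M comes first, as in the asker's hand-merged result. Note also that since R 4.0.0 data.frame no longer converts strings to factors by default, so the conversion is written out here:

```r
# Hypothetical toy data; sex starts out as a plain character column
toy <- data.frame(
  x   = c("TOM", "SUSAN", "LARRY", "EMMA"),
  y   = c(8.4, 7.1, 6.9, 6.5),
  sex = c("M", "F", "M", "F"),
  stringsAsFactors = FALSE
)

# Declare the level order explicitly: "M" first, then "F"
toy$sex <- factor(toy$sex, levels = c("M", "F"))

# order() now puts all M rows before all F rows, sorted by y within each
m_first <- toy[order(toy$sex, toy$y), ]
print(m_first)
```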
Solution 3
I think there must be some function like it to apply on data frames and get data frames as return
Yes there is:
library(plyr)
ddply(score, c('sex', 'y'))
Solution 4
It sounds to me like you're trying to order by score within the males and females and return a combined data frame of sorted males and sorted females.
You are right that by(score, score$sex, function(x) x[order(x$y),]) returns a list of sorted data frames, one for male and one for female. You can use do.call with the rbind function to combine these data frames into a single final data frame:
do.call(rbind, by(score, score$sex, function(x) x[order(x$y),]))
# x y sex
# F.5 EMMA 7.526866 F
# F.9 VIOLET 8.182407 F
# F.3 SUSAN 9.677511 F
# M.4 LARRY 6.929395 M
# M.8 MATT 7.970015 M
# M.7 TIM 8.297137 M
# M.6 LEONARD 8.845588 M
# M.2 TOM 9.035948 M
# M.1 MARK 10.082314 M
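The same split, sort and recombine idea works for any number of factor levels without an explicit loop. Here is a minimal sketch using split and lapply on made-up data (toy and its values are hypothetical), compared against the one-step order call. The only visible difference should be the row names, which rbind prefixes with the group label (as in the F.5, M.4 names above):

```r
# Hypothetical toy data
toy <- data.frame(
  x   = c("TOM", "SUSAN", "LARRY", "EMMA", "MARK"),
  y   = c(8.4, 7.1, 6.9, 6.5, 9.0),
  sex = factor(c("M", "F", "M", "F", "M")),
  stringsAsFactors = FALSE
)

# Split by every level of sex, sort each piece by y, then stack the pieces
pieces   <- lapply(split(toy, toy$sex), function(d) d[order(d$y), ])
by_group <- do.call(rbind, pieces)

# One-step equivalent: sort by sex, then by y
one_step <- toy[order(toy$sex, toy$y), ]

# Drop row names before comparing, since rbind() adds group prefixes
rownames(by_group) <- NULL
rownames(one_step) <- NULL
print(isTRUE(all.equal(by_group, one_step)))
```

For a plain sort the one-step order call is simpler; the split approach is mainly useful when each group needs more processing than a sort.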
Matias Andina
Updated on July 09, 2022
Comments
-
Matias Andina almost 2 years
Suppose I have a data frame with 3 columns (name, y, sex) where name is character, y is a numeric value and sex is a factor.
sex<-c("M","M","F","M","F","M","M","M","F")
x<-c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
name<-as.character(x)
y<-rnorm(9,8,1)
score<-data.frame(x,y,sex)
score
## name y sex
## 1 MARK 6.767086 M
## 2 TOM 7.613928 M
## 3 SUSAN 7.447405 F
## 4 LARRY 8.040069 M
## 5 EMMA 8.306875 F
## 6 LEONARD 8.697268 M
## 7 TIM 10.385221 M
## 8 MATT 7.497702 M
## 9 VIOLET 10.177969 F
If I wanted to order it by y I would use:
score[order(score$y),]
## x y sex
## 1 MARK 6.767086 M
## 3 SUSAN 7.447405 F
## 8 MATT 7.497702 M
## 2 TOM 7.613928 M
## 4 LARRY 8.040069 M
## 5 EMMA 8.306875 F
## 6 LEONARD 8.697268 M
## 9 VIOLET 10.177969 F
## 7 TIM 10.385221 M
So far, so good... The names keep the correct score, BUT how could I reorder it so that the M and F levels are not mixed? I need to order and at the same time keep factor levels separated.
Finally, I would like to go a step further and involve the character column. The example doesn't help here, but what if there were tied y values and I had to order again within each factor level (e.g. TIM and TOM both got 8.4 and I have to put them in alphabetical order)?
I was thinking about the by function, but it creates a list and doesn't really help. I think there must be some function like it to apply on data frames and get data frames as return.
TO MAKE CLEAR THE POINT:
sep<-split(score,score$sex)
sep$M<-sep$M[order(sep$M[,2]),]
sep$M
## x y sex
## 1 MARK 6.767086 M
## 8 MATT 7.497702 M
## 2 TOM 7.613928 M
## 4 LARRY 8.040069 M
## 6 LEONARD 8.697268 M
## 7 TIM 10.385221 M
sep$F<-sep$F[order(sep$F[,2]),]
sep$F
## x y sex
## 3 SUSAN 7.447405 F
## 5 EMMA 8.306875 F
## 9 VIOLET 10.177969 F
merged<-rbind(sep$M,sep$F)
merged
## x y sex
## 1 MARK 6.767086 M
## 8 MATT 7.497702 M
## 2 TOM 7.613928 M
## 4 LARRY 8.040069 M
## 6 LEONARD 8.697268 M
## 7 TIM 10.385221 M
## 3 SUSAN 7.447405 F
## 5 EMMA 8.306875 F
## 9 VIOLET 10.177969 F
I know how to do that if the factor has 2 or 3 levels. But what if it had many levels, say 20? Should I write a for loop?
-
thelatemail over 10 years
Are you just wanting to order by multiple variables like score[order(score$y,score$sex,score$x),]?
-
A5C1D2H2I1M1N2O1R2T1 over 10 years
@thelatemail, sounds more like order(score$sex, score$y, score$x) perhaps instead of what you proposed.
-
thelatemail over 10 years
@AnandaMahto - probably - and you can chop that down like with(score,score[order(sex, y, x),])
-
Matthew Lundberg over 10 years
I should have read your comment @thelate (or you should have posted an answer). If you post this as an answer, I'll delete mine.
-
-
thelatemail over 10 years
The question would be, why use plyr for a simple order operation?
-
mnel over 10 years
@thelatemail, You could if you used plyr::arrange, i.e. arrange(score, sex,y).
-
Matias Andina over 10 years
I've just learnt from a mistake a great use of arrange. If you call arrange(score,sex,y) it works like you said but if you call arrange(score,y,sex) it gives you a dataframe with the minimum value of every factor. That is terrific! (sorry I'm new to R)
-
yenats over 4 years
Is it "plyr" or "dplyr"?