Sort data frame column by factor

Solution 1

order takes multiple arguments, and it does just what you want:

with(score, score[order(sex, y, x),])
##         x        y sex
## 3   SUSAN 6.636370   F
## 5    EMMA 6.873445   F
## 9  VIOLET 8.539329   F
## 6 LEONARD 6.082038   M
## 2     TOM 7.812380   M
## 8    MATT 8.248374   M
## 4   LARRY 8.424665   M
## 7     TIM 8.754023   M
## 1    MARK 8.956372   M
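
As a rough sketch of how the extra arguments behave, here is a small made-up data frame with a tie in y (the values below are hypothetical, not the asker's data). The third key, x, only matters where sex and y are equal. The groups come out in the factor's level order, so re-declaring the levels is one way to put M before F.

```r
# Hypothetical data with a tie in y (TIM and TOM both score 8.4)
score <- data.frame(
  x   = c("MARK", "TOM", "SUSAN", "TIM", "EMMA"),
  y   = c(9.1, 8.4, 7.2, 8.4, 6.9),
  sex = factor(c("M", "M", "F", "M", "F")),
  stringsAsFactors = FALSE
)

# Sort by sex first, then by y within each sex, then by x to break ties in y
sorted <- score[order(score$sex, score$y, score$x), ]
print(sorted)

# Groups follow the factor's level order ("F" before "M" by default);
# re-declaring the levels puts M first without changing the sort call
score$sex <- factor(score$sex, levels = c("M", "F"))
sorted_m_first <- score[order(score$sex, score$y, score$x), ]
print(sorted_m_first)
```

The original row names are kept, so you can still see where each row came from.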

Solution 2

Here is a summary of all methods mentioned in other answers/comments (to serve future searchers). I've added a data.table way of sorting.

# Base R
do.call(rbind, by(score, score$sex, function(x) x[order(x$y),]))
with(score, score[order(sex, y, x),])
score[order(score$sex,score$x),]

# Using plyr
library(plyr)
arrange(score, sex, y)
ddply(score, c('sex', 'y'))

# Using `data.table`
library("data.table")
score_dt <- setDT(score)

# setting a key sorts the data.table by the key columns
setkey(score_dt,sex,x)
print(score_dt)

Here is another question that deals with the same problem.

Solution 3

I think there must be some function like it to apply on data frames and get data frames as return

Yes there is:

library(plyr)

ddply(score, c('sex', 'y'))

Solution 4

It sounds to me like you're trying to order by score within the males and females and return a combined data frame of sorted males and sorted females.

You are right that by(score, score$sex, function(x) x[order(x$y),]) returns a list of sorted data frames, one for male and one for female. You can use do.call with the rbind function to combine these data frames into a single final data frame:

do.call(rbind, by(score, score$sex, function(x) x[order(x$y),]))
#           x         y sex
# F.5    EMMA  7.526866   F
# F.9  VIOLET  8.182407   F
# F.3   SUSAN  9.677511   F
# M.4   LARRY  6.929395   M
# M.8    MATT  7.970015   M
# M.7     TIM  8.297137   M
# M.6 LEONARD  8.845588   M
# M.2     TOM  9.035948   M
# M.1    MARK 10.082314   M
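
On the "what if I had 20 levels" part of the question: no loop is needed, because split (or by) makes one piece per level however many there are. Below is a minimal sketch on made-up data; the names scores, name and group are only for illustration. It also drops the "F.5"-style row names that rbind adds, and checks that a single order() call gives the same row order.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
scores <- data.frame(
  name  = paste0("P", 1:40),
  y     = round(rnorm(40, 8, 1), 2),
  group = factor(sample(LETTERS[1:20], 40, replace = TRUE))
)

# Split into one data frame per level, sort each piece by y, then stack them
pieces <- lapply(split(scores, scores$group), function(d) d[order(d$y), ])
by_group <- do.call(rbind, pieces)
rownames(by_group) <- NULL   # drop the "A.3"-style row names rbind creates

# The same row order from a single order() call, with no splitting at all
direct <- scores[order(scores$group, scores$y), ]
rownames(direct) <- NULL

head(by_group)
```

For a plain sort the order() version is shorter; the split version is mostly useful when you want to do more to each piece than sort it.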

Matias Andina
Author by

Matias Andina

Updated on July 09, 2022

Comments

  • Matias Andina
    Matias Andina almost 2 years

    Suppose I have a data frame with 3 columns (x, y, sex), where x is a character column of names, y is a numeric value and sex is a factor.

    sex<-c("M","M","F","M","F","M","M","M","F")
    x<-c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
    name<-as.character(x)
    y<-rnorm(9,8,1)
    score<-data.frame(x,y,sex)
    score
            x         y sex
    1    MARK  6.767086   M
    2     TOM  7.613928   M
    3   SUSAN  7.447405   F
    4   LARRY  8.040069   M
    5    EMMA  8.306875   F
    6 LEONARD  8.697268   M
    7     TIM 10.385221   M
    8    MATT  7.497702   M
    9  VIOLET 10.177969   F
    

    If I wanted to order it by y I would use:

    score[order(score$y),]
            x         y sex
    1    MARK  6.767086   M
    3   SUSAN  7.447405   F
    8    MATT  7.497702   M
    2     TOM  7.613928   M
    4   LARRY  8.040069   M
    5    EMMA  8.306875   F
    6 LEONARD  8.697268   M
    9  VIOLET 10.177969   F
    7     TIM 10.385221   M
    

    So far, so good... The names keep the correct score, but how could I reorder it so that the M and F levels are not mixed? I need to order by y and at the same time keep the factor levels separated.

    Finally, I would like to go a step further and involve the character column. The example doesn't show it, but what if there were tied y values and I had to order again within each factor level (e.g. TIM and TOM both got 8.4 and I have to put them in alphabetical order)?

    I was thinking about by function but it creates a list and doesn't help really. I think there must be some function like it to apply on data frames and get data frames as return.

    TO MAKE CLEAR THE POINT:

    sep<-split(score,score$sex)
    sep$M<-sep$M[order(sep$M[,2]),]
    sep$M
            x         y sex
    1    MARK  6.767086   M
    8    MATT  7.497702   M
    2     TOM  7.613928   M
    4   LARRY  8.040069   M
    6 LEONARD  8.697268   M
    7     TIM 10.385221   M
    
    sep$F<-sep$F[order(sep$F[,2]),]
    sep$F
           x         y sex
    3  SUSAN  7.447405   F
    5   EMMA  8.306875   F
    9 VIOLET 10.177969   F
    
    merged<-rbind(sep$M,sep$F)
    merged
            x         y sex
    1    MARK  6.767086   M
    8    MATT  7.497702   M
    2     TOM  7.613928   M
    4   LARRY  8.040069   M
    6 LEONARD  8.697268   M
    7     TIM 10.385221   M
    3   SUSAN  7.447405   F
    5    EMMA  8.306875   F
    9  VIOLET 10.177969   F
    

    I know how to do that if the factor has 2 or 3 levels. But what if it had many levels, say 20? Should I write a for loop?

    • thelatemail
      thelatemail over 10 years
      Are you just wanting to order by multiple variables like: score[order(score$y,score$sex,score$x),]?
    • A5C1D2H2I1M1N2O1R2T1
      A5C1D2H2I1M1N2O1R2T1 over 10 years
      @thelatemail, sounds more like order(score$sex, score$y, score$x) perhaps instead of what you proposed.
    • thelatemail
      thelatemail over 10 years
      @AnandaMahto - probably - and you can chop that down like with(score,score[order(sex, y, x),])
    • Matthew Lundberg
      Matthew Lundberg over 10 years
      I should have read your comment @thelate (or you should have posted an answer). If you post this as an answer, I'll delete mine.
  • thelatemail
    thelatemail over 10 years
    The question would be, why use plyr for a simple order operation?
  • mnel
    mnel over 10 years
    @thelatemail, You could if you used plyr::arrange. i.e. arrange(score, sex,y).
  • Matias Andina
    Matias Andina over 10 years
    I've just learnt from a mistake a great use of arrange. If you call arrange(score,sex,y) it works like you said but if you call arrange(score,y,sex) it gives you a dataframe with the minimum value of every factor. That is terrific! (sorry I'm new to R)
  • yenats
    yenats over 4 years
    is it "plyr" or "dplyr"?