reordering rows in a dataframe according to the order of rows in another dataframe

12,779

I'm not willing to accept that match cannot be used. It does return a possibly non-unique result, but you didn't say anything about needing a secondary sort and if you did it could easily be added as a second argument to order. I tested this on various reduced subsets of the second dataframe including one that only had single instances of each of the variable instances.

The difference in lengths should not be an issue. Here I demonstrate with first the ordering of d2 ('dfmax', shorter) by d1 ('dfverif', longer) and then an ordering of d1 by d2:

d2[ order(match(d2$variable, d1$Variable)), ]
   variable value
11    DAFQX  2.12
12    DAFQX  2.12
13    DAFQX  2.04
8     DAHEI  2.25
9     DAHEI  2.05
10    DAHEI  2.08
14    BAARG  2.12
15    BAARG  2.56
16    BAARG  2.56
4     CBUCG  1.83
5     CBUCG  2.09
6     CBUCG  1.96
7     CBUCG  1.98
1     DALZZ  2.14
2     DALZZ  2.02
3     DALZZ  2.04
d1[ order(match(d1$Variable, d2$variable)), ]

   Variable Value
17    DALZZ     9
18    DALZZ     9
19    DALZZ     9
20    DALZZ     9
13    CBUCG     9
14    CBUCG     9
15    CBUCG     9
16    CBUCG     9
5     DAHEI     9
6     DAHEI     9
7     DAHEI     9
8     DAHEI     9
1     DAFQX     9
2     DAFQX     9
3     DAFQX     9
4     DAFQX     9
9     BAARG     9
10    BAARG     9
11    BAARG     9
12    BAARG     9
Share:
12,779
abishop
Author by

abishop

Updated on June 16, 2022

Comments

  • abishop
    abishop almost 2 years

    I am a new R user and new to StackOverflow. I will do my best to ask my question concisely and explicitly and my apologies if it is not communicated in the best way.

    I am working with two dataframes. I want to reorder the rows of one dataframe so that it is identical to the order of the rows in the second dataframe so I can add data from one to the other with their formats being the same. The column I want to reorder the rows according to is a column with character string identifiers of different observation regions.

    The first dataframe "dfverif" looks (in summary) like

    Variable Value  
    DAFQX   9   
    DAFQX   9   
    DAFQX   9   
    DAFQX   9   
    DAHEI   9   
    DAHEI   9   
    DAHEI   9   
    DAHEI   9   
    BAARG   9       
    BAARG   9       
    BAARG   9   
    BAARG   9   
    CBUCG   9   
    CBUCG   9   
    CBUCG   9   
    CBUCG   9   
    DALZZ   9   
    DALZZ   9   
    DALZZ   9   
    DALZZ   9   
    

    The second dataframe "dfmax" looks like

    variable value
    DALZZ   2.14
    DALZZ   2.02
    DALZZ   2.04
    CBUCG   1.83
    CBUCG   2.09
    CBUCG   1.96
    CBUCG   1.98
    DAHEI   2.25
    DAHEI   2.05
    DAHEI   2.08
    DAFQX   2.12
    DAFQX   2.12
    DAFQX   2.04
    BAARG   2.12
    BAARG   2.56
    BAARG   2.56
    

    I want to reorder the rows of the second dataframe in terms of the order of the rows of the character vector in the first dataframe. But, there are many duplicate strings because this is time-series data so I can't use match, and I can't delete the duplicates because they hold necessary data. Also, the second dataframe is much smaller than the first (it is maximums of the time-series data rather than raw observations). I know that limits cbind and rbind but that rbind.fill and cbindX can be used if needed, although I'm not sure they are here. In actuality these dataframes have more columns but I only included 2 here for conciseness.

    Based on the question here Order data frame rows according to a target vector that specifies the desired order

    I tried doing that code

    target <- dfverif
    idx <- sapply(target,function(x){
    which(dfmax$variable==x)
    })
    idx <- unlist(idx) ##I added this because the code gave me errors because idx is classified as a list so R couldn't do the dfmax[idx,] component
    dfmax <- dfmax[idx,]
    rownames(dfmist) <- NULL
    

    But now when I do head(dfmax) I get

    [1] V1 V2
    <0 rows> (or 0-length row.names)
    

    Which I can't make sense of, and when I do str(dfmax) I get the same ordering of character variables that it had before, nothing has changed. Am I barking up the wrong tree? Is there another way to approach this that I am not aware of? Or am I trying to execute this function improperly?

    Thank you for your time and help.

  • IRTFM
    IRTFM over 9 years
    I now see that, but the use of the order(match( , )) should still succeed, shouldn't it?
  • talat
    talat over 9 years
    My answer was in fact producing a wrong result so I deleted it. Yours is the way to go :)
  • IRTFM
    IRTFM over 9 years
    Worked fine on the test cases I made; You should post as an edit in your question the output of dput(head(dfmax)) and dput(head(dverif)).
  • abishop
    abishop over 9 years
    I had to save the dataframes and then read them in with headers and give them column names and then it worked. Thanks so much! :)