Spark RDD: sort by two values


You can map the RDD to key-value pairs, where the key is a tuple of (rank, popularity) and the value is the name, and then sort by the key. Tuples are compared element by element, so popularity is only used to break ties in rank.

For example:

// _._1 - name
// _._2 - popularity
// _._3 - rank
var tupledRDD = myRDD.map(line => ((line._3, line._2), line._1))
  .sortBy(_._1, ascending = false)
  .take(10)
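To see why the tuple key gives the tie-breaking behaviour, here is a minimal sketch using plain Scala collections rather than Spark, so it runs without a cluster. The object name, the `topN` helper, and the sample rows are made up for illustration and are not from the original post. It relies on the implicit `Ordering` for tuples, which `RDD.sortBy` also picks up for its key.

```scala
object SortByTwoValues {
  // (name, popularity, rank), matching the tuple layout in the question
  type Row = (String, Int, Int)

  // Sort by rank descending, break ties by popularity descending,
  // then keep the first n rows. The key is the tuple (rank, popularity).
  def topN(rows: Seq[Row], n: Int): Seq[Row] =
    rows.sortBy(r => (r._3, r._2))(Ordering[(Int, Int)].reverse).take(n)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, not from the original post
    val rows = Seq(("a", 10, 1), ("b", 30, 2), ("c", 20, 2), ("d", 5, 3))
    topN(rows, 10).foreach(println)

    // On an RDD the same key can be passed to sortBy directly,
    // which avoids the extra map step:
    //   myRDD.sortBy(r => (r._3, r._2), ascending = false).take(10)
  }
}
```

If the two fields need different directions (say rank descending but popularity ascending), one option is to negate the numeric field inside the key, for example `(r._3, -r._2)`, assuming the values cannot overflow when negated.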
Author by

safat siddiqui

Updated on April 13, 2020

Comments

  • safat siddiqui
    safat siddiqui about 4 years

    I have an RDD of (name: String, popularity: Int, rank: Int). I want to sort it by rank and, when ranks are equal, by popularity. I am currently doing this with two transformations.

    var result = myRDD
            .sortBy(_._2, ascending = false)
            .sortBy(_._3, ascending = false)
            .take(10)
    

    Can I do it in one transformation?