How to sort data on multiple columns in Apache Spark (Scala)?


Solution 1

Suppose your input RDD/DataFrame is called df.

To sort recent in descending order, and Freq and Monitor both in ascending order, you can do:

import org.apache.spark.sql.functions._

val sorted = df.sort(desc("recent"), asc("Freq"), asc("Monitor"))

You can use df.orderBy(...) as well; it is an alias of sort().
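The same ordering can also be written with Column expressions instead of the desc/asc helper functions; a minimal sketch, assuming df has the columns shown in the question:

```scala
import org.apache.spark.sql.functions.col

// Equivalent to sort(desc("recent"), asc("Freq"), asc("Monitor")),
// expressed via Column.desc / Column.asc.
val sorted = df.orderBy(col("recent").desc, col("Freq").asc, col("Monitor").asc)
```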

Solution 2

csv.sortBy(r => (r.recent, r.freq)) or an equivalent key function should do it — RDD.sortBy orders the records by the tuple key lexicographically.
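A slightly fuller sketch of the RDD approach, assuming the CSV rows have been parsed into a hypothetical case class (Record and its fields are illustrative names, not from the original post):

```scala
// Hypothetical record type for the parsed CSV rows.
case class Record(recent: Int, freq: Int, monitor: Int)

// Ascending on all columns: sort by a tuple key (lexicographic order).
val sortedAsc = rdd.sortBy(r => (r.recent, r.freq, r.monitor))

// Mixed directions: negate a numeric field to sort it descending
// while keeping the others ascending.
val sortedMixed = rdd.sortBy(r => (-r.recent, r.freq, r.monitor))
```

Negating a field only works for numeric keys; for more general orderings you would supply a custom Ordering to sortBy.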


Author: Niranjanp · Updated on September 15, 2022

Comments

  • Niranjanp, over 1 year ago:

    I have a data set like this, which I am reading from a CSV file and converting into an RDD using Scala.

    +--------+------+---------+
    | recent | Freq | Monitor |
    +--------+------+---------+
    |      1 | 1234 |  199090 |
    |      4 | 2553 |  198613 |
    |      6 | 3232 |  199090 |
    |      1 | 8823 |  498831 |
    |      7 | 2902 |  890000 |
    |      8 | 7991 |  081097 |
    |      9 | 7391 |  432370 |
    |     12 | 6138 |  864981 |
    |      7 | 6812 |  749821 |
    +--------+------+---------+
    

    How do I sort the data on all columns?

    Thanks