How to sort data on multiple columns in Apache Spark (Scala)?
Solution 1
Suppose your input RDD/DataFrame is called df.
To sort recent in descending order, and Freq and Monitor both in ascending order, you can do:
import org.apache.spark.sql.functions._
val sorted = df.sort(desc("recent"), asc("Freq"), asc("Monitor"))
You can use df.orderBy(...) as well; it is an alias of sort().
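The same mixed ascending/descending ordering can be sketched in plain Scala collections, without a Spark session: desc("recent") corresponds to a reversed Ordering on that field, asc to the natural one. The Row case class and the sample values here are hypothetical stand-ins for the DataFrame's columns:

```scala
// Hypothetical stand-in for one DataFrame row.
case class Row(recent: Int, freq: Int, monitor: Int)

val rows = Seq(
  Row(1, 1234, 199090),
  Row(4, 2553, 198613),
  Row(1, 8823, 498831)
)

// recent descending, then freq and monitor ascending, built from
// explicit Orderings (mirrors desc("recent"), asc("Freq"), asc("Monitor")).
val ord = Ordering.by[Row, (Int, Int, Int)](r => (r.recent, r.freq, r.monitor))(
  Ordering.Tuple3(Ordering.Int.reverse, Ordering.Int, Ordering.Int)
)

val sorted = rows.sorted(ord)
```

In Spark the same three-key comparison is applied, but distributed across partitions rather than in local memory.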
Solution 2
csv.sortBy(r => (r.recent, r.freq))
or something equivalent should do it.
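A plain-Scala sketch of this approach (RDD.sortBy behaves analogously on a distributed collection); Rec and the sample data are assumptions for illustration:

```scala
// Hypothetical row type for the parsed CSV records.
case class Rec(recent: Int, freq: Int)

val data = Seq(Rec(7, 2902), Rec(1, 8823), Rec(7, 6812), Rec(1, 1234))

// Tuple ordering compares recent first, then freq, both ascending --
// the same key function you would pass to RDD.sortBy.
val sorted = data.sortBy(r => (r.recent, r.freq))
```

Note that sortBy with a tuple key sorts every field in ascending order; for per-field directions you need explicit Orderings (or the DataFrame desc/asc approach from Solution 1).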
Author: Niranjanp
Updated on September 15, 2022

Comments
-
Niranjanp over 1 year: I have a data set like this, which I am reading from a CSV file and converting into an RDD using Scala.
+-----------+-----------+----------+
| recent    | Freq      | Monitor  |
+-----------+-----------+----------+
| 1         | 1234      | 199090   |
| 4         | 2553      | 198613   |
| 6         | 3232      | 199090   |
| 1         | 8823      | 498831   |
| 7         | 2902      | 890000   |
| 8         | 7991      | 081097   |
| 9         | 7391      | 432370   |
| 12        | 6138      | 864981   |
| 7         | 6812      | 749821   |
+-----------+-----------+----------+
How can I sort the data on all of the columns?
Thanks
-
Tzach Zohar: Possible duplicate of Sorting by multiple fields in Apache Spark