Sort Spark Dataframe with two columns in different order


Solution 1

Use the Column method desc, as shown below:

import spark.implicits._ // for toDF and $"..." (already in scope in spark-shell)

val df = Seq(
  (2,6), (1,2), (1,3), (1,5), (2,3)
).toDF("A", "B")

df.orderBy($"A", $"B".desc).show
// +---+---+
// |  A|  B|
// +---+---+
// |  1|  5|
// |  1|  3|
// |  1|  2|
// |  2|  6|
// |  2|  3|
// +---+---+
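For reference, here is a self-contained sketch of the same query packaged as a small application. It assumes spark-sql (Spark 2.x or later) is on the classpath and runs Spark in local mode. The object name SortExample and the helper run() are illustrative and not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

object SortExample {
  // Builds the example DataFrame, sorts it, and returns the rows as (A, B) pairs
  def run(): List[(Int, Int)] = {
    // Local-mode session; assumes spark-sql (Spark 2.x+) is on the classpath
    val spark = SparkSession.builder()
      .appName("SortExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // enables toDF and the $"col" syntax

    try {
      val df = Seq((2, 6), (1, 2), (1, 3), (1, 5), (2, 3)).toDF("A", "B")

      // A ascending (the default), then B descending within each value of A
      val sorted = df.orderBy($"A", $"B".desc)
      sorted.show()

      // Collect to the driver so the order can be checked programmatically
      sorted.collect().map(r => (r.getInt(0), r.getInt(1))).toList
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Collecting to the driver is only reasonable for a tiny example like this one; on real data, keep the result as a DataFrame.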

Solution 2

desc is the correct method to use; however, note that it is a method of the Column class. It should therefore be applied as follows:

df.orderBy($"A", $"B".desc)

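$"B".desc returns a Column, and orderBy accepts either all String column names or all Column expressions, not a mix of the two. So "A" must also be changed to $"A" (or col("A") from org.apache.spark.sql.functions if the Spark implicits aren't imported).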
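Dataset.orderBy has two overloads, orderBy(sortCol: String, sortCols: String*) and orderBy(sortExprs: Column*), so the error in the question most likely comes from passing the String "A" together with the Column returned by desc("B"). The sketch below lists a few equivalent all-Column forms; it assumes spark-sql is on the classpath, and the names SortVariants, variants and run are illustrative.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, desc}

object SortVariants {
  // Equivalent ways to sort A ascending, then B descending.
  // Every argument is a Column, so the orderBy(sortExprs: Column*) overload applies.
  def variants(df: DataFrame): Seq[DataFrame] = Seq(
    df.orderBy(col("A"), col("B").desc),     // Column.desc method
    df.orderBy(col("A"), desc("B")),         // functions.desc(columnName)
    df.orderBy(col("A").asc, col("B").desc), // explicit asc on the first column
    df.sort(col("A"), col("B").desc)         // orderBy is an alias of sort
  )
  // df.orderBy("A", desc("B")) would not compile: it mixes a String with a Column

  def run(): Seq[List[(Int, Int)]] = {
    // Local-mode session; assumes spark-sql is on the classpath
    val spark = SparkSession.builder().appName("SortVariants").master("local[*]").getOrCreate()
    import spark.implicits._ // only needed here for toDF

    try {
      val df = Seq((2, 6), (1, 2), (1, 3), (1, 5), (2, 3)).toDF("A", "B")
      variants(df).map(_.collect().map(r => (r.getInt(0), r.getInt(1))).toList)
    } finally {
      spark.stop()
    }
  }
}
```

Using col(...) keeps the code independent of spark.implicits._, which is convenient outside spark-shell.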

Author: kello

Updated on June 12, 2022

Comments

  • kello, almost 2 years

    Let's say I have a table like this:

    A,B
    2,6
    1,2
    1,3
    1,5
    2,3
    

    I want to sort it in ascending order by column A and, within each value of A, in descending order by column B, like this:

    A,B
    1,5
    1,3
    1,2
    2,6
    2,3
    

    I have tried to use orderBy("A", desc("B")) but it gives an error.

    How should I write the query using dataframe in Spark 2.0?

  • Luis Miguel Mejía Suárez, over 5 years
    I like to be as explicit as possible, so I would use asc on the first column ($"A".asc), even though sorting is ascending by default.
  • wayneeusa, about 5 years
    df.orderBy($"A".desc, $"B".asc) solved my problem. Great.
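The ordering asked for in the question is a compound sort key: A ascending, then B descending among rows with the same A. As a Spark-free sketch (plain Scala collections, not the DataFrame API; the object name CompoundSort is illustrative), the expected table from the question can be reproduced with a comparator that checks A first and falls back to B in reverse:

```scala
object CompoundSort {
  // The rows from the question's table, as (A, B) pairs
  val rows: List[(Int, Int)] = List((2, 6), (1, 2), (1, 3), (1, 5), (2, 3))

  // A ascending; when A ties, B descending
  val sorted: List[(Int, Int)] = rows.sortWith { case ((a1, b1), (a2, b2)) =>
    if (a1 != a2) a1 < a2 else b1 > b2
  }

  def main(args: Array[String]): Unit =
    sorted.foreach { case (a, b) => println(s"$a,$b") }
}
```

This mirrors what df.orderBy($"A", $"B".desc) computes, which can help when checking expected output in a unit test without starting Spark.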