Spark SQL Row_number() PartitionBy Sort Desc


Solution 1

`desc` should be applied to a column, not to a window definition. You can use either a method on a column:

from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window

row_number().over(
    Window.partitionBy("driver").orderBy(col("unit_count").desc())
)

or a standalone function:

from pyspark.sql.functions import desc, row_number
from pyspark.sql.window import Window

row_number().over(
    Window.partitionBy("driver").orderBy(desc("unit_count"))
)

Solution 2

Alternatively, you can use raw SQL in Spark SQL:

from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .master('local[*]')\
    .appName('Test')\
    .getOrCreate()

spark.sql("""
    select driver
        ,also_item
        ,unit_count
        ,ROW_NUMBER() OVER (PARTITION BY driver ORDER BY unit_count DESC) AS rowNum
    from data_cooccur
""").show()

Author: jKraut
Updated on July 16, 2022

Comments

  • jKraut, almost 2 years ago

    I've successfully created a row_number() partitionBy in Spark using Window, but would like to sort it descending instead of the default ascending. Here is my working code:

    from pyspark.sql import HiveContext
    from pyspark.sql.types import *
    from pyspark.sql import Row, functions as F
    from pyspark.sql.window import Window
    
    data_cooccur.select("driver", "also_item", "unit_count", 
        F.rowNumber().over(Window.partitionBy("driver").orderBy("unit_count")).alias("rowNum")).show()
    

    That gives me this result:

     +------+---------+----------+------+
     |driver|also_item|unit_count|rowNum|
     +------+---------+----------+------+
     |   s10|      s11|         1|     1|
     |   s10|      s13|         1|     2|
     |   s10|      s17|         1|     3|
    

    And here I add the desc() to order descending:

    data_cooccur.select("driver", "also_item", "unit_count", F.rowNumber().over(Window.partitionBy("driver").orderBy("unit_count").desc()).alias("rowNum")).show()
    

    And get this error:

    AttributeError: 'WindowSpec' object has no attribute 'desc'

    What am I doing wrong here?

    • lambruscoAcido, about 6 years ago
      On my PySpark (2.2.0) I have to use row_number instead of rowNumber.
  • user3486773, over 4 years ago
    I preferred this method. Worked for me.
  • Umar.H, over 3 years ago
    Strange that the PySpark DataFrame `orderBy` differs from `Window.orderBy`: one accepts an `ascending` argument and the other doesn't.