Conditional aggregation using PySpark

You can translate your SQL code directly into DataFrame primitives:

from pyspark.sql.functions import when, sum, avg, col

(df
    .groupBy("a", "b", "c", "d")                   # GROUP BY a, b, c, d
    .agg(
        when(col("c") <= 10, sum("e"))             # WHEN c <= 10 THEN sum(e)
        .when(col("c").between(10, 20), avg("e"))  # WHEN c BETWEEN 10 AND 20 THEN avg(e)
        .otherwise(0.0)                            # ELSE 0.00
        .alias("result")))                         # name the CASE result column
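
For reference, here is a minimal self-contained sketch that builds the sample DataFrame from the question below and applies the same aggregation. The SparkSession setup and the output column name "result" are illustrative assumptions, not part of the question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import when, sum, avg, col

spark = SparkSession.builder.getOrCreate()

# Sample rows from the question; columns are a, b, c, d, e.
rows = [
    ("africa", 123, 1, 10, 121.2), ("africa", 123, 1, 10, 321.98),
    ("africa", 123, 2, 12, 43.92), ("africa", 124, 2, 12, 43.92),
    ("usa", 121, 1, 12, 825.32),   ("usa", 121, 1, 12, 89.78),
    ("usa", 123, 2, 10, 32.24),    ("usa", 123, 5, 21, 43.92),
    ("canada", 132, 2, 13, 63.21), ("canada", 132, 2, 13, 89.23),
    ("canada", 132, 3, 21, 85.32), ("canada", 131, 3, 10, 43.92),
]
df = spark.createDataFrame(rows, ["a", "b", "c", "d", "e"])

# Grouping columns (here c) may be referenced inside agg(), so the
# CASE branches can mix them with the aggregates sum(e) and avg(e).
(df.groupBy("a", "b", "c", "d")
   .agg(when(col("c") <= 10, sum("e"))
        .when(col("c").between(10, 20), avg("e"))
        .otherwise(0.0)
        .alias("result"))  # "result" is an assumed name, chosen for this example
   .show())

If you prefer to keep the SQL syntax, the same CASE expression should also work when passed through pyspark.sql.functions.expr inside .agg(...).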

Comments

  • Reddy, almost 2 years ago

    Consider the following DataFrame:

    a       b    c  d   e
    africa  123  1  10  121.2
    africa  123  1  10  321.98
    africa  123  2  12  43.92
    africa  124  2  12  43.92
    usa     121  1  12  825.32
    usa     121  1  12  89.78
    usa     123  2  10  32.24
    usa     123  5  21  43.92
    canada  132  2  13  63.21
    canada  132  2  13  89.23
    canada  132  3  21  85.32
    canada  131  3  10  43.92
    

    Now I want to convert the CASE statement below into an equivalent expression in PySpark, using DataFrames.

    We can use this directly as a CASE statement via HiveContext/SQLContext, but I am looking for the traditional PySpark DataFrame query:

    select
        case
            when c <= 10 then sum(e)
            when c between 10 and 20 then avg(e)
            else 0.00
        end
    from table
    group by a, b, c, d
    

    Regards, Anvesh