Add column to PySpark DataFrame based on a condition

Try something like this:

from pyspark.sql import functions as f
df.withColumn('D', f.when(f.col('B') > 0, "Yes").otherwise("No")).show()
Author by jason_1093
Updated on June 08, 2022

Comments

  • jason_1093 almost 2 years

    My data.csv file has three columns, as shown below. I have loaded this file into a PySpark DataFrame.

    | A | B  | C |
    | 1 | -3 | 4 |
    | 2 | 0  | 5 |
    | 6 | 6  | 6 |
    

    I want to add another column D to the DataFrame with values "Yes" or "No", based on this condition: if the corresponding value in column B is greater than 0, then "Yes", otherwise "No".

    | A | B  | C | D   |
    | 1 | -3 | 4 | No  |
    | 2 | 0  | 5 | No  |
    | 6 | 6  | 6 | Yes |
    

    I have not been able to implement this through PySpark DataFrame operations.