Add column to pyspark dataframe based on a condition
Try something like this:
from pyspark.sql import functions as f
df.withColumn('D', f.when(f.col('B') > 0, "Yes").otherwise("No")).show()
Author: jason_1093 · Updated on June 08, 2022
jason_1093 (almost 2 years ago):
My data.csv file has three columns, as shown below. I have loaded this file into a PySpark dataframe.
| A | B | C |
| 1 | -3 | 4 |
| 2 | 0 | 5 |
| 6 | 6 | 6 |
I want to add another column D to the dataframe whose value is "Yes" if the corresponding value in column B is greater than 0, and "No" otherwise.
| A | B | C | D |
| 1 | -3 | 4 | No |
| 2 | 0 | 5 | No |
| 6 | 6 | 6 | Yes |
I have not been able to implement this using PySpark dataframe operations.