if else in pyspark for collapsing column values
Solution 1
Try this:
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
def modify_values(r):
    if r == "A" or r == "B":
        return "dispatch"
    else:
        return "non-dispatch"

ol_val = udf(modify_values, StringType())
new_df = df.withColumn("wo_flag", ol_val(df.wo_flag))
Things you are doing wrong:
- You are trying to modify Rows (Rows are immutable).
- When a map operation is done on a dataframe, the resulting data structure is a PipelinedRDD, not a dataframe. You have to apply .toDF() to get a dataframe back.
Solution 2
The accepted answer is not very efficient because it relies on a user-defined function (UDF).
I think most people are looking for when.
from pyspark.sql.functions import when
matches = df["wo_flag"].isin("SLM", "NON-SLM")
new_df = df.withColumn("wo_flag", when(matches, "dispatch").otherwise("non-dispatch"))
Author by
Shweta Kamble
I try to be as geeky as others but I am not. An avid learner and always curious..:) obsessed with python and spark currently. Most of my experience is with text manipulation and mining.
Updated on December 09, 2020
Comments
-
Shweta Kamble almost 3 years
I am trying some simple code to collapse the categorical variables in my dataframe into binary classes after indexing. Currently my column has 3 classes: "A", "B", "C". I am writing a simple if-else statement to collapse the classes, like this:
def condition(r):
    if (r.wo_flag=="SLM" or r.wo_flag=="NON-SLM"):
        r.wo_flag="dispatch"
    else:
        r.wo_flag="non_dispatch"
    return r.wo_flag

df_final=df_new.map(lambda x: condition(x))
It's not working; it doesn't seem to understand the else condition.
|MData|Recode12|Status|DayOfWeekOfDispatch|MannerOfDispatch|Wo_flag|PlaceOfInjury|Race|
|    M|      11|     M|                  4|               7|      C|           99|   1|
|    M|       8|     D|                  3|               7|      A|           99|   1|
|    F|      10|     W|                  2|               7|      C|           99|   1|
|    M|       9|     D|                  1|               7|      B|           99|   1|
|    M|       8|     D|                  2|               7|      C|           99|   1|
This is the sample data.