If-else in PySpark for collapsing column values

46,962

Solution 1

Try this:

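from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def modify_values(r):
    # Collapse classes "A" and "B" into "dispatch"; anything else becomes "non-dispatch"
    if r == "A" or r == "B":
        return "dispatch"
    else:
        return "non-dispatch"

# Wrap the Python function as a UDF that returns a string column
ol_val = udf(modify_values, StringType())
new_df = df.withColumn("wo_flag", ol_val(df.wo_flag))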

Things you are doing wrong:

  • You are trying to modify Rows (Rows are immutable).
  • When a map operation is applied to a DataFrame, the result is a PipelinedRDD, not a DataFrame. You have to call .toDF() on it to get a DataFrame back.

Solution 2

The accepted answer is not very efficient because it uses a user-defined function (UDF): each row has to be passed between the JVM and a Python worker.

I think most people are looking for when.

from pyspark.sql.functions import when

matches = df["wo_flag"].isin("SLM", "NON-SLM")
new_df = df.withColumn("wo_flag", when(matches, "dispatch").otherwise("non-dispatch"))
Author: Shweta Kamble

I try to be as geeky as others but I am not. An avid learner and always curious..:) obsessed with python and spark currently. Most of my experience is with text manipulation and mining.

Updated on December 09, 2020

Comments

  • Shweta Kamble almost 3 years

    I am trying to write simple code that collapses a categorical variable in my DataFrame into binary classes after indexing. Currently my column has 3 classes: "A", "B", and "C". I wrote a simple if-else statement to collapse the classes, like this:

    def condition(r):
        if r.wo_flag == "SLM" or r.wo_flag == "NON-SLM":
            r.wo_flag = "dispatch"
        else:
            r.wo_flag = "non_dispatch"
        return r.wo_flag

    df_final = df_new.map(lambda x: condition(x))
    

    It's not working; it doesn't seem to understand the else condition.

    |MData|Recode12|Status|DayOfWeekOfDispatch|MannerOfDispatch|Wo_flag|PlaceOfInjury|Race|
    |    M|      11|     M|                  4|               7|      C|           99|   1|
    |    M|       8|     D|                  3|               7|      A|           99|   1|
    |    F|      10|     W|                  2|               7|      C|           99|   1|
    |    M|       9|     D|                  1|               7|      B|           99|   1|
    |    M|       8|     D|                  2|               7|      C|           99|   1|
    

    This is the sample data.