How to do a mathematical operation with two columns in a DataFrame using PySpark

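from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # replaces the legacy sqlContext entry point
df = spark.createDataFrame([('bn', 12452, 221), ('mb', 14521, 330)], ['x', 'y', 'z'])
# m = z / (y + z); the parentheses make the addition happen before the division
df = df.withColumn('m', df['z'] / (df['y'] + df['z']))
df.head(2)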
Author: Mukesh Jha

Updated on June 19, 2022

Comments

  • Mukesh Jha, almost 2 years ago

    I have a DataFrame with three columns, "x", "y", and "z":

    x        y         z
    bn      12452     221
    mb      14521     330
    pl      12563     160
    lo      22516     142
    

    I need to create another column, derived by this formula:

    m = z / (y + z)
    

    So the new DataFrame should look something like this:

    x        y         z        m
    bn      12452     221      .01743
    mb      14521     330      .02222
    pl      12563     160      .01257
    lo      22516     142      .00626
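
The arithmetic in the question can be checked in plain Python without a Spark session. PySpark column expressions use Python's own operators, so the same precedence applies: written without parentheses, z / y + z divides first and then adds z. This is only a sketch. The helper names `ratio` and `truncate5` are hypothetical, and truncating to five decimals is an assumption that appears to match the values shown in the question.

```python
import math

# Rows from the question: (x, y, z)
rows = [
    ("bn", 12452, 221),
    ("mb", 14521, 330),
    ("pl", 12563, 160),
    ("lo", 22516, 142),
]


def ratio(y, z):
    """Return z / (y + z), the intended formula (hypothetical helper)."""
    return z / (y + z)


def truncate5(value):
    """Cut a positive value to five decimals without rounding (hypothetical helper)."""
    return math.floor(value * 100000) / 100000


# Without parentheses the division happens first, so the result is roughly z itself.
unparenthesized = [z / y + z for _, y, z in rows]

# With parentheses the result is the small fraction the question expects.
m_values = [ratio(y, z) for _, y, z in rows]

for (x, y, z), m in zip(rows, m_values):
    print(x, y, z, truncate5(m))
```

In PySpark the same grouping is written as df['z'] / (df['y'] + df['z']).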