contains pyspark SQL: TypeError: 'Column' object is not callable

It's because you are trying to call contains on the column, but contains does not exist as a Column method in the PySpark version you are using (see the comments below: it only became available in later releases). When you access a missing attribute on a Column, PySpark's __getattr__ treats it as a nested-field lookup and returns another Column; calling that object is what raises TypeError: 'Column' object is not callable. You can use like instead. Try this:

import pyspark.sql.functions as F

df = df.withColumn("AddCol",F.when(F.col("Pclass").like("3"),"three").otherwise("notthree"))

Or, if you just want it to be exactly the number 3, you should do:

import pyspark.sql.functions as F

# If the column Pclass is numeric
df = df.withColumn("AddCol", F.when(F.col("Pclass") == F.lit(3), "three").otherwise("notthree"))

# If the column Pclass is string
df = df.withColumn("AddCol", F.when(F.col("Pclass") == F.lit("3"), "three").otherwise("notthree"))
Author: Jeevan

Updated on July 05, 2021

Comments

  • Jeevan over 2 years

    I'm using Spark 2.0.1,

     df.show()
    +--------+------+---+-----+-----+----+
    |Survived|Pclass|Sex|SibSp|Parch|Fare|
    +--------+------+---+-----+-----+----+
    |     0.0|   3.0|1.0|  1.0|  0.0| 7.3|
    |     1.0|   1.0|0.0|  1.0|  0.0|71.3|
    |     1.0|   3.0|0.0|  0.0|  0.0| 7.9|
    |     1.0|   1.0|0.0|  1.0|  0.0|53.1|
    |     0.0|   3.0|1.0|  0.0|  0.0| 8.1|
    |     0.0|   3.0|1.0|  0.0|  0.0| 8.5|
    |     0.0|   1.0|1.0|  0.0|  0.0|51.9|
    +--------+------+---+-----+-----+----+
    

    I have a DataFrame and I want to add a new column to df using withColumn, where the value of the new column is based on the value of another column. I used something like this:

    >>> dfnew = df.withColumn('AddCol' , when(df.Pclass.contains('3.0'),'three').otherwise('notthree'))
    

    It gives this error:

    TypeError: 'Column' object is not callable
    

    Can anyone help me figure out how to overcome this error?

  • Wicho Valdeavellano over 4 years
    The function contains is described in the documentation without any "New in version X.X" warning, as seen here. Any idea why it is not available?
  • Manrique over 4 years
    In this case, the user was using PySpark 2.0.1, in which contains is not available. Check your PySpark version, because contains is only available from 2.2 and above. Cheers.
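
    For reference, a quick way to confirm the installed version from the Python side (a generic check, not specific to this question's setup):

     import pyspark
     print(pyspark.__version__)  # e.g. '2.0.1'; per the comment above, contains needs 2.2+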