contains pyspark SQL: TypeError: 'Column' object is not callable
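It's because you are calling contains on the column, and Column.contains does not exist in PySpark 2.0.1 (per the comments, it was added in 2.2). Column treats an unknown attribute name as access to a nested field, so df.Pclass.contains returns another Column rather than a method. Calling that Column then raises TypeError: 'Column' object is not callable. You can use like instead. Try this: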
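import pyspark.sql.functions as F
# like uses SQL LIKE syntax: wrap the value in % wildcards to match it anywhere in the string (same effect as contains)
df = df.withColumn("AddCol", F.when(F.col("Pclass").like("%3%"), "three").otherwise("notthree"))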
Or, if you want Pclass to be exactly the number 3, compare with == instead:
import pyspark.sql.functions as F
# If the column Pclass is numeric
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit(3),"three").otherwise("notthree"))
# If the column Pclass is string
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit("3"),"three").otherwise("notthree"))
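like follows SQL LIKE semantics: the pattern must match the entire value, % matches any run of characters and _ matches exactly one. Because Pclass holds doubles, Spark implicitly casts them to strings such as "3.0" before matching, so a bare "3" matches nothing while "%3%" behaves like a substring test. The pure-Python function below is a simplified emulation of those rules for illustration only; the name sql_like is hypothetical, escape characters are not handled, and it is not Spark's implementation.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Simplified emulation of SQL LIKE (hypothetical helper, not Spark code).

    '%' matches any run of characters (including none), '_' matches exactly
    one character, and everything else matches literally. Escapes are ignored.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # LIKE succeeds only if the pattern covers the whole value
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Assumed string form of the Pclass doubles after Spark's implicit cast
pclass_as_text = [str(v) for v in [3.0, 1.0, 3.0, 1.0]]

print([sql_like(v, "3") for v in pclass_as_text])    # [False, False, False, False]
print([sql_like(v, "%3%") for v in pclass_as_text])  # [True, False, True, False]
print(sql_like("3.0", "3._"))                        # True
```

For a numeric column, the == comparison shown above avoids the string cast entirely and is usually the clearer choice.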
Author: Jeevan
Updated on July 05, 2021
Comments
-
Jeevan over 2 years
I'm using Spark 2.0.1.
df.show()
+--------+------+---+-----+-----+----+
|Survived|Pclass|Sex|SibSp|Parch|Fare|
+--------+------+---+-----+-----+----+
|     0.0|   3.0|1.0|  1.0|  0.0| 7.3|
|     1.0|   1.0|0.0|  1.0|  0.0|71.3|
|     1.0|   3.0|0.0|  0.0|  0.0| 7.9|
|     1.0|   1.0|0.0|  1.0|  0.0|53.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.5|
|     0.0|   1.0|1.0|  0.0|  0.0|51.9|
I have a DataFrame and want to add a new column to df using withColumn, where the value of the new column is based on the value of another column. I used something like this:
>>> dfnew = df.withColumn('AddCol' , when(df.Pclass.contains('3.0'),'three').otherwise('notthree'))
It gives the error
TypeError: 'Column' object is not callable
Can anyone help me overcome this error?
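A likely mechanism for this error can be reproduced without Spark. The sketch below uses a hypothetical stand-in class (named Column for readability, not the real pyspark.sql.Column) that models one assumed behavior: an attribute name the class does not define is turned into nested-field access and returns another Column. Looking up .contains therefore succeeds silently, and the TypeError only appears when the result is called.

```python
class Column:
    """Hypothetical stand-in for pyspark.sql.Column, not the real class.

    Models a single assumed behavior: an unknown attribute name is treated as
    access to a nested field and returns a new Column.
    """

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def __getattr__(self, item: str) -> "Column":
        # Only reached when normal lookup fails, e.g. for a method the class
        # does not define (such as `contains` in PySpark 2.0.1).
        if item.startswith("__"):
            raise AttributeError(item)
        return Column(f"{self.expr}.{item}")

    def __repr__(self) -> str:
        return f"Column<{self.expr}>"


pclass = Column("Pclass")
attr = pclass.contains  # no error yet: the lookup just builds another Column
print(attr)             # Column<Pclass.contains>

message = None
try:
    attr("3.0")         # calling a Column, which defines no __call__
except TypeError as exc:
    message = str(exc)
print(message)          # 'Column' object is not callable
```

The design point is that the failure is deferred: attribute lookup cannot tell a missing method from a field name, so the error surfaces at the call site instead of at the misspelled or unavailable method name.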
-
Wicho Valdeavellano over 4 years
The function contains is described in the documentation without any "New in version X.X" note, as seen here. Any idea why it is not available?
Manrique over 4 years
In this case, the user was using PySpark 2.0.1, in which contains is not available. Check your PySpark version, because contains is only available from 2.2 onward. Cheers.
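To act on the advice to check your version, a small helper can compare a version string against the 2.2 threshold stated in the comment. The function name supports_column_contains is hypothetical, and the 2.2 cutoff is taken from the comment rather than verified here. In practice the version string would come from pyspark.__version__ or from spark.version on an active SparkSession.

```python
def supports_column_contains(version: str) -> bool:
    """Return True if a PySpark version string is 2.2 or newer.

    The 2.2 threshold comes from the comment above and is an assumption here,
    not something this function checks against Spark itself.
    """
    # Compare only major and minor; suffixes like "-preview" sit in the patch part
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 2)


for v in ["2.0.1", "2.1.0", "2.2.0", "3.5.1"]:
    print(v, supports_column_contains(v))
```

On older versions, like with % wildcards (or an == comparison for exact values) is a portable alternative to contains.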