Spark assign value if null to column (python)
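You can use coalesce, which returns the first non-null value among its arguments: https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html#pyspark.sql.functions.coalesce
from pyspark.sql.functions import coalesce
df.withColumn('values2', coalesce(df.values2, df.values)).show()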
Author by
Mpizos Dimitris
Updated on June 26, 2022
Comments
-
Mpizos Dimitris almost 2 years
Assuming that I have the following data:
+--------------------+-----+--------------------+
|              values|count|             values2|
+--------------------+-----+--------------------+
|              aaaaaa|  249|                null|
|              bbbbbb|  166|                  b2|
|              cccccc| 1680|           something|
+--------------------+-----+--------------------+
So if there is a null value in the values2 column, how can I assign the value of the values column to it? The result should be:
+--------------------+-----+--------------------+
|              values|count|             values2|
+--------------------+-----+--------------------+
|              aaaaaa|  249|              aaaaaa|
|              bbbbbb|  166|                  b2|
|              cccccc| 1680|           something|
+--------------------+-----+--------------------+
I thought of something like the following, but it doesn't work:
df.na.fill({"values2":df['values']}).show()
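I found this way to solve it, but there should be something more straightforward:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
def change_null_values(a, b):
    # Return b unless it is None or empty; otherwise fall back to a.
    if b:
        return b
    else:
        return a
udf_change_null = udf(change_null_values, StringType())
df.withColumn("values2", udf_change_null("values", "values2")).show()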
-
proutray almost 4 years
Doesn't provide a complete solution to the issue. Only hints towards the solution.
-
vegetarianCoder about 2 years
Can we ingest our own value, like 0 or something?