how to change column value in spark sql
16,064
You can use withColumn
to overwrite the existing new_student_id
column with the original new_student_id
value if is not null, or otherwise the value from the student_id
column is used:
from pyspark.sql.functions import col,when
#Create sample data
students = sc.parallelize([(123,'B',234),(555,'A',None)]).toDF(['student_id','grade','new_student_id'])
#Use withColumn to use student_id when new_student_id is not populated
cleaned = students.withColumn("new_student_id",
when(col("new_student_id").isNull(), col("student_id")).
otherwise(col("new_student_id")))
cleaned.show()
Using your sample data as input:
+----------+-----+--------------+
|student_id|grade|new_student_id|
+----------+-----+--------------+
| 123| B| 234|
| 555| A| null|
+----------+-----+--------------+
the output data looks as follows:
+----------+-----+--------------+
|student_id|grade|new_student_id|
+----------+-----+--------------+
| 123| B| 234|
| 555| A| 555|
+----------+-----+--------------+
Related videos on Youtube
Author by
rainyballball
Updated on May 28, 2022Comments
-
rainyballball almost 2 years
In Sql, I can easily update some column value using UPDATE, for example: I have a table (student) like:
student_id, grade, new_student_id 123 B 234 555 A null UPDATE Student SET student_id = new_student_id WHERE new_student_id isNotNull
How can I do it in Spark using SparkSql(PySpark)?
-
himanshuIIITian about 7 years
-