How to change a column value in Spark SQL


You can use withColumn to overwrite the existing new_student_id column: keep the original new_student_id value when it is not null, otherwise fall back to the value from the student_id column:

from pyspark.sql.functions import col, when

# Create sample data (assumes a running SparkContext `sc`)
students = sc.parallelize([(123, 'B', 234), (555, 'A', None)]).toDF(['student_id', 'grade', 'new_student_id'])

# Use student_id when new_student_id is not populated
cleaned = students.withColumn(
    "new_student_id",
    when(col("new_student_id").isNull(), col("student_id"))
    .otherwise(col("new_student_id"))
)
cleaned.show()

Using your sample data as input:

+----------+-----+--------------+
|student_id|grade|new_student_id|
+----------+-----+--------------+
|       123|    B|           234|
|       555|    A|          null|
+----------+-----+--------------+

the output data looks as follows:

+----------+-----+--------------+
|student_id|grade|new_student_id|
+----------+-----+--------------+
|       123|    B|           234|
|       555|    A|           555|
+----------+-----+--------------+
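Since the replacement value is only needed when the original is null, the same result can be written more compactly with coalesce, which returns the first non-null value among its arguments. A minimal sketch reusing the students DataFrame from above:

from pyspark.sql.functions import coalesce, col

# Keep new_student_id when present, otherwise fall back to student_id
cleaned = students.withColumn(
    "new_student_id",
    coalesce(col("new_student_id"), col("student_id"))
)
cleaned.show()

Both versions behave identically here; when/otherwise is the more flexible choice if the fallback logic ever grows beyond a simple null check.
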
Author by rainyballball

Updated on May 28, 2022

Comments

  • rainyballball, almost 2 years ago

    In SQL, I can easily update a column value using UPDATE. For example, I have a table (student) like:

    student_id, grade, new_student_id
    123             B      234
    555             A      null
    
    UPDATE Student
    SET student_id = new_student_id
    WHERE new_student_id IS NOT NULL
    

    How can I do this in Spark using Spark SQL (PySpark)?
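
For the SQL side of the question: a plain Spark DataFrame or temporary view cannot be updated in place, so instead of an UPDATE you register the DataFrame as a temporary view and re-create the column with a SELECT. A minimal sketch of the same logic as the answer above, assuming a SparkSession named spark and the students DataFrame from that answer:

# Register the DataFrame so it can be queried with SQL
students.createOrReplaceTempView("student")

cleaned = spark.sql("""
    SELECT student_id,
           grade,
           CASE WHEN new_student_id IS NULL THEN student_id
                ELSE new_student_id
           END AS new_student_id
    FROM student
""")
cleaned.show()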