Spark DataFrame: execute an UPDATE statement
I don't think Spark supports this out of the box yet. What you can do is iterate over the DataFrame/RDD with a loop such as foreachPartition() (foreachRDD() is the equivalent when working with a DStream in Spark Streaming) and manually update/delete the table rows through the JDBC API.
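A minimal sketch of that approach for the two operations in the question is below, assuming a batch DataFrame. It uses `df.rdd.foreachPartition` so each executor opens one JDBC connection and one transaction per partition, then batches an UPDATE that closes the currently valid record and an INSERT that adds the new one. The table layout (`measures` with `measure_id`, `value`, `start_time`, `end_time`), the DataFrame columns (`measure_id` as Long, `value` as Double) and the object/method names are hypothetical assumptions; adapt them to the real schema.

```scala
import java.sql.{Connection, DriverManager, Timestamp}
import java.util.Properties
import org.apache.spark.sql.{DataFrame, Row}

object MeasuresHistory {
  // Open-ended validity marker from the question (9999-12-31).
  val OpenEnd: Timestamp = Timestamp.valueOf("9999-12-31 00:00:00")

  // Hypothetical schema: measures(measure_id BIGINT, value DOUBLE PRECISION,
  //                               start_time TIMESTAMP, end_time TIMESTAMP)
  val CloseOldSql: String =
    "UPDATE measures SET end_time = ? WHERE measure_id = ? AND end_time = ?"
  val InsertNewSql: String =
    "INSERT INTO measures (measure_id, value, start_time, end_time) VALUES (?, ?, ?, ?)"

  /** Closes the open record and inserts a new one for every row of `df`.
    * Assumes `df` has columns measure_id (Long) and value (Double),
    * with at most one row per measure_id. */
  def updateHistory(df: DataFrame, url: String, props: Properties): Unit = {
    df.rdd.foreachPartition { (rows: Iterator[Row]) =>
      // Runs on the executor: one connection and one transaction per partition.
      val conn: Connection = DriverManager.getConnection(url, props)
      try {
        conn.setAutoCommit(false)
        val now = new Timestamp(System.currentTimeMillis())
        val closeOld = conn.prepareStatement(CloseOldSql)
        val insertNew = conn.prepareStatement(InsertNewSql)
        rows.foreach { row =>
          val id = row.getAs[Long]("measure_id")
          // 1. Set end_time of the currently valid record to the current time.
          closeOld.setTimestamp(1, now)
          closeOld.setLong(2, id)
          closeOld.setTimestamp(3, OpenEnd)
          closeOld.addBatch()
          // 2. Insert the new record, valid until 9999-12-31.
          insertNew.setLong(1, id)
          insertNew.setDouble(2, row.getAs[Double]("value"))
          insertNew.setTimestamp(3, now)
          insertNew.setTimestamp(4, OpenEnd)
          insertNew.addBatch()
        }
        closeOld.executeBatch()  // close the old records first...
        insertNew.executeBatch() // ...then add the new ones
        conn.commit()
      } catch {
        case e: Exception => conn.rollback(); throw e
      } finally {
        conn.close() // also closes the prepared statements
      }
    }
  }
}
```

With the question's settings it would be called as `MeasuresHistory.updateHistory(df, "jdbc:postgresql:postgres", prop)`. Each partition commits independently, so the whole job is not atomic across partitions. If the primary key of the table is the measure id alone, the INSERT will still raise a duplicate key violation; this sketch assumes a key that also includes `start_time`.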
Here is a link to a similar question: Spark Dataframes UPSERT to Postgres Table
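For the DELETE part of the question, one option when the set of keys is small is to collect the keys to the driver and run a single statement through plain JDBC. The sketch below does that with PostgreSQL's `= ANY (?)` array form, binding the keys via `Connection.createArrayOf`. The `measures` table, the `measure_id` column (assumed Long) and the object/method names are hypothetical; for large key sets, a per-partition batch like the one used for the UPDATE is the safer choice.

```scala
import java.sql.DriverManager
import java.util.Properties
import org.apache.spark.sql.DataFrame

object DeleteMeasures {
  // Hypothetical table/column; "= ANY (?)" with an array parameter is PostgreSQL-specific.
  val DeleteSql: String = "DELETE FROM measures WHERE measure_id = ANY (?)"

  /** Deletes the rows whose measure_id appears in `keys` and returns the row count.
    * Assumes the distinct keys are few enough to collect to the driver. */
  def deleteByKeys(keys: DataFrame, url: String, props: Properties): Int = {
    // Box the ids as java.lang.Long so they fit createArrayOf's Object[] parameter.
    val ids: Array[AnyRef] = keys.select("measure_id").distinct().collect()
      .map(row => java.lang.Long.valueOf(row.getLong(0)): AnyRef)
    val conn = DriverManager.getConnection(url, props)
    try {
      val stmt = conn.prepareStatement(DeleteSql)
      stmt.setArray(1, conn.createArrayOf("int8", ids)) // int8 = PostgreSQL bigint
      stmt.executeUpdate()
    } finally {
      conn.close()
    }
  }
}
```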
Giorgio
Updated on June 07, 2022
Hi guys,
I need to perform JDBC operations using an Apache Spark DataFrame. Basically, I have a historical JDBC table called Measures on which I have to perform two operations:
1. Set the endTime validity attribute of the old measure record to the current time.
2. Insert a new measure record with endTime set to 9999-12-31.
Can someone tell me how (if it is possible) to perform an UPDATE statement for the first operation and an INSERT for the second?
I tried the following code for the first operation:
import org.apache.spark.sql.SaveMode
val dfWriter = df.write.mode(SaveMode.Overwrite)
dfWriter.jdbc("jdbc:postgresql:postgres", tableName, prop)
But it doesn't work because of a duplicate key violation. And if an UPDATE is possible, how can we execute a DELETE statement?
Thanks in advance.