PySpark: Subtract Two Timestamp Columns and Give Back Difference in Minutes (Using F.datediff gives back only whole days)
Just convert the timestamps to unix timestamps (seconds since epoch), compute the difference, and divide by 60.
For example:
import pyspark.sql.functions as F

df.withColumn(
    "date_diff_min",
    (F.col("date_1").cast("long") - F.col("date_2").cast("long")) / 60.0
).show(truncate=False)
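The same epoch-seconds arithmetic can be sketched in plain Python, without Spark, using row A from the question's sample data (the variable names here are just for illustration; in Spark, casting a timestamp column to long yields seconds since the epoch, which `datetime.timestamp()` mirrors):

```python
from datetime import datetime, timezone

# Row A from the sample dataframe
date_1 = datetime(2019, 1, 9, 1, 25, tzinfo=timezone.utc)
date_2 = datetime(2019, 1, 10, 14, 0, tzinfo=timezone.utc)

# Seconds since epoch, subtracted, then divided by 60 to get minutes
diff_min = (date_1.timestamp() - date_2.timestamp()) / 60.0
print(diff_min)  # -2195.0 (date_1 is 36 h 35 min earlier than date_2)
```

Note the result is negative because date_1 precedes date_2; swap the operands if you want a positive duration.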
Author: PineNuts0, updated on June 05, 2022
PineNuts0 (almost 2 years ago):
I have the following sample dataframe. The date_1 and date_2 columns have datatype of timestamp.
ID | date_1                   | date_2                   | date_diff
A  | 2019-01-09T01:25:00.000Z | 2019-01-10T14:00:00.000Z | -1
B  | 2019-01-12T02:18:00.000Z | 2019-01-12T17:00:00.000Z | 0
I want to find the difference between date_1 and date_2 in minutes.
When I use the code below, it gives me the date_diff column in whole integer values (days):
df = df.withColumn("date_diff", F.datediff(F.col('date_1'), F.col('date_2')))
But what I want is for date_diff to take into consideration the timestamp and give me minutes back.
How do I do this?