PySpark: Subtract Two Timestamp Columns and Give Back Difference in Minutes (Using F.datediff gives back only whole days)


Just convert the timestamps to Unix time (seconds since the epoch), take the difference in seconds, and divide by 60. Casting a timestamp column to long gives you exactly those epoch seconds.

For example:

import pyspark.sql.functions as F

df.withColumn(
    "date_diff_min",
    # Casting a timestamp to long yields seconds since the epoch,
    # so the difference in seconds divided by 60 is the difference in minutes.
    (F.col("date_1").cast("long") - F.col("date_2").cast("long")) / 60.0
).show(truncate=False)
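
Here is a minimal, self-contained sketch using the sample rows from the question below. The SparkSession setup and the to_timestamp parsing are assumptions about your environment; pass an explicit format to to_timestamp if your strings don't parse with Spark's defaults.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Build the sample data; to_timestamp relies on Spark's default ISO-8601 parsing here.
df = (
    spark.createDataFrame(
        [
            ("A", "2019-01-09T01:25:00.000Z", "2019-01-10T14:00:00.000Z"),
            ("B", "2019-01-12T02:18:00.000Z", "2019-01-12T17:00:00.000Z"),
        ],
        ["ID", "date_1", "date_2"],
    )
    .withColumn("date_1", F.to_timestamp("date_1"))
    .withColumn("date_2", F.to_timestamp("date_2"))
)

df.withColumn(
    "date_diff_min",
    (F.col("date_1").cast("long") - F.col("date_2").cast("long")) / 60.0,
).show(truncate=False)

# Since date_1 precedes date_2 in both rows, the results should be negative:
# roughly -2195.0 minutes for A and -882.0 minutes for B.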

Comments

  • PineNuts0

    I have the following sample dataframe. The date_1 and date_2 columns have a timestamp datatype.

    ID  date_1                      date_2                      date_diff
    A   2019-01-09T01:25:00.000Z    2019-01-10T14:00:00.000Z    -1
    B   2019-01-12T02:18:00.000Z    2019-01-12T17:00:00.000Z    0
    

    I want to find the difference between date_1 and date_2 in minutes.

    When I use the code below, it returns date_diff only as whole days (integer values):

    df = df.withColumn("date_diff", F.datediff(F.col('date_1'), F.col('date_2')))  
    

    But I want date_diff to take the time of day into account and return the difference in minutes.

    How do I do this?
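
As an aside on the approach at the top, here is a sketch of an equivalent variant (assuming the same df and column names): F.unix_timestamp applied to a timestamp column also returns seconds since the Unix epoch, so dividing the difference by 60 gives the same result in minutes.

import pyspark.sql.functions as F

# unix_timestamp on a timestamp column returns epoch seconds,
# so this is equivalent to the cast("long") approach above.
df = df.withColumn(
    "date_diff_min",
    (F.unix_timestamp("date_1") - F.unix_timestamp("date_2")) / 60.0,
)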