SparkSQL - Difference between two time stamps in minutes


There are two ways to do this in Spark SQL: cast the timestamp columns to bigint, then subtract and divide by 60, or apply unix_timestamp to each column, then subtract and divide by 60. I used the pickup and dropoff columns from the dataframe above. (In PySpark/Scala Spark, bigint corresponds to long.)

spark.sqlContext.sql("""select pickup, dropoff, (unix_timestamp(dropoff)-unix_timestamp(pickup))/(60) as diff from taxisub""").show()

OR

spark.sqlContext.sql("""select pickup, dropoff, ((bigint(to_timestamp(dropoff)))-(bigint(to_timestamp(pickup))))/(60) as diff from taxisub""").show()

Output:

+-------------------+-------------------+------------------+
|             pickup|            dropoff|              diff|
+-------------------+-------------------+------------------+
|2018-12-15 08:53:20|2018-12-15 08:57:57| 4.616666666666666|
|2018-12-15 08:03:08|2018-12-15 08:07:30| 4.366666666666666|
|2018-12-15 08:28:34|2018-12-15 08:33:31|              4.95|
|2018-12-15 08:37:53|2018-12-15 08:43:47|               5.9|
|2018-12-15 08:51:02|2018-12-15 08:55:54| 4.866666666666666|
|2018-12-15 08:03:47|2018-12-15 08:03:50|              0.05|
|2018-12-15 08:45:21|2018-12-15 08:57:08|11.783333333333333|
|2018-12-15 08:04:47|2018-12-15 08:29:05|              24.3|
|2018-12-15 08:01:22|2018-12-15 08:12:15|10.883333333333333|
+-------------------+-------------------+------------------+
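
If you prefer the DataFrame API over a SQL string, the same calculation can be written with unix_timestamp from pyspark.sql.functions. A minimal PySpark sketch, assuming a SparkSession named spark and that taxisub is registered as a view with pickup and dropoff timestamp columns as in the queries above:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, unix_timestamp

spark = SparkSession.builder.getOrCreate()
df = spark.table("taxisub")  # assumes the same registered view used in the SQL above

# unix_timestamp() returns seconds since the epoch, so the difference
# of the two values divided by 60 is the gap in minutes.
df.select(
    col("pickup"),
    col("dropoff"),
    ((unix_timestamp("dropoff") - unix_timestamp("pickup")) / 60).alias("diff")
).show()
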
Author by peakstatus

Updated on June 05, 2022

Comments

  • peakstatus almost 2 years

    I'm trying to find the difference in minutes between two timestamps in the format MM/dd/yyyy hh:mm:ss AM/PM. I'm new to working with Spark SQL and tried the basic datediff function that other SQL dialects support, i.e. datediff(minute, start_time, end_time), but that yielded the error:

    org.apache.spark.sql.AnalysisException: cannot resolve '`minute`' given input columns: [taxisub.tpep_dropoff_datetime, taxisub.DOLocationID, taxisub.improvement_surcharge, taxisub.VendorID, taxisub.trip_distance, taxisub.tip_amount, taxisub.tolls_amount, taxisub.payment_type, taxisub.fare_amount, taxisub.tpep_pickup_datetime, taxisub.total_amount, taxisub.store_and_fwd_flag, taxisub.extra, taxisub.passenger_count, taxisub.PULocationID, taxisub.mta_tax, taxisub.RatecodeID]; line 1 pos 153;
    

    It seems that the minute argument is not supported by datediff in Spark SQL. The query I currently have is:

    spark.sqlContext.sql("Select to_timestamp(tpep_pickup_datetime,'MM/dd/yyyy hh:mm:ss') as pickup,to_timestamp(tpep_dropoff_datetime,'MM/dd/yyyy hh:mm:ss') as dropoff, datediff(to_timestamp(tpep_pickup_datetime,'MM/dd/yyyy hh:mm:ss'),to_timestamp(tpep_dropoff_datetime,'MM/dd/yyyy hh:mm:ss')) as diff from taxisub ").show()
    

    And my results are:

    +-------------------+-------------------+----+
    |             pickup|            dropoff|diff|
    +-------------------+-------------------+----+
    |2018-12-15 08:53:20|2018-12-15 08:57:57|   0|
    |2018-12-15 08:03:08|2018-12-15 08:07:30|   0|
    |2018-12-15 08:28:34|2018-12-15 08:33:31|   0|
    |2018-12-15 08:37:53|2018-12-15 08:43:47|   0|
    |2018-12-15 08:51:02|2018-12-15 08:55:54|   0|
    |2018-12-15 08:03:47|2018-12-15 08:03:50|   0|
    |2018-12-15 08:45:21|2018-12-15 08:57:08|   0|
    |2018-12-15 08:04:47|2018-12-15 08:29:05|   0|
    |2018-12-15 08:01:22|2018-12-15 08:12:15|   0|
    +-------------------+-------------------+----+
    

    I'm assuming datediff defaults to the difference in days, given the zeros in the results. Is there an additional argument or function I should be using to get the difference in minutes between these two timestamps?

    Thanks in advance.

    • Shaido about 4 years
    • peakstatus about 4 years
      @Shaido-ReinstateMonica That question is leveraging Spark Scala, I'm trying to solve the problem using Spark SQL.
    • Shaido about 4 years
      How is that different? Convert the values to long, subtract, then divide by 60.
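
      A rough PySpark sketch of what that comment suggests, assuming the raw string columns from the question and that the pattern MM/dd/yyyy hh:mm:ss a matches the data:

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import col, to_timestamp

      spark = SparkSession.builder.getOrCreate()
      df = spark.table("taxisub")

      # Hypothetical parse of the string columns; adjust the pattern to the data.
      pickup = to_timestamp(col("tpep_pickup_datetime"), "MM/dd/yyyy hh:mm:ss a")
      dropoff = to_timestamp(col("tpep_dropoff_datetime"), "MM/dd/yyyy hh:mm:ss a")

      # A timestamp cast to long is seconds since the epoch, so the
      # difference divided by 60 is the trip duration in minutes.
      df.select(
          pickup.alias("pickup"),
          dropoff.alias("dropoff"),
          ((dropoff.cast("long") - pickup.cast("long")) / 60).alias("diff")
      ).show()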