pyspark.sql.functions unix_timestamp returns null


You get NULL because the format you use doesn't match the data. To get a minimal match you have to escape the T with single quotes:

yyyy-MM-dd'T'kk:mm:ss
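Under the legacy SimpleDateFormat parser (Spark 2.x) this minimal pattern is enough, because trailing text that isn't matched (the .929+01:00 part here) is silently ignored; newer Spark versions are stricter about leftovers, so treat this as a sketch of the Spark 2.x behaviour, assuming an active SparkSession named spark:

from pyspark.sql import Row
from pyspark.sql.functions import unix_timestamp

df = spark.createDataFrame([Row(dt='2017-01-23T08:12:39.929+01:00')])

# The escaped 'T' makes the literal match; under SimpleDateFormat the
# unmatched tail (".929+01:00") is ignored rather than failing the parse.
df.withColumn(
    "parsed",
    unix_timestamp("dt", "yyyy-MM-dd'T'kk:mm:ss").cast("timestamp")
).show(1, False)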

and to match the full pattern you'll need S for the milliseconds and X for the timezone offset:

yyyy-MM-dd'T'kk:mm:ss.SSSXXX
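Keep in mind that unix_timestamp returns whole seconds (a bigint), so even with the full pattern the .929 fraction is lost on the round-trip. A minimal sketch, assuming the same single-row DataFrame as in the question:

from pyspark.sql import Row
from pyspark.sql.functions import unix_timestamp

df = spark.createDataFrame([Row(dt='2017-01-23T08:12:39.929+01:00')])

# Full pattern: milliseconds and offset are matched, but the result is
# still second-precision because unix_timestamp yields epoch seconds.
df.withColumn(
    "parsed",
    unix_timestamp("dt", "yyyy-MM-dd'T'kk:mm:ss.SSSXXX").cast("timestamp")
).show(1, False)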

but in current Spark versions a direct cast:

from pyspark.sql.functions import col

col("dt").cast("timestamp")

should work just fine:

spark.sql(
    """SELECT CAST("2011-01-23T08:12:39.929+01:00" AS timestamp)"""
).show(1, False)
+------------------------------------------------+
|CAST(2011-01-23T08:12:39.929+01:00 AS TIMESTAMP)|
+------------------------------------------------+
|2011-01-23 08:12:39.929                         |
+------------------------------------------------+
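The same cast works on the question's DataFrame, and unlike unix_timestamp it keeps the millisecond part (a sketch, again assuming an active SparkSession named spark):

from pyspark.sql import Row
from pyspark.sql.functions import col

df = spark.createDataFrame([Row(dt='2017-01-23T08:12:39.929+01:00')])

# Direct string-to-timestamp cast: parses the ISO 8601 offset and keeps
# the fractional seconds, no format string needed.
df.withColumn("parsed", col("dt").cast("timestamp")).show(1, False)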

Reference: SimpleDateFormat.


Comments

  • JuanPablo almost 2 years

    I'm trying to convert a column from string to timestamp with this code:

    from pyspark.sql import Row
    from pyspark.sql.functions import unix_timestamp

    (sc
     .parallelize([Row(dt='2017-01-23T08:12:39.929+01:00')])
     .toDF()
     .withColumn("parsed", unix_timestamp("dt", "yyyy-MM-ddThh:mm:ss")
                 .cast("double")
                 .cast("timestamp"))
     .show(1, False))
    

    but I get null

    +-----------------------------+------+
    |dt                           |parsed|
    +-----------------------------+------+
    |2017-01-23T08:12:39.929+01:00|null  |
    +-----------------------------+------+
    

    Why?