With pyspark.sql.functions unix_timestamp I get null
You get NULL because the format you use doesn't match the data. To get even a minimal match you have to escape the literal T with single quotes:

yyyy-MM-dd'T'kk:mm:ss

and to match the full pattern you also need S for the milliseconds and X for the timezone offset:

yyyy-MM-dd'T'kk:mm:ss.SSSXXX
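As an illustrative sketch (assuming the sample value from the question and a SparkSession named spark; note that unix_timestamp truncates to whole seconds, so the milliseconds are lost), the corrected pattern parses instead of returning null:

from pyspark.sql import SparkSession, Row
from pyspark.sql.functions import unix_timestamp

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([Row(dt='2017-01-23T08:12:39.929+01:00')])

# Escaping the literal T and covering milliseconds (S) and the offset (X)
# lets the pattern match the whole string, so parsing no longer yields NULL.
(df.withColumn("parsed",
               unix_timestamp("dt", "yyyy-MM-dd'T'kk:mm:ss.SSSXXX")
               .cast("timestamp"))
   .show(1, False))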
In the current Spark version, however, a direct cast:
from pyspark.sql.functions import col
col("dt").cast("timestamp")
should work just fine:
spark.sql(
"""SELECT CAST("2011-01-23T08:12:39.929+01:00" AS timestamp)"""
).show(1, False)
+------------------------------------------------+
|CAST(2011-01-23T08:12:39.929+01:00 AS TIMESTAMP)|
+------------------------------------------------+
|2011-01-23 08:12:39.929 |
+------------------------------------------------+
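The same works through the DataFrame API; a minimal sketch (assuming a SparkSession named spark and the sample value from the question):

from pyspark.sql import SparkSession, Row
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([Row(dt='2017-01-23T08:12:39.929+01:00')])

# A direct cast lets Spark's built-in parser handle the ISO 8601 string,
# including the fractional seconds and the +01:00 offset.
df.withColumn("parsed", col("dt").cast("timestamp")).show(1, False)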
Reference: SimpleDateFormat.
JuanPablo:
I'm trying to convert a column from string to timestamp with this code:

from pyspark.sql import Row
from pyspark.sql.functions import unix_timestamp

(sc
 .parallelize([Row(dt='2017-01-23T08:12:39.929+01:00')])
 .toDF()
 .withColumn("parsed", unix_timestamp("dt", "yyyy-MM-ddThh:mm:ss")
     .cast("double")
     .cast("timestamp"))
 .show(1, False))
but I get null
+-----------------------------+------+
|dt                           |parsed|
+-----------------------------+------+
|2017-01-23T08:12:39.929+01:00|null  |
+-----------------------------+------+
Why?