AWS Glue job to map string to date and time format while converting from CSV to Parquet


It uses Presto data types, so the data must be in the format each type expects:

DATE Calendar date (year, month, day).

Example: DATE '2001-08-22'

TIME Time of day (hour, minute, second, millisecond) without a time zone. Values of this type are parsed and rendered in the session time zone.

Example: TIME '01:02:03.456'

TIMESTAMP Instant in time that includes the date and time of day without a time zone. Values of this type are parsed and rendered in the session time zone.

Example: TIMESTAMP '2001-08-22 03:04:05.321'
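To make the three target shapes concrete, here is a minimal plain-Python sketch that renders a `datetime` value in the DATE, TIME and TIMESTAMP layouts listed above. The helper name `presto_literals` and the sample value are hypothetical and used only for illustration; this is not part of Glue or Presto.

```python
from datetime import datetime


def presto_literals(value: datetime) -> dict:
    """Render a datetime in the DATE, TIME and TIMESTAMP layouts shown above."""
    # Milliseconds are derived from microseconds and zero-padded to 3 digits.
    millis = f"{value.microsecond // 1000:03d}"
    return {
        "DATE": value.strftime("%Y-%m-%d"),
        "TIME": value.strftime("%H:%M:%S.") + millis,
        "TIMESTAMP": value.strftime("%Y-%m-%d %H:%M:%S.") + millis,
    }


# Hypothetical sample value matching the TIMESTAMP example above.
sample = datetime(2001, 8, 22, 3, 4, 5, 321000)
print(presto_literals(sample))
```

Values that already follow these layouts in the source data should need no pattern-based conversion.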

You may use:

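from pyspark.sql.functions import to_timestamp, to_date, date_format

# `col` holds the name of the string column to convert. Apply whichever
# line fits that column; the pattern must match the values in the CSV.
# String -> timestamp
df = df.withColumn(col, to_timestamp(col, 'dd-MM-yyyy HH:mm'))
# String -> date
df = df.withColumn(col, to_date(col, 'dd-MM-yyyy'))
# Date/time value -> time-of-day string
df = df.withColumn(col, date_format(col, 'HH:mm:ss'))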
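Before running the job, it can help to check the patterns against a few sample values locally. The sketch below uses only the Python standard library. The `SPARK_TO_STRPTIME` mapping, the `try_parse` helper and the sample strings are assumptions made for illustration, since the actual CSV values are not shown here. It returns `None` when a value does not match the pattern, which is a quick way to spot a wrong pattern.

```python
from datetime import datetime
from typing import Optional

# Assumed strptime equivalents of the Spark patterns used above.
SPARK_TO_STRPTIME = {
    "dd-MM-yyyy HH:mm": "%d-%m-%Y %H:%M",
    "dd-MM-yyyy": "%d-%m-%Y",
}


def try_parse(raw: str, spark_pattern: str) -> Optional[datetime]:
    """Parse raw with the strptime equivalent of spark_pattern, or return None."""
    try:
        return datetime.strptime(raw, SPARK_TO_STRPTIME[spark_pattern])
    except ValueError:
        # The value does not match the pattern.
        return None


# Hypothetical sample values: the first matches its pattern, the second does not.
print(try_parse("22-08-2001 03:04", "dd-MM-yyyy HH:mm"))
print(try_parse("2001-08-22", "dd-MM-yyyy"))
```

If a sample value comes back as `None`, the pattern passed to `to_timestamp` or `to_date` probably needs adjusting to the layout actually used in the file.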
Author: Kavya shree

Updated on July 22, 2022

Comments

  • Kavya shree (over 1 year ago)

    While converting from CSV to Parquet with an AWS Glue ETL job, the following fields, which are read from the CSV as strings, are mapped to date and time types. [image] This is the actual CSV file: [image]

    After mapping and converting, the date field is empty and the time is concatenated with today's date. [image]

    How to convert with proper date and time format?