Timestamp parsing in pyspark


You can parse this timestamp using unix_timestamp:

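from pyspark.sql import functions as F

# Pattern matching values such as 1995-08-01T00:00:01.000+0000
fmt = "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
# Parse the string to epoch seconds, then cast the result to a timestamp column
df2 = df1.withColumn('Timestamp2', F.unix_timestamp('Timestamp', fmt).cast('timestamp'))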

Then you can apply dayofmonth to the new Timestamp2 column:

df2.select(F.dayofmonth('Timestamp2'))
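
If you want to sanity-check what the parse should yield for the sample value from the question, here is a minimal sketch in plain Python, without Spark. It uses datetime.strptime with directives that roughly mirror the Spark pattern above. The names PY_FORMAT and day_of_month are made up for this illustration and are not part of PySpark. It also shows something worth keeping in mind: the day depends on the time zone in which the instant is viewed, and Spark evaluates dayofmonth in the session time zone, so a +0000 value just after midnight can land on the previous day.

```python
from datetime import datetime, timedelta, timezone

# Python strptime directives roughly mirroring yyyy-MM-dd'T'HH:mm:ss.SSSZ
# (hypothetical helper names, for illustration only)
PY_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def day_of_month(raw, tz=timezone.utc):
    """Parse an offset-bearing timestamp string and return its day of month in tz."""
    parsed = datetime.strptime(raw, PY_FORMAT)
    return parsed.astimezone(tz).day


sample = "1995-08-01T00:00:01.000+0000"
print(day_of_month(sample))                                 # viewed in UTC: 1
print(day_of_month(sample, timezone(timedelta(hours=-5))))  # viewed at UTC-5: 31
```

If the Spark result looks off by one day, the session time zone is the first thing I would check.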

More details about these functions can be found in the pyspark functions documentation.

Author: data_person

Updated on June 11, 2022

Comments

  • data_person, almost 2 years ago
    df1:
    
    Timestamp:
    
    1995-08-01T00:00:01.000+0000
    

    Is there a way to extract the day of the month from the timestamp column of the data frame using pyspark? I can't provide any code because I am new to Spark and don't have a clue how to proceed.