How to add Extra column with current date in Spark dataframe


Solution 1

Use either lit() with a driver-side date string, or the built-in current_date() function:

from datetime import datetime
from pyspark.sql import functions as F

df2 = df.withColumn("Curr_date", F.lit(datetime.now().strftime("%Y-%m-%d")))

# OR

df2 = df.withColumn("Curr_date", F.current_date())
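The two options are not quite equivalent. A minimal plain-Python sketch of the first one (an illustration, not Spark code): the lit() variant bakes in a string computed once on the driver, in the driver's local time zone, before the job runs; current_date(), by contrast, is a Spark expression resolved using the session time zone.

```python
from datetime import datetime

# The lit() approach: the date string is computed once, on the driver,
# and the same literal is then attached to every row of the DataFrame.
curr_date = datetime.now().strftime("%Y-%m-%d")
```

If the cluster's session time zone differs from the driver's, the two approaches can disagree around midnight, which is one reason to prefer current_date() here.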

Solution 2

current_timestamp() works, but it is evaluated once when the query starts, so every row in the query gets the same value.

If you instead want the timestamp at which each row is actually processed, you can call into the JVM per row with reflect:

from pyspark.sql.functions import expr

df2 = df.withColumn('current', expr("reflect('java.time.LocalDateTime', 'now')"))
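To see why this matters, here is a plain-Python analogy (an illustration only, not Spark code): a value captured once before the loop behaves like current_timestamp(), while a call made inside the loop behaves like the per-row reflect expression.

```python
from datetime import datetime
import time

# Captured once, like current_timestamp(): every "row" sees the same value.
query_start = datetime.now()

rows = []
for _ in range(3):
    time.sleep(0.01)
    # Called per "row", like reflect('java.time.LocalDateTime', 'now').
    rows.append((query_start, datetime.now()))

# All first elements are identical; the second elements advance per row.
```

Note that because reflect runs per row, repeated runs over the same data will produce different timestamps, so the column is not deterministic.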

Solution 3

There is a built-in Spark function, current_timestamp().

from pyspark.sql.functions import *

df.withColumn('current', date_format(current_timestamp(), 'yyyy-MM-dd')).show()

+----+----------+
|test|   current|
+----+----------+
|test|2020-09-09|
+----+----------+


Author: Varun

Updated on June 04, 2022

Comments

  • Varun, almost 2 years ago

    I am trying to add a column to my existing PySpark DataFrame using the withColumn method, and I want to insert the current date into it. My source has no date column, so I am adding this current-date column to my DataFrame and saving the DataFrame as a table, so that I can later use the column for tracking purposes. I am using the code below:

        df2=df.withColumn("Curr_date",datetime.now().strftime('%Y-%m-%d'))
    

    Here df is my existing DataFrame, and I want to save df2 as a table with the Curr_date column. But withColumn expects an existing column or a lit expression rather than datetime.now().strftime('%Y-%m-%d'). Could someone please guide me on how to add this date column to my DataFrame?