How to add an extra column with the current date to a Spark DataFrame


Solution 1

Use either lit with a date string computed in Python, or Spark's built-in current_date.

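from datetime import datetime

from pyspark.sql import functions as F

# Literal string computed once in Python on the driver
df2 = df.withColumn("Curr_date", F.lit(datetime.now().strftime("%Y-%m-%d")))

# OR

# Date column evaluated by Spark
df2 = df.withColumn("Curr_date", F.current_date())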
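
The lit variant above depends on building the date string in ordinary Python before Spark ever sees it. Below is a minimal sketch of that step alone, with no Spark dependency. The helper name driver_date_string and its today parameter are hypothetical and exist only for illustration.

```python
import datetime


def driver_date_string(today=None):
    """Return a date formatted as YYYY-MM-DD, as a plain Python string.

    `today` is a hypothetical parameter added so the result can be checked
    against a fixed date; when omitted, the current local date is used.
    """
    if today is None:
        today = datetime.date.today()
    # strftime interprets %Y-%m-%d; this formatting happens in Python.
    return today.strftime("%Y-%m-%d")


# A fixed date makes the result predictable.
print(driver_date_string(datetime.date(2022, 6, 4)))  # 2022-06-04
```

The returned string is the kind of value that would be wrapped as F.lit(driver_date_string()), so every row receives the same value, computed once before the query runs.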

Solution 2

current_timestamp() works, but it is evaluated once, at the start of the query, so every row gets the same value.

If you prefer the timestamp at which each row is processed, you can use the method below:

from pyspark.sql.functions import expr
df2 = df.withColumn('current', expr("reflect('java.time.LocalDateTime', 'now')"))

Solution 3

Spark provides the function current_timestamp().

from pyspark.sql.functions import current_timestamp, date_format

# Format the current timestamp as a yyyy-MM-dd string
df.withColumn('current', date_format(current_timestamp(), 'yyyy-MM-dd')).show()

|test|   current|
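
The question uses the Python-style format '%Y-%m-%d', while date_format in Solution 3 takes a Spark pattern such as 'yyyy-MM-dd'. To show how the two notations line up, here is a small sketch that translates a few common directives. The function strftime_to_spark_pattern and its mapping table are hypothetical helpers, not part of PySpark, and they cover only the directives listed.

```python
# Hypothetical mapping from Python strftime directives to Spark datetime
# pattern letters; only a handful of common directives are covered.
_DIRECTIVES = {
    "%Y": "yyyy",  # four-digit year
    "%m": "MM",    # two-digit month
    "%d": "dd",    # two-digit day of month
    "%H": "HH",    # hour on a 24-hour clock
    "%M": "mm",    # minute
    "%S": "ss",    # second
}


def strftime_to_spark_pattern(fmt):
    """Translate a strftime format string into a Spark-style pattern.

    Separators such as '-' or ':' are copied unchanged. Literal letters
    would need quoting in a Spark pattern, which this sketch does not handle.
    """
    out = []
    i = 0
    while i < len(fmt):
        token = fmt[i:i + 2]
        if token in _DIRECTIVES:
            out.append(_DIRECTIVES[token])
            i += 2
        elif fmt[i] == "%":
            # Refuse directives that the table above does not know about.
            raise ValueError("unsupported directive: " + token)
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


print(strftime_to_spark_pattern("%Y-%m-%d"))  # yyyy-MM-dd
```

The point is only that '%Y-%m-%d' belongs to Python's strftime, whereas Spark functions such as date_format expect the pattern-letter form.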


Author by


Updated on June 04, 2022


  • Varun, almost 2 years ago

    I am trying to add a column to my existing PySpark DataFrame using the withColumn method. I want to insert the current date in this column. My source has no date column, so I am adding this current-date column to my DataFrame and saving the DataFrame to my table, so that I can later use the column for tracking purposes. I am using the code below.


    Here df is my existing DataFrame, and I want to save df2 as a table with the Curr_date column. But withColumn expects an existing column or the lit method instead of '%Y-%m-%d'. Could someone please guide me on how to add this date column to my DataFrame?