Converting yyyymmdd to MM-dd-yyyy format in pyspark
from datetime import datetime
from pyspark.sql.functions import col, udf, date_format
from pyspark.sql.types import DateType
# Each row must be a tuple so that createDataFrame can map it to the column list
rdd = sc.parallelize([('20161231',), ('20140102',), ('20151201',), ('20161124',)])
df1 = sqlContext.createDataFrame(rdd, ['old_col'])
# UDF to convert a yyyymmdd string to a date
func = udf(lambda x: datetime.strptime(x, '%Y%m%d'), DateType())
# date_format renders the date as an MM-dd-yyyy string
df = df1.withColumn('new_col', date_format(func(col('old_col')), 'MM-dd-yyyy'))
df.show()
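To check the format strings before running the job on a large DataFrame, the per-value conversion can be tried in plain Python first. This is only a sketch: the helper name yyyymmdd_to_mmddyyyy is made up for illustration and is not part of the answer above, and returning None for bad input is an assumption about how invalid values should be handled. Note that Python uses '%m-%d-%Y' where Spark's date_format uses 'MM-dd-yyyy'.

```python
from datetime import datetime


def yyyymmdd_to_mmddyyyy(value):
    """Convert a 'yyyymmdd' string to 'MM-dd-yyyy'.

    Hypothetical helper for illustration; returns None when the value
    is missing or cannot be parsed (an assumed policy, adjust as needed).
    """
    if value is None:
        return None
    try:
        # Parse with the input layout, then render with the output layout
        return datetime.strptime(value, "%Y%m%d").strftime("%m-%d-%Y")
    except ValueError:
        return None


# Same sample values as in the snippet above
samples = ["20161231", "20140102", "20151201", "20161124"]
converted = [yyyymmdd_to_mmddyyyy(s) for s in samples]
print(converted)
```

A function like this returns a string rather than a date, so if it were wrapped in a UDF the return type would be a string type, not DateType.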
Author by
Aman Burman
Updated on June 07, 2022
Comments
-
Aman Burman over 1 year
I have a large DataFrame df containing a date column in the format yyyymmdd. How can I convert it into MM-dd-yyyy in PySpark?
-
Aman Burman almost 7 years
Thanks mrsrinivas, I am using Zeppelin and this code ran without error, but I am not able to run actions or transformations on the result of your code. I am very new to Spark, so please help me if I am missing something.
-
mrsrinivas almost 7 years
Use df.show() for the final output. You might have loaded the data into a DataFrame (df1).
-
Aman Burman almost 7 years
Thanks mrsrinivas, a small change was all I was missing. I will post my change too in case someone else finds it useful. Thanks again.
-
mrsrinivas almost 7 years
Do you get the output in the format MM-dd-yyyy? I think it gives it in yyyy-MM-dd.
-
pegah almost 3 years
Minor snippet correction: it is missing from pyspark.sql.functions import date_format