Converting yyyymmdd to MM-dd-yyyy format in pyspark

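from datetime import datetime
from pyspark.sql.functions import col, date_format, udf
from pyspark.sql.types import DateType

# sc and sqlContext are assumed to exist already (e.g. in Zeppelin or the pyspark shell).
# Each row is a one-element tuple so that Spark can infer a single-column schema.
rdd = sc.parallelize([('20161231',), ('20140102',), ('20151201',), ('20161124',)])
df1 = sqlContext.createDataFrame(rdd, ['old_col'])

# UDF to convert a yyyyMMdd string to a date
func = udf(lambda x: datetime.strptime(x, '%Y%m%d'), DateType())

# date_format returns a string column in the requested pattern
df = df1.withColumn('new_col', date_format(func(col('old_col')), 'MM-dd-yyyy'))

df.show()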
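
The per-row logic of the snippet above can be checked without a Spark session. The following is a minimal plain-Python sketch, not part of the original answer: the helper name yyyymmdd_to_mmddyyyy is hypothetical, and it only mirrors the parse step of the UDF followed by a formatting step. It also shows why a formatting step is needed at all: a parsed date prints in ISO order (yyyy-MM-dd) until it is explicitly formatted as a string.

```python
from datetime import datetime


def yyyymmdd_to_mmddyyyy(value):
    """Parse a 'yyyyMMdd' string and return it as an 'MM-dd-yyyy' string.

    Hypothetical helper for illustration; it mirrors the strptime call
    used inside the UDF, then formats the parsed date.
    """
    parsed = datetime.strptime(value, '%Y%m%d').date()
    return parsed.strftime('%m-%d-%Y')


samples = ['20161231', '20140102', '20151201', '20161124']
converted = [yyyymmdd_to_mmddyyyy(s) for s in samples]
print(converted)  # ['12-31-2016', '01-02-2014', '12-01-2015', '11-24-2016']

# A date object on its own is displayed in ISO order, not MM-dd-yyyy.
iso = str(datetime.strptime('20161231', '%Y%m%d').date())
print(iso)  # 2016-12-31
```

If Spark 2.2 or later is available, the built-in to_date function accepts a format argument, so the parse step can likely be done without a Python UDF; that variant is not shown here because it was not part of the original answer.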
Author by

Aman Burman

Updated on June 07, 2022

Comments

  • Aman Burman
    Aman Burman over 1 year

    I have a large DataFrame df containing a date column in the format yyyymmdd. How can I convert it to MM-dd-yyyy in PySpark?

  • Aman Burman
    Aman Burman almost 7 years
    Thanks mrsrinivas. I am using Zeppelin and this code ran without error, but I am not able to perform any actions or transformations on the result of your code. I am very new to Spark, so please help me if I am missing something.
  • mrsrinivas
    mrsrinivas almost 7 years
    Use df.show() for the final output. You might have already loaded your data into a DataFrame (df1).
  • Aman Burman
    Aman Burman almost 7 years
    Thanks mrsrinivas, a small change was all I was missing. I will post my change as well in case someone else finds it useful. Thanks again.
  • mrsrinivas
    mrsrinivas almost 7 years
    Do you get the output in the format MM-dd-yyyy? I think it gives it in yyyy-MM-dd.
  • pegah
    pegah almost 3 years
    Minor snippet correction: the snippet is missing the import "from pyspark.sql.functions import date_format".