Add leading zeros to Columns in a Spark Data Frame

Solution 1

You can do this with the built-in concat function, which prepends a fixed string to each value:

import org.apache.spark.sql.functions.{col, concat, lit}

df.withColumn("iD", concat(lit("00"), col("iD")))
  .withColumn("val", concat(lit("0"), col("val")))
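Note that concat always prepends the same prefix, so an iD of 10 becomes 0010 rather than 010 (which is what the desired output in the question shows). If you want a fixed total width instead, lpad is one option. Below is a minimal sketch comparing the two, assuming a local SparkSession and a small made-up DataFrame; the session settings, sample rows, and the names `prefixed` and `padded` are illustrative and not from the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit, lpad}

// Hypothetical local session and sample data, for illustration only.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("leading-zeros-concat-vs-lpad")
  .getOrCreate()
import spark.implicits._

val df = Seq((1L, 44L), (10L, 12L)).toDF("iD", "val")

// concat prepends a fixed prefix regardless of how wide the value already is:
// 1 becomes "001", 10 becomes "0010".
val prefixed = df.withColumn("iD", concat(lit("00"), col("iD")))

// lpad pads on the left up to a target width of 3:
// 1 becomes "001", 10 becomes "010".
// Strings longer than the target width are truncated by lpad.
val padded = df.withColumn("iD", lpad(col("iD").cast("string"), 3, "0"))

prefixed.show()
padded.show()
```

Which one fits depends on whether you need a constant prefix or a constant width.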

Solution 2

This solved it for me; thank you all for the help. format_string zero-pads the value to a minimum width of three digits:

// Requires org.apache.spark.sql.functions.format_string and spark.implicits._ (for $"iD")
val df2 = df
  .withColumn("idLong", format_string("%03d", $"iD"))
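The %03d specifier expects an integral value, so this works when the column was inferred as a numeric type. If the column happens to be a string, a cast is probably needed first. Here is a minimal sketch under that assumption, using a local SparkSession and made-up string data; the sample rows and the name `zeroPadded` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, format_string}

// Hypothetical local session and sample data, for illustration only.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("leading-zeros-format-string")
  .getOrCreate()
import spark.implicits._

// Assumption: both columns arrive as strings rather than numbers.
val df = Seq(("1", "44"), ("10", "12")).toDF("iD", "val")

// Cast to long so that %03d receives an integral argument,
// then zero-pad each value to a minimum width of 3.
val zeroPadded = df
  .withColumn("iD", format_string("%03d", col("iD").cast("long")))
  .withColumn("val", format_string("%03d", col("val").cast("long")))

zeroPadded.show()
```

Unlike concat, this gives every value the same minimum width, so 10 comes out as 010.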
Author by

fletchr

Data Scientist

Updated on June 15, 2022

Comments

  • fletchr
    fletchr almost 2 years

    In short, I'm using spark-xml to parse XML files. However, it strips the leading zeros from all the values I'm interested in, and I need the final output, which is a DataFrame, to include them. I can't figure out how to add leading zeros to the columns I'm interested in.

    val df = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "output")
      .option("excludeAttribute", true)
      .option("allowNumericLeadingZeros", true) //including this does not solve the problem
      .load("pathToXmlFile")
    

    Example output that I'm getting

    +------+---+--------------------+
    |iD    |val|Code                |
    +------+---+--------------------+
    |1     |44 |9022070536692784476 |
    |2     |66 |-5138930048185086175|
    |3     |25 |805582856291361761  |
    |4     |17 |-9107885086776983000|
    |5     |18 |1993794295881733178 |
    |6     |31 |-2867434050463300064|
    |7     |88 |-4692317993930338046|
    |8     |44 |-4039776869915039812|
    |9     |20 |-5786627276152563542|
    |10    |12 |7614363703260494022 |
    +------+---+--------------------+
    

    Desired output

    +--------+----+--------------------+
    |iD      |val |Code                |
    +--------+----+--------------------+
    |001     |044 |9022070536692784476 |
    |002     |066 |-5138930048185086175|
    |003     |025 |805582856291361761  |
    |004     |017 |-9107885086776983000|
    |005     |018 |1993794295881733178 |
    |006     |031 |-2867434050463300064|
    |007     |088 |-4692317993930338046|
    |008     |044 |-4039776869915039812|
    |009     |020 |-5786627276152563542|
    |0010    |012 |7614363703260494022 |
    +--------+----+--------------------+
    
  • fletchr
    fletchr almost 6 years
    Thanks, that worked. I also tried another way and posted it.
  • Ramesh Maharjan
    Ramesh Maharjan almost 6 years
    That was really good :) Thanks for accepting.
  • Bhaskar
    Bhaskar about 4 years
    Perfect! Thanks. For those who visit after me, here is the documentation link for format_string