How to turn off scientific notation in pyspark?

The easiest way is to cast the double column to decimal, specifying an appropriate precision and scale:

from pyspark.sql.types import DecimalType

df.withColumn('total_sale_volume', df.total_sale_volume.cast(DecimalType(18, 2)))
Author: chessosapiens

Updated on June 16, 2022

Comments

  • chessosapiens
    chessosapiens almost 2 years

As the result of some aggregation I come up with the following Spark dataframe:

    +------------+-----------------+-----------------+
    |sale_user_id|gross_profit     |total_sale_volume|
    +------------+-----------------+-----------------+
    |       20569|       -3322960.0|     2.12569482E8|
    |       24269|       -1876253.0|      8.6424626E7|
    |        9583|              0.0|       1.282272E7|
    |       11722|          18229.0|        5653149.0|
    |       37982|           6077.0|        1181243.0|
    |       20428|           1665.0|        7011588.0|
    |       41157|          73227.0|        1.18631E7|
    |        9993|              0.0|        1481437.0|
    |        9030|           8865.0|      4.4133791E7|
    |         829|              0.0|          11355.0|
    +------------+-----------------+-----------------+
    

    and the schema of the dataframe is:

    root
     |-- sale_user_id: string (nullable = true)
     |-- tapp_gross_profit: double (nullable = true)
     |-- total_sale_volume: double (nullable = true)
    

    How can I disable scientific notation in the gross_profit and total_sale_volume columns?

  • Bruno Ambrozio
    Bruno Ambrozio over 3 years
    Any idea how to do that without specifying the number of decimal places (exponents)? I mean, can it be inferred?
  • Mariusz
    Mariusz over 3 years
    @BrunoAmbrozio You can always .collect() a dataframe, and then you have a pure python objects with more control on how these are printed (stackoverflow.com/questions/658763/…)
  • Bruno Ambrozio
    Bruno Ambrozio over 3 years
    Right now I need pretty much the same, but for persisting the values to a file; however, I cannot set the precision. I'd appreciate it if someone has a solution. Here's the new question: stackoverflow.com/questions/64772851/…
  • sabacherli
    sabacherli over 2 years
    DecimalType is also subject to scientific notation, depending on the precision and scale.
  • Tim Gautier
    Tim Gautier about 2 years
    DecimalType isn't deprecated in spark 3.0+
  • samkart
    samkart almost 2 years
    see this for DecimalType in spark 3.0+
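
Following up on the `.collect()` suggestion in the comments, a minimal pure-Python sketch of formatting the collected values (the sample rows are hypothetical stand-ins for the output of `df.collect()`):

```python
# Stand-in for df.collect(): (sale_user_id, gross_profit, total_sale_volume) tuples
rows = [("20569", -3322960.0, 2.12569482e8), ("24269", -1876253.0, 8.6424626e7)]

# Fixed-point formatting avoids scientific notation regardless of magnitude
formatted = [
    (user_id, f"{profit:.2f}", f"{volume:.2f}")
    for user_id, profit, volume in rows
]
print(formatted[0])  # ('20569', '-3322960.00', '212569482.00')
```

Once the rows are plain Python objects, any of Python's string-formatting tools apply, which gives full control over precision when writing to a file.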