How do I increase decimal precision in Spark?


I think the error is pretty self-explanatory: you need to be using a DecimalType, not a DoubleType.

Try this:

...
.cast(DecimalType(6)))

Read on:

https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/types/DecimalType.html

http://spark.apache.org/docs/2.0.2/api/python/_modules/pyspark/sql/types.html

datatype for handling big numbers in pyspark
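To make the error message concrete: in a decimal type, precision is the total number of digits and scale is the number of digits after the decimal point, so "precision 6 exceeds max precision 5" means some value needs six digits where only five were allowed. Here is a minimal sketch, using only Python's standard `decimal` module (no Spark), for checking what precision and scale a set of raw CSV strings would need. The function name `required_precision_and_scale` and the sample values are made up for illustration, and skipping NaN/Inf is an assumption based on the question replacing them with 0.

```python
from decimal import Decimal


def required_precision_and_scale(values):
    """Return the smallest (precision, scale) that can hold every finite value.

    precision = total number of digits, scale = digits after the decimal point.
    Non-finite inputs such as "NaN" and "Inf" are skipped (assumption: they are
    replaced with 0 before casting, as in the question).
    """
    max_int_digits = 0
    max_scale = 0
    for raw in values:
        number = Decimal(raw)
        if not number.is_finite():
            continue
        _, digits, exponent = number.as_tuple()
        # Digits to the right of the decimal point.
        scale = max(0, -exponent)
        # Digits to the left of the decimal point (0 for values like 0.001).
        int_digits = max(0, len(digits) + exponent)
        max_scale = max(max_scale, scale)
        max_int_digits = max(max_int_digits, int_digits)
    # A decimal type needs at least one digit of precision.
    return max(1, max_int_digits + max_scale), max_scale


if __name__ == "__main__":
    # Hypothetical sample column: each value alone fits in 5 digits,
    # but together they need 3 integer digits plus 3 fractional digits.
    print(required_precision_and_scale(["12.345", "123.45", "NaN", "Inf"]))  # (6, 3)
```

If the result is larger than the precision you are casting to, widen the type, for example `DecimalType(6, 3)` for the sample above. Spark's DecimalType supports a precision of up to 38.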

Author: Ross Lewis

At IBM, I'm focused on big data and machine learning. I demo Spark to customers who are interested in learning more about it. For fun I drum, spend time with friends, and try new things in the city. I'm currently considering going to graduate school.

Updated on September 15, 2022

Comments

  • Ross Lewis over 1 year

    I have a large DataFrame made up of ~550 columns of doubles and two columns of longs (ids). The 550 columns are read in from a CSV, and I add the two id columns. The only other things I do with the data are change some of the CSV data from strings to doubles ("Inf" -> "0", then cast the column to double) and replace NaNs with 0:

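    import org.apache.spark.sql.functions.{lit, monotonically_increasing_id, regexp_replace}
    import org.apache.spark.sql.types.{DoubleType, LongType}

    // `col` (the current schema field) and `num` are defined elsewhere in my code.
    // Replace "Inf" and "NaN" strings with "0", then cast the column to double.
    df = df.withColumn(col.name + "temp",
      regexp_replace(regexp_replace(df(col.name), "Inf", "0"), "NaN", "0").cast(DoubleType))
    df = df.drop(col.name).withColumnRenamed(col.name + "temp", col.name)
    // Add the two id columns.
    df = df.withColumn("timeId", monotonically_increasing_id().cast(LongType))
    df = df.withColumn("patId", lit(num).cast(LongType))
    // Replace any remaining nulls and NaNs in numeric columns with 0.
    df = df.na.fill(0)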
    

    When I do a count, I get the following error:

    IllegalArgumentException: requirement failed: Decimal precision 6 exceeds max precision 5
    

    There are hundreds of thousands of rows, and I'm reading in the data from multiple CSV files. How do I increase the decimal precision? Is there something else that could be going on? I only get this error when I read in some of the CSV files. Could they have more decimal places than the others?
