How do I increase decimal precision in Spark?
I think the error is pretty self-explanatory: you need to be using a DecimalType, not a DoubleType.
Try this:
...
.cast(DecimalType(6)))
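For example, a minimal sketch of the cast (the DataFrame, the column name, and the precision/scale of 18 and 6 are illustrative assumptions, not values from the question):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder().master("local[*]").appName("decimal-cast").getOrCreate()
import spark.implicits._

// Hypothetical data: numeric strings that would overflow a narrow decimal.
val df = Seq("123456.789", "0.5", "98765.4321").toDF("value")

// DecimalType(precision, scale): precision is the total number of digits (max 38),
// scale is how many of those digits sit to the right of the decimal point.
// DecimalType(6) is shorthand for precision 6 with scale 0.
val casted = df.withColumn("value_dec", $"value".cast(DecimalType(18, 6)))

casted.printSchema()
casted.show()

Whatever values you choose, the precision has to be at least as large as the number of digits in the widest value in the data.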
Read on:
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/types/DecimalType.html
http://spark.apache.org/docs/2.0.2/api/python/_modules/pyspark/sql/types.html
datatype for handling big numbers in pyspark
Updated on September 15, 2022

Comments
-
Ross Lewis over 1 year
I have a large DataFrame made up of ~550 columns of doubles and two columns of longs (ids). The 550 columns are read in from a CSV, and I add the two id columns. The only other things I do with the data are to change some of the CSV values from strings to doubles ("Inf" -> "0", then cast the column to double) and to replace NaNs with 0:
df = df.withColumn(col.name + "temp",
  regexp_replace(regexp_replace(df(col.name), "Inf", "0"), "NaN", "0").cast(DoubleType))
df = df.drop(col.name).withColumnRenamed(col.name + "temp", col.name)
df = df.withColumn("timeId", monotonically_increasing_id.cast(LongType))
df = df.withColumn("patId", lit(num).cast(LongType))
df = df.na.fill(0)
When I do a count, I get the following error:
IllegalArgumentException: requirement failed: Decimal precision 6 exceeds max precision 5
There are hundreds of thousands of rows, and I'm reading in the data from multiple CSVs. How do I increase the decimal precision? Is there something else that could be going on? I am only getting this error when I read in some of the CSVs. Could they have more decimals than the others?
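A rough way to check that is to measure how many digits the widest value in a suspect column actually has before it is cast; a small sketch, where the column name "col42" is a placeholder and df is assumed to still hold the raw string values from the CSV:

import org.apache.spark.sql.functions.{length, max, regexp_replace}

// Strip everything except the digits from each value, then take the maximum
// digit count over the column: a rough upper bound on the precision needed.
val digitsOnly = regexp_replace(df("col42"), "[^0-9]", "")
df.select(max(length(digitsOnly)).alias("max_digits_col42")).show()

If the number reported for one CSV is larger than for the others, that file is the one that needs a wider type.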