PySpark: TypeError: StructType can not accept object 0.10000000000000001 in type <type 'numpy.float64'>
NumPy types, including numpy.float64, are not a valid external representation for Spark SQL types. Furthermore, the schema you use doesn't reflect the shape of the data: a StructType expects each element to be a row (for example a tuple), not a bare scalar.
You should use standard Python types and the corresponding DataType directly:
spark.createDataFrame(samples.tolist(), FloatType()).toDF("x")
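If you would rather keep the StructType schema from the question, each value has to be a plain Python float wrapped in a one-element row. The following is only a minimal sketch of that idea: the helper names to_rows and build_dataframe are made up for illustration, and it assumes an existing SparkSession is passed in as spark.

```python
def to_rows(values):
    """Turn an iterable of numbers (e.g. a NumPy array) into one-column rows.

    float(v) converts numpy.float64 to a built-in Python float, and the
    one-element tuple gives each value the row shape a StructType expects.
    """
    return [(float(v),) for v in values]


def build_dataframe(spark, values):
    """Hypothetical helper: build a DataFrame with a single column "x".

    `spark` is assumed to be an existing SparkSession. The import is
    deferred so that to_rows can be used without PySpark installed.
    """
    from pyspark.sql.types import StructType, StructField, FloatType

    # Same schema as in the question. FloatType is single precision;
    # DoubleType would keep the full 64-bit value of numpy.float64.
    schema = StructType([StructField("x", FloatType(), True)])
    return spark.createDataFrame(to_rows(values), schema)
```

With the question's data this would be called as build_dataframe(spark, np.array([0.1, 0.2])).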
Author by
Romeo Kienzler
Updated on June 06, 2022
Comments
-
Romeo Kienzler over 1 year
When using PySpark with the following code:
import numpy as np
from pyspark.sql.types import *
samples = np.array([0.1, 0.2])
dfSchema = StructType([StructField("x", FloatType(), True)])
spark.createDataFrame(samples, dfSchema)
I get:
TypeError: StructType can not accept object 0.10000000000000001 in type <type 'numpy.float64'>
Any idea?
-
panc over 1 year
Yeah. I first had to use float(x) to convert each value x in the RDD, which was a numpy.float64, before calling toDF.
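The RDD variant described in this comment might look roughly like the sketch below. The names as_row and rdd_to_dataframe are hypothetical, and rdd is assumed to be an existing RDD of numpy.float64 values created under an active SparkSession (toDF is only available on RDDs once a session exists).

```python
def as_row(x):
    """Convert one value (e.g. numpy.float64) into a one-element row
    holding a built-in Python float."""
    return (float(x),)


def rdd_to_dataframe(rdd, column="x"):
    """Hypothetical helper: map every element to a row, then call toDF.

    Passing a list of column names lets Spark infer the column type
    from the Python floats, which gives a double column.
    """
    return rdd.map(as_row).toDF([column])
```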