PySpark: TypeError: StructType can not accept object 0.10000000000000001 in type <type 'numpy.float64'>


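NumPy types, including numpy.float64, are not a valid external representation for Spark SQL types. Furthermore, the schema you use doesn't reflect the shape of the data: a StructType schema expects each row to be a row-like object such as a tuple or Row (one value per field), but the array supplies bare scalars such as 0.1.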

Use standard Python types and the corresponding DataType directly:

from pyspark.sql.types import FloatType
spark.createDataFrame(samples.tolist(), FloatType()).toDF("x")
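The key step in the fix is the conversion done by `tolist()`, and it can be checked without a running Spark session. This minimal sketch assumes only that NumPy is installed; the variable names are illustrative.

```python
import numpy as np

samples = np.array([0.1, 0.2])

# ndarray.tolist() returns built-in Python floats, not numpy.float64 scalars
values = samples.tolist()

# Each element is now a plain float, the Python type Spark expects for FloatType/DoubleType values
for v in values:
    print(type(v).__name__, v)
```

With an existing SparkSession `spark`, `values` can then be passed exactly as in the answer: `spark.createDataFrame(values, FloatType()).toDF("x")`. The atomic FloatType schema produces a single column, which `toDF("x")` renames.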
Author: Romeo Kienzler

Updated on June 06, 2022

Comments

  • Romeo Kienzler (over 1 year ago)

    When using PySpark with the following code:

    import numpy as np
    from pyspark.sql.types import *
    samples = np.array([0.1, 0.2])
    dfSchema = StructType([StructField("x", FloatType(), True)])
    spark.createDataFrame(samples, dfSchema)
    

    I get:

    TypeError: StructType can not accept object 0.10000000000000001 in type <type 'numpy.float64'>

    Any idea?

  • panc (over 1 year ago)
    Yeah. I had to first use float(x) to convert each numpy.float64 value x in the RDD to a Python float before calling toDF.
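The element-wise conversion described in the comment above can also be made to fit the question's original one-field StructType schema, by turning each value into a one-element tuple. This is a sketch assuming NumPy is installed; `to_row` is a hypothetical helper name, not part of PySpark.

```python
import numpy as np

def to_row(x):
    # Hypothetical helper: float() converts a numpy.float64 scalar to a built-in Python float;
    # the 1-tuple makes it a single-column row matching a one-field StructType
    return (float(x),)

samples = np.array([0.1, 0.2])
rows = [to_row(x) for x in samples]
```

With a SparkSession available, `rows` could be passed with the question's schema as `spark.createDataFrame(rows, dfSchema)`, or the same helper could be mapped over an RDD before converting it, e.g. `rdd.map(to_row).toDF(["x"])`.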