How to specify schema while reading parquet file with pyspark?


Solution 1

This error usually occurs when you try to read an empty directory as Parquet. For example, if you create an empty DataFrame, write it as Parquet, and then read it back, this error appears. You can check whether the DataFrame is empty with rdd.isEmpty() before writing it.
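A minimal PySpark sketch of that check, assuming df is a pyspark.sql.DataFrame. The helper name write_parquet_if_not_empty is hypothetical, not part of the PySpark API. It skips the write when the DataFrame has no rows, so no empty Parquet directory is left behind for a later read to fail on.

```python
def write_parquet_if_not_empty(df, path):
    """Write a PySpark DataFrame to `path` as Parquet only if it has rows.

    Hypothetical helper: returns True if the write happened, False if skipped.
    """
    # RDD.isEmpty() only needs to find a single row (it uses take(1)),
    # so it is usually cheaper than checking df.count() == 0.
    if df.rdd.isEmpty():
        return False
    df.write.parquet(path)
    return True
```

Usage: write_parquet_if_not_empty(df, "/super/important/df"). Returning a boolean lets the caller log or handle the skipped case instead of silently producing nothing.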

Solution 2

I have done a quick implementation of this:

[Screenshot of the implementation; the image is not available.]

Hope this helps!
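Because that implementation was shared only as an image, here is a minimal text sketch of the usual way to specify a schema when reading Parquet with PySpark: pass it to DataFrameReader.schema() before calling .parquet(). This is not necessarily what the screenshot showed. The helper name and the example column names are hypothetical; on Spark 2.1.1 the schema should be a pyspark.sql.types.StructType. With a user-supplied schema Spark does not have to infer one from the files, so reading an empty directory should return an empty DataFrame rather than raise the AnalysisException.

```python
def read_parquet_with_schema(spark, path, schema):
    """Read Parquet files under `path` using an explicit schema.

    `spark` is a pyspark.sql.SparkSession and `schema` a
    pyspark.sql.types.StructType, for example (hypothetical columns):

        from pyspark.sql.types import StructType, StructField, LongType, StringType
        schema = StructType([
            StructField("id", LongType(), True),
            StructField("name", StringType(), True),
        ])
    """
    # schema() only configures the reader; it must come before .parquet(),
    # which triggers the load using this schema instead of an inferred one.
    return spark.read.schema(schema).parquet(path)
```

Usage: df = read_parquet_with_schema(spark, "/super/important/df", schema). The schema should match the columns actually stored in the files; otherwise columns may come back as null or the read may fail.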

Author: y.selivonchyk

Deep learning enthusiast.

Updated on June 05, 2022

Comments

  • y.selivonchyk (almost 2 years ago)

    While reading a Parquet file stored in Hadoop with either Scala or PySpark, an error occurs:

    // Scala
    val dff = spark.read.parquet("/super/important/df")
    org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
      at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:189)
      at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:189)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:188)
      at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
      at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
      at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
      ... 52 elided

    or, in PySpark,

    sql_context.read.parquet(output_file)

    results in the same error.

    The error message is pretty clear about what has to be done: "Unable to infer schema for Parquet. It must be specified manually." But where can I specify it?

    Spark 2.1.1, Hadoop 2.5; the DataFrames are created with PySpark. The files are partitioned into 10 pieces.

  • user9074332 (over 4 years ago)
    Would consider an upvote if this were code and not a screenshot, but I just can't...