pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'
It turns out I was getting this error because the Parquet files sat one level deeper in the directory structure than the path I was reading. This is what I needed:
data = spark.read.parquet('/myhdfs/location/anotherlevel/')
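A quick way to spot this kind of mismatch is to list which directories actually hold Parquet files before pointing Spark at a path. The sketch below is only an illustration under stated assumptions: the helper names `looks_like_parquet` and `find_parquet_dirs` are made up for this example, and it walks a path reachable through the local filesystem (for example a local copy or a mount). On HDFS itself, a recursive listing such as `hdfs dfs -ls -R` gives the same information. It relies on the fact that Parquet files begin with the 4-byte magic `PAR1`, so it does not depend on file extensions.

```python
import os

PARQUET_MAGIC = b"PAR1"  # Parquet files start (and end) with these 4 bytes


def looks_like_parquet(path):
    """Cheap check: does the file start with the Parquet magic bytes?"""
    try:
        with open(path, "rb") as f:
            return f.read(4) == PARQUET_MAGIC
    except OSError:
        # Unreadable files are treated as "not Parquet" in this sketch.
        return False


def find_parquet_dirs(root):
    """Return the sorted directories under root that directly contain Parquet files."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(looks_like_parquet(os.path.join(dirpath, name)) for name in filenames):
            found.add(dirpath)
    return sorted(found)
```

If the directory passed to `spark.read.parquet` is missing from the result but one of its subdirectories is listed, the subdirectory is most likely the path to read.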
Taylrl
Updated on June 04, 2022
Comments
-
Taylrl almost 2 years
This has a different answer to those given in the post above
I am getting an error that reads:
pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'
when I try to read a Parquet file like this, using Spark 2.1.0:
data = spark.read.parquet('/myhdfs/location/')
I have checked that the file/table is not empty by looking at the Impala table through the Hue web portal. Also, other files that I have stored in similar directories read absolutely fine. For the record, the file names contain hyphens but no underscores or full stops/periods.
Hence, none of the answers in the following post apply: "Unable to infer schema when loading Parquet file".
Any ideas?
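The file-name check mentioned above can be sketched as follows. It is a simplified illustration that assumes Spark skips files whose names begin with an underscore or a period when it lists a directory. The exact rules may differ between Spark versions, and the helper names `is_skipped_by_listing` and `visible_files` are hypothetical.

```python
def is_skipped_by_listing(file_name):
    """Simplified assumption: names starting with '_' or '.' are treated as hidden."""
    return file_name.startswith(("_", "."))


def visible_files(file_names):
    """Keep only the names that would not be skipped under the assumption above."""
    return [name for name in file_names if not is_skipped_by_listing(name)]
```

If `visible_files` returns an empty list for a directory's contents, there would be nothing left to infer a schema from, even though the directory is not empty.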
-
ash_huddles over 5 years
Have you checked the answers on this post first: stackoverflow.com/questions/44954892/…
-
10465355 over 5 years
Possible duplicate of "Unable to infer schema when loading Parquet file"
-
Taylrl over 5 years
Yep. I've read that and none of the answers apply.
-
Sim over 5 years
Try reading an individual Parquet file by providing its full path and report the outcome.
-
Taylrl over 5 years
Ah hah! It turns out there was another level in the directory structure!
-
tjheslin1 about 2 years
This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. To get notified when this question gets new answers, you can follow this question. Once you have enough reputation, you can also add a bounty to draw more attention to this question. - From Review