Table or view not found with registerTempTable
First you will want to initialize Spark with Hive support, for example:

```python
from pyspark.sql import SparkSession, SQLContext

spark = SparkSession.builder \
    .master("yarn") \
    .appName("AspectDetector") \
    .enableHiveSupport() \
    .getOrCreate()

# Only needed if legacy code still expects an SQLContext; it must be
# built from the SparkContext, not from the session itself.
sqlContext = SQLContext(spark.sparkContext, spark)
```
But instead of using sqlContext.sql(), you will want to use spark.sql() to run your query.
I found this confusing as well, but I think it is because when you call data.registerTempTable("temp") the view is registered on the Spark session, not on the separately created sqlContext, so the sqlContext cannot see it. If you want to query a Hive table, you can still use sqlContext.sql().
Author: kassnl
Updated on July 12, 2022

Comments
-
kassnl almost 2 years
So I run the following on pyspark shell:
```
>>> data = spark.read.csv("annotations_000", header=False, mode="DROPMALFORMED", schema=schema)
>>> data.show(3)
+----------+--------------------+--------------------+---------+---------+--------+-----------------+
|   item_id|           review_id|                text|   aspect|sentiment|comments| annotation_round|
+----------+--------------------+--------------------+---------+---------+--------+-----------------+
|9999900031|9999900031/custom...|Just came back to...|breakfast|        3|    null|ASE_OpeNER_round2|
|9999900031|9999900031/custom...|Just came back to...|    staff|        3|    null|ASE_OpeNER_round2|
|9999900031|9999900031/custom...|The hotel was loc...|    noise|        2|    null|ASE_OpeNER_round2|
+----------+--------------------+--------------------+---------+---------+--------+-----------------+
>>> data.registerTempTable("temp")
>>> df = sqlContext.sql("select first(item_id), review_id, first(text), concat_ws(';', collect_list(aspect)) as aspect from temp group by review_id")
>>> df.show(3)
+---------------------+--------------------+--------------------+--------------------+
|first(item_id, false)|           review_id|  first(text, false)|              aspect|
+---------------------+--------------------+--------------------+--------------------+
|               100012|100012/tripadviso...|We stayed here la...|          staff;room|
|               100013|100013/tripadviso...|We stayed for two...|           breakfast|
|               100031|100031/tripadviso...|We stayed two nig...|noise;breakfast;room|
+---------------------+--------------------+--------------------+--------------------+
```
and it works perfectly with the shell sqlContext variable.
When I write it as a script:
```python
from pyspark import SparkContext
from pyspark.sql import SparkSession, SQLContext

sc = SparkContext(appName="AspectDetector")
spark = SparkSession(sc)
sqlContext = SQLContext(sc)

data.registerTempTable("temp")
df = sqlContext.sql("select first(item_id), review_id, first(text), concat_ws(';', collect_list(aspect)) as aspect from temp group by review_id")
```
and run it I get the following:
pyspark.sql.utils.AnalysisException: u'Table or view not found: temp; line 1 pos 99'
How is that possible? Am I doing something wrong in the instantiation of sqlContext?
-
anshul_cached over 7 years: Initialize it with Hive Context.