Table or view not found with registerTempTable

First you will want to initialize spark with Hive support, for example:

spark = SparkSession.builder \
    .master("yarn") \
    .appName("AspectDetector") \
    .enableHiveSupport() \
    .getOrCreate()

sqlContext = SQLContext(spark.sparkContext)  # SQLContext takes a SparkContext, not a SparkSession

But instead of using sqlContext.sql(), run your query with spark.sql().

I found this confusing as well, but I think it is because data.registerTempTable("temp") registers the view in the spark session rather than in the separate sqlContext. If you want to query a Hive table, sqlContext.sql() still works.


Author: kassnl

Updated on July 12, 2022

Comments

  • kassnl almost 2 years

    So I run the following on pyspark shell:

    >>> data = spark.read.csv("annotations_000", header=False, mode="DROPMALFORMED", schema=schema)
    >>> data.show(3)
    +----------+--------------------+--------------------+---------+---------+--------+-----------------+
    |   item_id|           review_id|                text|   aspect|sentiment|comments| annotation_round|
    +----------+--------------------+--------------------+---------+---------+--------+-----------------+
    |9999900031|9999900031/custom...|Just came back to...|breakfast|        3|    null|ASE_OpeNER_round2|
    |9999900031|9999900031/custom...|Just came back to...|    staff|        3|    null|ASE_OpeNER_round2|
    |9999900031|9999900031/custom...|The hotel was loc...|    noise|        2|    null|ASE_OpeNER_round2|
    +----------+--------------------+--------------------+---------+---------+--------+-----------------+
    >>> data.registerTempTable("temp")
    >>> df = sqlContext.sql("select first(item_id), review_id, first(text), concat_ws(';', collect_list(aspect)) as aspect from temp group by review_id")
    >>> df.show(3)
    +---------------------+--------------------+--------------------+--------------------+
    |first(item_id, false)|           review_id|  first(text, false)|              aspect|
    +---------------------+--------------------+--------------------+--------------------+
    |               100012|100012/tripadviso...|We stayed here la...|          staff;room| 
    |               100013|100013/tripadviso...|We stayed for two...|           breakfast|
    |               100031|100031/tripadviso...|We stayed two nig...|noise;breakfast;room|
    +---------------------+--------------------+--------------------+--------------------+
    

    and it works perfectly with the shell sqlContext variable.

    When I write it as a script:

    from pyspark import SparkContext
    from pyspark.sql import SparkSession, SQLContext
    
    sc = SparkContext(appName="AspectDetector")
    spark = SparkSession(sc)
    sqlContext = SQLContext(sc)
    data = spark.read.csv("annotations_000", header=False, mode="DROPMALFORMED", schema=schema)
    data.registerTempTable("temp")
    df = sqlContext.sql("select first(item_id), review_id, first(text), concat_ws(';', collect_list(aspect)) as aspect from temp group by review_id")
    

    and run it I get the following:

    pyspark.sql.utils.AnalysisException: u'Table or view not found: temp; line 1 pos 99'

    How is that possible? Am I doing something wrong in the instantiation of sqlContext?

    • anshul_cached over 7 years
      Initialize it with Hive Context