Spark SQL (PySpark) - SparkSession import error


Most likely, spark-submit is pointing to a different Spark installation, for example a Spark 1.x release, which has no SparkSession (it was introduced in Spark 2.0). Check which version spark-submit uses with:

spark-submit --version
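spark-submit --version reports the Spark installation the launcher belongs to, but the ImportError comes from whichever pyspark package the Python interpreter actually imports. The following is a minimal diagnostic sketch (describe_pyspark is a hypothetical helper, not part of PySpark) that reports where pyspark is loaded from and whether it provides SparkSession. Saving it as, say, check_pyspark.py (a hypothetical file name) and running `spark-submit check_pyspark.py` shows what the job itself sees.

```python
import importlib
import sys


def describe_pyspark():
    """Report which pyspark package this interpreter imports and whether
    it exposes SparkSession (introduced in Spark 2.0)."""
    info = {"python": sys.executable, "pyspark_file": None,
            "pyspark_version": None, "has_spark_session": False}
    try:
        pyspark = importlib.import_module("pyspark")
    except ImportError:
        # No importable pyspark on sys.path at all.
        return info
    info["pyspark_file"] = getattr(pyspark, "__file__", None)
    # Older pyspark releases do not define __version__, so fall back to None.
    info["pyspark_version"] = getattr(pyspark, "__version__", None)
    try:
        sql = importlib.import_module("pyspark.sql")
        info["has_spark_session"] = hasattr(sql, "SparkSession")
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in describe_pyspark().items():
        print("%s: %s" % (key, value))
```

If pyspark_file points outside the Spark 2.x installation that spark-submit --version reports, or has_spark_session is False, the job is picking up the wrong pyspark.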

If the Spark version is correct, check what PYTHONPATH contains (echo $PYTHONPATH): it may include the pyspark library from another Spark installation, which Python can then import instead of the matching one. If PYTHONPATH doesn't contain the pyspark library of the Spark you are running, add it, for example:

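# Python does not expand "*" in PYTHONPATH, so the py4j zip is resolved by the shell; prepending makes this pyspark win
export PYTHONPATH="$SPARK_HOME/python:$(ls "$SPARK_HOME"/python/lib/py4j-*-src.zip)${PYTHONPATH:+:$PYTHONPATH}"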
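Because Python treats each PYTHONPATH entry literally, the py4j zip (whose file name carries a version number) has to be spelled out. A minimal sketch, with hypothetical helper names, that resolves those entries for a given SPARK_HOME and flags existing PYTHONPATH entries that provide pyspark from outside it. It assumes that an entry "provides pyspark" when it contains a pyspark/ directory or is a pyspark.zip file.

```python
import glob
import os


def spark_python_entries(spark_home):
    """Return the PYTHONPATH entries for the pyspark shipped with spark_home.

    PYTHONPATH entries are not glob-expanded by Python, so the versioned
    py4j zip is resolved here with glob.
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def foreign_pyspark_entries(pythonpath, spark_home):
    """Return PYTHONPATH entries that provide pyspark but live outside spark_home."""
    home = os.path.realpath(spark_home)
    foreign = []
    for entry in filter(None, pythonpath.split(os.pathsep)):
        provides_pyspark = (os.path.isdir(os.path.join(entry, "pyspark"))
                            or os.path.basename(entry) == "pyspark.zip")
        real = os.path.realpath(entry)
        inside_home = real == home or real.startswith(home + os.sep)
        if provides_pyspark and not inside_home:
            foreign.append(entry)
    return foreign


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME", "")
    if spark_home:
        print("Prepend:", os.pathsep.join(spark_python_entries(spark_home)))
        print("Conflicting entries:",
              foreign_pyspark_entries(os.environ.get("PYTHONPATH", ""), spark_home))
```

Any entry reported as conflicting is a candidate for the "pyspark from another version of Spark" situation described above.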

Author: AngiSen

Updated on June 04, 2022

Question (AngiSen):

    I am trying to run a simple Spark SQL program (PySpark) with spark-submit, but I get the error below. Note: I am running this on Spark 2.x.

    spark-submit HousePriceSolution.py

    Error:

    from pyspark.sql import SparkSession
    ImportError: cannot import name SparkSession

    Code:

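    from pyspark.sql import SparkSession

    PRICE_SQ_FT = "Price SQ Ft"

    if __name__ == "__main__":
        session = SparkSession.builder.appName("HousePriceSolution").getOrCreate()

        # Read the CSV with a header row and let Spark infer the column types
        realEstate = session.read \
            .option("header", "true") \
            .option("inferSchema", True) \
            .csv("hdfs:............./RealEstate.csv")

        # Average price per square foot per location; groupBy().avg() names
        # the result column "avg(<column>)", so build that name from the constant
        realEstate.groupBy("Location") \
            .avg(PRICE_SQ_FT) \
            .orderBy("avg({})".format(PRICE_SQ_FT)) \
            .show()

        session.stop()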