Spark SQL (PySpark) - SparkSession import error
The spark-submit on your PATH is probably pointing to a different version of Spark. Check which version of Spark spark-submit uses with the following command:
spark-submit --version
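The version matters because SparkSession was introduced in Spark 2.0; on a 1.x installation the import in the question fails. A minimal sketch of that check, where the helper name supports_spark_session is hypothetical and the version string is assumed to be copied by hand from the spark-submit --version output:

```python
import re


def supports_spark_session(version):
    """Return True if a Spark version string is 2.0 or later.

    `version` is assumed to look like "2.4.8" or "1.6.3", i.e. what
    `spark-submit --version` prints after "version".
    """
    match = re.match(r"\s*(\d+)\.", version)
    if match is None:
        raise ValueError("unrecognized Spark version string: %r" % version)
    # SparkSession exists only from Spark 2.0 onward.
    return int(match.group(1)) >= 2


if __name__ == "__main__":
    for candidate in ("1.6.3", "2.4.8"):
        print(candidate, supports_spark_session(candidate))
```

If this returns False for the version spark-submit reports, the script is being run by a Spark 1.x installation regardless of which version you intended to use.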
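If the Spark version is correct, check what PYTHONPATH contains (echo $PYTHONPATH), because it is possible that PYTHONPATH includes the pyspark library from another version of Spark. If PYTHONPATH doesn't contain the pyspark library, add it like this:
export PYTHONPATH=$PYTHONPATH:"$SPARK_HOME/python/lib/*"
Note that Python does not expand wildcards in PYTHONPATH entries, so if the line above has no effect, list $SPARK_HOME/python and the py4j source zip under $SPARK_HOME/python/lib explicitly instead.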
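To see which copy of pyspark Python would actually pick up, a small diagnostic script can help. This is only a sketch: the function names locate_module and describe_environment are made up for illustration, and it uses nothing beyond the Python standard library, so it runs even when pyspark cannot be imported.

```python
import importlib.util
import os
import sys


def locate_module(name):
    """Return the file Python would load `name` from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


def describe_environment(module_name="pyspark"):
    """Collect the settings that decide which copy of a module is imported."""
    raw_path = os.environ.get("PYTHONPATH", "")
    return {
        "module_path": locate_module(module_name),
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
        "PYTHONPATH": [p for p in raw_path.split(os.pathsep) if p],
        "python": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

Running it with the same interpreter that spark-submit uses shows whether module_path sits under the SPARK_HOME you expect; a path under a different Spark directory would point to the mismatch described above.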
Author: AngiSen
Updated on June 04, 2022
I am trying to run a simple Spark SQL script (PySpark) with spark-submit, but I get the error below. Note: I am running this on Spark 2.x.
spark-submit HousePriceSolution.py
Error:
    from pyspark.sql import SparkSession
ImportError: cannot import name SparkSession
Code:
from pyspark.sql import SparkSession

PRICE_SQ_FT = "Price SQ Ft"

if __name__ == "__main__":
    session = SparkSession.builder.appName("HousePriceSolution").getOrCreate()

    realEstate = session.read \
        .option("header","true") \
        .option("inferSchema", value=True) \
        .csv("hdfs:............./RealEstate.csv")

    realEstate.groupBy("Location") \
        .avg(PRICE_SQ_FT) \
        .orderBy("avg(Price SQ FT)") \
        .show()

    session.stop()