pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'" in Windows 10

Solution 1

Port 9000?! It must be something Hadoop-related, as I don't remember Spark itself using that port. I'd recommend trying spark-shell first, since that eliminates any additional "hops": unlike pyspark, spark-shell involves a single runtime (the JVM), not two (the JVM for Spark plus Python).

Given the exception, I'm fairly sure the issue is that you have some Hive- or Hadoop-related configuration lying around somewhere, and Spark is apparently picking it up.

The "Caused by" seems to show that 9000 is used when Spark SQL is created which is when Hive-aware subsystem is loaded.

    Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.net.ConnectException: Call From DESKTOP-SDNSD47/192.168.10.143 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused
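
If you want to check whether anything is actually listening on that port, here is a quick diagnostic from the Windows command line (just a sketch, not part of the fix itself):

    rem show any process bound to port 9000 (the PID is in the last column)
    netstat -ano | findstr :9000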

Please review the environment variables on Windows 10 (for example with the set command at the command prompt) and remove anything Hadoop-related.
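
For example, something like the following lists the candidates (HADOOP_HOME and HADOOP_CONF_DIR are the usual suspects; the search strings here are my guess, not an exhaustive list):

    rem list environment variables whose name or value mentions Hadoop or Hive
    set | findstr /i "HADOOP HIVE"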

Solution 2

I had the same problem with both pyspark and spark-shell (for Scala) on macOS with apache-spark 2.2. Based on some research, I figured out it was because of my JDK version, 9.0.1, which does not work well with Apache Spark. Both errors were resolved by switching back from JDK 9 to JDK 8.
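
As a quick sanity check before reinstalling anything, you can confirm which JDK is actually being picked up (shown here for the Windows command prompt, since that is the asker's setup):

    rem Spark 2.2 supports Java 8; Java 9+ is known to break it
    java -version
    echo %JAVA_HOME%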

Maybe that helps with your Windows Spark installation too.

Solution 3

Posting this answer for posterity. I faced the same error. The way I solved it was to first try spark-shell instead of pyspark, since its error message was more direct.

That gave a better idea of what was wrong: there was an S3 access error. Next, I checked the EC2 role/instance profile for that instance; it had S3 administrator access.

Then I grepped for s3:// in all the conf files under the /etc/ directory, and found that core-site.xml contains the following property:

    <property>
      <!-- URI of NN. Fully qualified. No IP. -->
      <name>fs.defaultFS</name>
      <value>s3://arvind-glue-temp/</value>
    </property>
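
For reference, the search that surfaced this is a plain recursive grep on the instance (the exact conf paths may differ per distribution):

    # find hard-coded S3 URIs in any conf file under /etc/
    grep -r "s3://" /etc/ 2>/dev/null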

Then I remembered: I had removed HDFS as the default file system and set it to S3, and I had created the EC2 instance from an earlier AMI, forgetting to update the S3 bucket to the one belonging to the newer account.

Once I updated the S3 bucket to one accessible by the current EC2 instance profile, it worked.
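
To verify the access independently of Spark, you can check that the instance profile can actually reach the bucket (assuming the AWS CLI is installed; the bucket name is the one from core-site.xml above):

    # should list the bucket contents without an AccessDenied error
    aws s3 ls s3://arvind-glue-temp/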

Comments

  • Kalyan

    I have installed Spark 2.2 with winutils on Windows 10. When I run pyspark I get the below exception:

    pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'"
    

    I have already tried the permission 777 command on the tmp/hive folder as well, but it is not working:

    winutils.exe chmod -R 777 C:\tmp\hive
    

    After applying this, the problem remains the same. I am using pyspark 2.2 on Windows 10. Here is the spark-shell environment: (screenshot of the spark-shell session)

    Here is the pyspark shell: (screenshot of the pyspark session)

    Kindly help me figure this out. Thank you!