Pyspark: Exception: Java gateway process exited before sending the driver its port number


Solution 1

One possible reason is that JAVA_HOME is not set because Java is not installed.

I encountered the same issue. It says

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 51.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:296)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:406)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/spark/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/opt/spark/python/pyspark/context.py", line 243, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

The exception was raised at sc = pyspark.SparkConf(). I solved it by running:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

which is from https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04
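
Before trying any of the installs above, a quick sanity check you can run from Python is to see whether a JDK is visible at all. This is a minimal sketch; it only inspects JAVA_HOME and the PATH, nothing Spark-specific:

import os
import shutil

# PySpark's gateway needs a `java` executable it can launch.
print("JAVA_HOME =", os.environ.get("JAVA_HOME"))  # None means it is not set
print("java on PATH:", shutil.which("java"))       # None means no java was found

If both lines print None, a missing JDK (or an unset JAVA_HOME) is the likely culprit, and installing Java as above is the fix.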

Solution 2

This should help you:

One solution is adding pyspark-shell to the shell environment variable PYSPARK_SUBMIT_ARGS:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

There is a change in python/pyspark/java_gateway.py that requires PYSPARK_SUBMIT_ARGS to include pyspark-shell if a PYSPARK_SUBMIT_ARGS variable is set by the user.
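
If you would rather set this from Python than in the shell (the same os.environ approach mentioned in the comments below), a minimal sketch, assuming a local master with two cores:

import os

# Must be set before the SparkContext is created; note the trailing
# "pyspark-shell" that java_gateway.py expects when this variable is set.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

from pyspark import SparkContext
sc = SparkContext.getOrCreate()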

Solution 3

I had this error message when running pyspark on Ubuntu; I got rid of it by installing the openjdk-8-jdk package.

from pyspark import SparkConf, SparkContext
sc = SparkContext(conf=SparkConf().setAppName("MyApp").setMaster("local"))
# ^^^ this line raised the exception

Install Open JDK 8:

apt-get install openjdk-8-jdk-headless -qq    
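
To confirm which Java the PySpark gateway will actually pick up after installing, you can run the same java -version check from Python. A minimal sketch; note that java -version writes to stderr, so that is what gets printed:

import subprocess

# Shows whatever `java` is first on the PATH -- the one PySpark will launch.
result = subprocess.run(["java", "-version"], capture_output=True, text=True)
print(result.stderr)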

On MacOS

The same happened on Mac OS. I typed in a terminal:

$ java -version
No Java runtime present, requesting install. 

I was prompted to install Java from Oracle's download site. I chose the MacOS installer, clicked on jdk-13.0.2_osx-x64_bin.dmg, and after that checked that Java was installed:

$ java -version
java version "13.0.2" 2020-01-14

EDIT: To install JDK 8 you need to go to https://www.oracle.com/java/technologies/javase-jdk8-downloads.html (login required).

After that I was able to start a Spark context with pyspark.

Checking if it works

In Python:

from pyspark import SparkContext 
sc = SparkContext.getOrCreate() 

# check that it really works by running a job
# example from http://spark.apache.org/docs/latest/rdd-programming-guide.html#parallelized-collections
data = range(10000) 
distData = sc.parallelize(data)
distData.filter(lambda x: not x&1).take(10)
# Out: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Note that you might need to set the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON, and they have to match the Python (or IPython) version you're using to run pyspark (the driver).
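
For example, to pin both the driver and the workers to the interpreter you are running right now, a minimal sketch (sys.executable is simply the current Python binary):

import os
import sys

# Point driver and workers at this same interpreter so the versions cannot diverge.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark import SparkContext
sc = SparkContext.getOrCreate()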

Solution 4

I use Mac OS. I fixed the problem!

Below is how I fixed it.

JDK 8 seems to work fine (https://github.com/jupyter/jupyter/issues/248).

So I checked my JDKs in /Library/Java/JavaVirtualMachines; I only had jdk-11.jdk in this path.

I downloaded JDK 8 (following the link above):

brew tap caskroom/versions
brew cask install java8

After this, I added

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"

to my ~/.bash_profile file. (You should check your jdk1.8 directory name.)

It works now! Hope this helps :)
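
If you cannot (or would rather not) edit ~/.bash_profile, for example inside a Jupyter notebook, the same JAVA_HOME can be set from Python before the SparkContext is created. A minimal sketch reusing the example path above; check your own jdk1.8 directory name:

import os

# Example path from above -- adjust to the JDK 8 directory actually on your machine.
os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home"

from pyspark import SparkContext
sc = SparkContext.getOrCreate()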

Solution 5

I will repost how I solved it here just for future reference.

How I solved my similar problem

Prerequisites:

  1. anaconda already installed
  2. Spark already installed (https://spark.apache.org/downloads.html)
  3. pyspark already installed (https://anaconda.org/conda-forge/pyspark)

Steps I did (NOTE: set the folder paths according to your system):

  1. set SPARK_HOME to 'C:\spark\spark-3.0.1-bin-hadoop2.7'
  2. set HADOOP_HOME to 'C:\spark\spark-3.0.1-bin-hadoop2.7'
  3. set PYSPARK_DRIVER_PYTHON to 'jupyter'
  4. set PYSPARK_DRIVER_PYTHON_OPTS to 'notebook'
  5. add 'C:\spark\spark-3.0.1-bin-hadoop2.7\bin;' to the PATH system variable.
  6. re-install Java directly under C: (previously Java was installed under Program Files), so my JAVA_HOME became 'C:\java\jdk1.8.0_271'

Now it works!
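
If you prefer to set these from Python rather than through the Windows system settings, here is a minimal sketch using the same example paths; they must be set before the SparkContext is created, and you should adjust the paths to your own installation:

import os

# Example paths from the steps above -- adjust them to your own system.
os.environ["SPARK_HOME"] = r"C:\spark\spark-3.0.1-bin-hadoop2.7"
os.environ["HADOOP_HOME"] = r"C:\spark\spark-3.0.1-bin-hadoop2.7"
os.environ["JAVA_HOME"] = r"C:\java\jdk1.8.0_271"
os.environ["PATH"] = r"C:\spark\spark-3.0.1-bin-hadoop2.7\bin;" + os.environ["PATH"]

from pyspark import SparkContext
sc = SparkContext.getOrCreate()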


Comments

  • mt88
    mt88 about 2 years

    I'm trying to run pyspark on my MacBook Air. When I try starting it up I get the error:

    Exception: Java gateway process exited before sending the driver its port number
    

    when sc = SparkContext() is being called upon startup. I have tried running the following commands:

    ./bin/pyspark
    ./bin/spark-shell
    export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
    

    to no avail. I have also looked here:

    Spark + Python - Java gateway process exited before sending the driver its port number?

    but the question has never been answered. Please help! Thanks.

    • Bacon
      Bacon over 8 years
      What version of spark do you use?
    • Bacon
      Bacon over 8 years
      That might be helpful: stackoverflow.com/a/30851037/296549
    • mt88
      mt88 over 8 years
      I resolved the above problem by downloading a different tarball of Spark.
    • zyxue
      zyxue about 8 years
      In addition to @mt88's comment, spark-1.6.0-bin-without-hadoop.tgz produces the above exception, but spark-1.6.0-bin-hadoop2.6.tgz doesn't for me on Ubuntu 15.04
    • lfk
      lfk about 6 years
      @zyxue I'm also using Spark 2.2.1 without Hadoop. Do you think that's the cause?
    • zyxue
      zyxue about 6 years
      @lfk, not sure of your question, you could just try and see
    • lfk
      lfk about 6 years
      @zyxue I'll have to build Spark in that case, as the pre-built version comes with 2.7 and I need 3
    • zyxue
      zyxue about 6 years
      @lfk, Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.2.1 uses Scala 2.11, and you will need to use a compatible Scala version (2.11.x) (spark.apache.org/docs/latest), so Python 3 shouldn't be a problem. If you're working on a single node, then building it yourself is OK, but I would still recommend installing it with some sort of package manager, e.g. anaconda. The latest pyspark version, pypi.python.org/pypi/pyspark/2.1.1, supports Python 3.
    • lfk
      lfk about 6 years
      @zyxue I'm sorry I meant Hadoop 3. That's why I'm using the Hadoop-free binaries. pip pyspark is prebuilt with Hadoop 2.7.
    • zyxue
      zyxue about 6 years
      @lfk, I am not familiar with Hadoop versions, to me, it's better to stick to what's being supported.
    • mikesneider
      mikesneider almost 4 years
      @mt88 have you solved your problem? I still have the problem; I have Hadoop 2.7 and Spark 3.0.0
  • quax
    quax over 8 years
    Since I can't comment on user1613333's answer I'll do it here: I also found that using Anaconda makes things go much more smoothly.
  • asmaier
    asmaier over 6 years
    For a fully automatic installation of Oracle Java 8 on Ubuntu see newfivefour.com/docker-java8-auto-install.html.
  • Gerald Senarclens de Grancy
    Gerald Senarclens de Grancy over 6 years
    Thx - this solved my problem on KDE Neon (based on Ubuntu 16.04).
  • lfk
    lfk about 6 years
    Where do I find that log file?
  • BallpointBen
    BallpointBen almost 6 years
    You can also set environment variables using import os; os.environ['PYSPARK_SUBMIT_ARGS'] = "--master ..."
  • F. R
    F. R almost 6 years
    This worked for me, after trying the export statements to no avail.
  • QA Collective
    QA Collective about 5 years
    I'm confused. How can you not have java installed with an error like that? The first part of it is a java stack trace! I think you simply had the problem that JAVA_HOME wasn't set correctly/at all.
  • rishi jain
    rishi jain over 4 years
    How do I install openjdk-8-jdk-headless on Windows?
  • Michele Piccolini
    Michele Piccolini over 4 years
    Where can one read up more about this pyspark-shell argument? I can't find it mentioned anywhere.
  • mikesneider
    mikesneider almost 4 years
    @quax I used os.environ['PYSPARK_SUBMIT_ARGS'] = "--master spark://localhost:8888", but I do not know if I am doing it wrong; it still does not work.
  • mikesneider
    mikesneider almost 4 years
    But what if java -version shows another version instead of 8?
  • Denis G. Labrecque
    Denis G. Labrecque about 3 years
    Which bash file, where, and why? WTH is /sbin/ (I'm on Windows)? @BallpointBen I appreciate the addition, but I get the same error as before.
  • quax
    quax about 3 years
    @Denis I'd try setting a Windows environment variable for PYSPARK_SUBMIT_ARGS. Can't help with the log location though on Windows. (Rarely touch Windows these days).
  • NonCreature0714
    NonCreature0714 over 2 years
    To clarify, the fix here is export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)" and there is an extended discussion on GitHub: github.com/jupyter/jupyter/issues/248#issuecomment-926782387. Yes, the link is to Jupyter, but it's regarding an issue with PySpark. Adding the first assignment to JAVA_HOME does nothing.
  • ChrisDanger
    ChrisDanger over 2 years
    boom, great answer
  • konsolas
    konsolas over 2 years
    This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. To get notified when this question gets new answers, you can follow this question. Once you have enough reputation, you can also add a bounty to draw more attention to this question. - From Review
  • Abubakar Saad
    Abubakar Saad over 2 years
    Thank you, this worked for me. I'm using Fedora. It's the JAVA path that has to be set to /usr/lib/jvm/java-(whatever JDK version). This took time to figure out, thank you again.
  • Matthaeus Gaius Caesar
    Matthaeus Gaius Caesar about 2 years
    This does not explain how to decide what IP and PORT to use. I just want to run pyspark locally on an existing database.