Java gateway process exited before sending its port number Spark


After a week of looking for different ways to solve the exception shown below, I finally found another tutorial, and this one solved my question: Anaconda is the problem, even though the environment variables and paths are the same. I then installed Python and Jupyter Notebook directly on Windows (without Anaconda), and the issue was solved.
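
For illustration, a minimal smoke test of the fixed setup (it reuses the Spark 3.0.0 install path and the findspark call from the question below; adjust the path to your own machine):

    import findspark
    findspark.init(r'c:\spark\spark-3.0.0-preview2-bin-hadoop2.7')  # path from the question; adjust as needed

    from pyspark import SparkContext

    # If the Java gateway starts correctly, this runs without the exception and prints [1, 4, 9, 16].
    sc = SparkContext(master='local', appName='gatewayCheck')
    print(sc.parallelize([1, 2, 3, 4]).map(lambda x: x * x).collect())
    sc.stop()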

Author: mikesneider

I'm a master's student in systems engineering. I work with satellite positioning using the GPS and GLONASS constellations; I like math, stats and computer science.

Updated on August 03, 2022

Comments

  • mikesneider, over 1 year

    I am trying to install Spark on my Windows 10 machine with Anaconda, but I get an error when I try to run pyspark in my Jupyter Notebook. I am following the steps in this tutorial. I have already downloaded and installed Java 8, Spark 3.0.0, and Hadoop 2.7.

    I already set the paths for SPARK_HOME and JAVA_HOME, and included the '/bin' paths in the PATH environment variable.

    C:\Users\mikes>java -version
    java version "1.8.0_251"
    Java(TM) SE Runtime Environment (build 1.8.0_251-b08)
    Java HotSpot(TM) 64-Bit Server VM (build 25.251-b08, mixed mode)
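
    For reference, the same variables can also be set from inside the notebook before Spark is launched. A minimal sketch; the JDK directory below is an assumption based on the java -version output above and must be replaced with the real install location:

    import os

    # Hypothetical install locations -- replace with the actual directories on this machine.
    os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jdk1.8.0_251'
    os.environ['SPARK_HOME'] = r'c:\spark\spark-3.0.0-preview2-bin-hadoop2.7'
    # Prepend both bin folders so the java and spark launchers are found on PATH.
    os.environ['PATH'] = os.pathsep.join([
        os.path.join(os.environ['JAVA_HOME'], 'bin'),
        os.path.join(os.environ['SPARK_HOME'], 'bin'),
        os.environ['PATH'],
    ])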
    

    In the Anaconda PowerShell prompt, pyspark works.

    (base) PS C:\Users\mikes> pyspark
    Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    20/06/05 07:14:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
    using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Welcome to
        ____              __
       / __/__  ___ _____/ /__
     _ \ \/ _ \/ _ `/ __/  '_/
    /__ / .__/\_,_/_/ /_/\_\   version 3.0.0-preview2
       /_/
    
    Using Python version 3.6.5 (default, Mar 29 2018 13:32:41)
    SparkSession available as 'spark'.
    >>>
    >>> nums = sc.parallelize([1,2,3,4])
    >>> nums.map(lambda x: x*x).collect()
    [1, 4, 9, 16]
    >>>           
    

    The next step is to run pyspark in my Jupyter Notebook. I already installed findspark; then my code to start it is:

    import findspark
    findspark.init(r'c:\spark\spark-3.0.0-preview2-bin-hadoop2.7')
    # findspark.init() without the explicit path does not work; the path has to be given.
    findspark.find()
    import pyspark
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SparkSession
    
    conf = pyspark.SparkConf().setAppName('appName').setMaster('local')
    sc = pyspark.SparkContext(conf=conf) #Here is the error
    spark = SparkSession(sc)
    

    The error that shows:

    ---------------------------------------------------------------------------
    Exception                                 Traceback (most recent call last)
    <ipython-input-6-c561ad39905c> in <module>()
          4 conf = pyspark.SparkConf().setAppName('appName').setMaster('local')
          5 sc = pyspark.SparkConf()
    ----> 6 sc = pyspark.SparkContext(conf=conf)
          7 spark = SparkSession(sc)
    
    c:\spark\spark-3.0.0-preview2-bin-hadoop2.7\python\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
        125                 " is not allowed as it is a security risk.")
        126 
    --> 127         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
        128         try:
        129             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
    
    c:\spark\spark-3.0.0-preview2-bin-hadoop2.7\python\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
        317         with SparkContext._lock:
        318             if not SparkContext._gateway:
    --> 319                 SparkContext._gateway = gateway or launch_gateway(conf)
        320                 SparkContext._jvm = SparkContext._gateway.jvm
        321 
    
    c:\spark\spark-3.0.0-preview2-bin-hadoop2.7\python\pyspark\java_gateway.py in launch_gateway(conf, popen_kwargs)
        103 
        104             if not os.path.isfile(conn_info_file):
    --> 105                 raise Exception("Java gateway process exited before sending its port number")
        106 
        107             with open(conn_info_file, "rb") as info:
    
    Exception: Java gateway process exited before sending its port number
    

    I saw another question similar to this one, but maybe the situation is different, because I already tried those solutions, such as:

    - Set another value for PYSPARK_SUBMIT_ARGS, but I do not know if I am doing it wrong.

    os.environ['PYSPARK_SUBMIT_ARGS']= "--master spark://localhost:8888"
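
    For comparison, a hedged sketch of the form this variable usually takes for local mode: spark://localhost:8888 points at Jupyter's own port rather than a Spark master, and a manually set PYSPARK_SUBMIT_ARGS normally ends with a 'pyspark-shell' token.

    import os

    # Sketch only: run Spark locally in the notebook; the trailing 'pyspark-shell' token is
    # what the launcher expects when this variable is set by hand.
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--master local[2] pyspark-shell'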
    

    The other solutions are: set the paths for JAVA_HOME and SPARK_HOME (already did it), and install Java 8 (not 10).

    I already spent some hours trying, and even reinstalled Anaconda because I had deleted an environment.
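
    For completeness, creating the session through SparkSession.builder goes through the same Java gateway launch, so it is not a fix by itself; a sketch of that equivalent entry point, using the same app name and local master as above:

    from pyspark.sql import SparkSession

    # getOrCreate() builds the underlying SparkContext (and launches the gateway) if none exists yet.
    spark = SparkSession.builder \
        .master('local[*]') \
        .appName('appName') \
        .getOrCreate()
    sc = spark.sparkContext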