Java gateway process exited before sending its port number Spark


After a week of looking for different ways to solve the exception shown below, I finally found another tutorial, and this one solved my question: Anaconda is the problem, even though the environment variables and paths are the same. I then installed Python and Jupyter Notebook directly on Windows (without Anaconda), and the issue was solved.
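
For illustration, a minimal smoke test of the fixed setup (it reuses the Spark 3.0.0 install path and the findspark call from the question below; adjust the path to your own machine):

    import findspark
    findspark.init(r'c:\spark\spark-3.0.0-preview2-bin-hadoop2.7')  # path from the question; adjust as needed

    from pyspark import SparkContext

    # If the Java gateway starts correctly, this runs without the exception and prints [1, 4, 9, 16].
    sc = SparkContext(master='local', appName='gatewayCheck')
    print(sc.parallelize([1, 2, 3, 4]).map(lambda x: x * x).collect())
    sc.stop()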

Author: mikesneider

I'm a master's student in systems engineering. I work with satellite positioning using the GPS and GLONASS constellations; I like math, stats and computer science.

Updated on August 03, 2022

Comments

  • mikesneider, over 1 year

    I am trying to install Spark on my Windows 10 machine with Anaconda, but I get an error when I try to run pyspark in my Jupyter Notebook. I am following the steps in this tutorial. I have already downloaded and installed Java 8, Spark 3.0.0, and Hadoop 2.7.

    I already set the paths for SPARK_HOME and JAVA_HOME, and included the '/bin' paths in the PATH environment variable.

    C:\Users\mikes>java -version
    java version "1.8.0_251"
    Java(TM) SE Runtime Environment (build 1.8.0_251-b08)
    Java HotSpot(TM) 64-Bit Server VM (build 25.251-b08, mixed mode)
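
    For reference, the same variables can also be set from inside the notebook before Spark is launched. A minimal sketch; the JDK directory below is an assumption based on the java -version output above and must be replaced with the real install location:

    import os

    # Hypothetical install locations -- replace with the actual directories on this machine.
    os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jdk1.8.0_251'
    os.environ['SPARK_HOME'] = r'c:\spark\spark-3.0.0-preview2-bin-hadoop2.7'
    # Prepend both bin folders so the java and spark launchers are found on PATH.
    os.environ['PATH'] = os.pathsep.join([
        os.path.join(os.environ['JAVA_HOME'], 'bin'),
        os.path.join(os.environ['SPARK_HOME'], 'bin'),
        os.environ['PATH'],
    ])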
    

    In the Anaconda PowerShell prompt, pyspark works.

    (base) PS C:\Users\mikes> pyspark
    Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    20/06/05 07:14:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
    using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Welcome to
        ____              __
       / __/__  ___ _____/ /__
     _ \ \/ _ \/ _ `/ __/  '_/
    /__ / .__/\_,_/_/ /_/\_\   version 3.0.0-preview2
       /_/
    
    Using Python version 3.6.5 (default, Mar 29 2018 13:32:41)
    SparkSession available as 'spark'.
    >>>
    >>> nums = sc.parallelize([1,2,3,4])
    >>> nums.map(lambda x: x*x).collect()
    [1, 4, 9, 16]
    >>>           
    

    The next step is to run pyspark in my Jupyter Notebook. I already installed findspark; then my code to start it is:

    import findspark
    findspark.init(r'c:\spark\spark-3.0.0-preview2-bin-hadoop2.7')
    # findspark.init() without the explicit path does not work; the path has to be given.
    findspark.find()
    import pyspark
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SparkSession
    
    conf = pyspark.SparkConf().setAppName('appName').setMaster('local')
    sc = pyspark.SparkContext(conf=conf) #Here is the error
    spark = SparkSession(sc)
    

    The error that shows:

    ---------------------------------------------------------------------------
    Exception                                 Traceback (most recent call last)
    <ipython-input-6-c561ad39905c> in <module>()
          4 conf = pyspark.SparkConf().setAppName('appName').setMaster('local')
          5 sc = pyspark.SparkConf()
    ----> 6 sc = pyspark.SparkContext(conf=conf)
          7 spark = SparkSession(sc)
    
    c:\spark\spark-3.0.0-preview2-bin-hadoop2.7\python\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
        125                 " is not allowed as it is a security risk.")
        126 
    --> 127         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
        128         try:
        129             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
    
    c:\spark\spark-3.0.0-preview2-bin-hadoop2.7\python\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
        317         with SparkContext._lock:
        318             if not SparkContext._gateway:
    --> 319                 SparkContext._gateway = gateway or launch_gateway(conf)
        320                 SparkContext._jvm = SparkContext._gateway.jvm
        321 
    
    c:\spark\spark-3.0.0-preview2-bin-hadoop2.7\python\pyspark\java_gateway.py in launch_gateway(conf, popen_kwargs)
        103 
        104             if not os.path.isfile(conn_info_file):
    --> 105                 raise Exception("Java gateway process exited before sending its port number")
        106 
        107             with open(conn_info_file, "rb") as info:
    
    Exception: Java gateway process exited before sending its port number
    

    I saw another question similar to this one, but maybe the situation is different, because I already tried those solutions, such as:

    - Set another value for PYSPARK_SUBMIT_ARGS, but I do not know if I am doing it wrong.

    os.environ['PYSPARK_SUBMIT_ARGS']= "--master spark://localhost:8888"
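
    For comparison, a hedged sketch of the form this variable usually takes for local mode: spark://localhost:8888 points at Jupyter's own port rather than a Spark master, and a manually set PYSPARK_SUBMIT_ARGS normally ends with a 'pyspark-shell' token.

    import os

    # Sketch only: run Spark locally in the notebook; the trailing 'pyspark-shell' token is
    # what the launcher expects when this variable is set by hand.
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--master local[2] pyspark-shell'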
    

    The other solutions are: set the paths for JAVA_HOME and SPARK_HOME (already did it), and install Java 8 (not 10).

    I already spent some hours trying, and even reinstalled Anaconda because I had deleted an environment.
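
    For completeness, creating the session through SparkSession.builder goes through the same Java gateway launch, so it is not a fix by itself; a sketch of that equivalent entry point, using the same app name and local master as above:

    from pyspark.sql import SparkSession

    # getOrCreate() builds the underlying SparkContext (and launches the gateway) if none exists yet.
    spark = SparkSession.builder \
        .master('local[*]') \
        .appName('appName') \
        .getOrCreate()
    sc = spark.sparkContext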