Spark/PySpark: An error occurred while trying to connect to the Java server (127.0.0.1:39543)
If you are using this one in Google Cloud:
custom (8 vCPUs, 200 GB)
then you significantly oversubscribe memory, even ignoring the fact that spark.executor.memory has no effect in local mode. With 180G requested for the JVM heap on a 200 GB machine, very little is left for the operating system and the Python processes.
spark.executor.memory accounts only for the JVM heap and doesn't cover:
- PySpark worker memory.
- PySpark driver memory.
Even within the JVM, only part of the heap can be used for data processing (see Memory Management Overview), so setting spark.driver.maxResultSize equal to the total assigned memory does not make sense.
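To make the point about the JVM heap concrete, here is a rough arithmetic sketch. It assumes the formula from the Memory Management Overview in the Spark tuning guide for Spark 2.x and later, where the unified execution and storage region is (heap - 300 MiB) * spark.memory.fraction, with a default fraction of 0.6. The function name is made up for illustration, and the exact numbers may differ on other Spark versions.

```python
# Sketch: how much of a JVM heap Spark's unified memory region can use.
# Assumes the Spark 2.x+ defaults; check the tuning guide for your version.
RESERVED_MIB = 300      # fixed reservation described in the tuning guide
MEMORY_FRACTION = 0.6   # default value of spark.memory.fraction


def unified_memory_mib(heap_gib, fraction=MEMORY_FRACTION):
    """Return the size in MiB of the execution + storage region for a heap."""
    heap_mib = heap_gib * 1024
    return (heap_mib - RESERVED_MIB) * fraction


heap_gib = 180  # the spark.driver.memory value from the question
usable = unified_memory_mib(heap_gib)
print("heap: %d GiB, unified region: %.1f GiB" % (heap_gib, usable / 1024))
```

Under these assumptions, a 180G heap leaves roughly 108 GiB for execution and storage, which is why a maxResultSize of 180G can never actually be reached.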
qwertz
Updated on March 05, 2020
Comments
-
qwertz about 4 years
Good afternoon,
Over the last two days, I have run into many connection problems with the Java server. It's a bit unusual because the error does not always occur, only sometimes...
I am using PySpark combined with Jupyter Notebook. Everything is running on a VM instance in the Google Cloud. I am using this one in Google Cloud:
custom (8 vCPUs, 200 GB)
These are the other settings:
conf = pyspark.SparkConf().setAppName("App")
conf = (conf.setMaster('local[*]')
        .set('spark.executor.memory', '180G')
        .set('spark.driver.memory', '180G')
        .set('spark.driver.maxResultSize', '180G'))
sc = pyspark.SparkContext(conf=conf)
sq = pyspark.sql.SQLContext(sc)
I trained a Random Forest model and made predictions:
model = rf.fit(train)
predictions = model.transform(test)
Afterwards I created the ROC curve and computed the AUC value.
Then I wanted to see the confusion matrix:
confusion_mat = metrics.confusionMatrix().toArray()
print(confusion_mat)
And now the error occurs:
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 318, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 652, in __init__
    self.handle()
  File "/usr/local/lib/python2.7/dist-packages/pyspark/accumulators.py", line 235, in handle
    num_updates = read_int(self.rfile)
  File "/usr/local/lib/python2.7/dist-packages/pyspark/serializers.py", line 577, in read_int
    raise EOFError
EOFError
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/py4j/java_gateway.py", line 883, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python2.7/dist-packages/py4j/java_gateway.py", line 1040, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
Py4JNetworkError: Error while receiving
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:39543)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/py4j/java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Here is the output from the console:
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f4998300000, 603979776, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 603979776 bytes for committing reserved memory.
Logfile:
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 603979776 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (os_linux.cpp:2643), pid=2377, tid=0x00007f1c94fac700
#
# JRE version: OpenJDK Runtime Environment (8.0_151-b12) (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.151-b12 mixed mode linux-amd64 )
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
--------------- S Y S T E M ---------------
OS:DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
uname:Linux 4.13.0-1008-gcp #11-Ubuntu SMP Thu Jan 25 11:08:44 UTC 2018 x86_64
libc:glibc 2.23 NPTL 2.23
rlimit: STACK 8192k, CORE 0k, NPROC 805983, NOFILE 1048576, AS infinity
load average:7.69 4.51 3.57
/proc/meminfo:
MemTotal: 206348252 kB
MemFree: 1298460 kB
MemAvailable: 250308 kB
Buffers: 6812 kB
Cached: 438232 kB
SwapCached: 0 kB
Active: 203906416 kB
Inactive: 339540 kB
Active(anon): 203804300 kB
Inactive(anon): 8392 kB
Active(file): 102116 kB
Inactive(file): 331148 kB
Unevictable: 3652 kB
Mlocked: 3652 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 4688 kB
Writeback: 0 kB
AnonPages: 203805168 kB
Mapped: 23076 kB
Shmem: 8776 kB
Slab: 114476 kB
SReclaimable: 50640 kB
SUnreclaim: 63836 kB
KernelStack: 4752 kB
PageTables: 404292 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 103174124 kB
Committed_AS: 205956256 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 71628 kB
DirectMap2M: 4122624 kB
DirectMap1G: 207618048 kB
CPU:total 8 (initial active 8) (4 cores per cpu, 2 threads per core) family 6 model 85 stepping 3, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx
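The figures in this log can be checked against each other with a small sketch. The values below are copied from the log (the size of the failed mmap and three /proc/meminfo entries); the variable names are only illustrative.

```python
# Values copied from the log (kB, as reported by /proc/meminfo).
meminfo_kb = {
    "MemTotal": 206348252,
    "MemAvailable": 250308,
    "SwapFree": 0,
}
failed_request_bytes = 603979776  # size of the mmap call that failed

# Convert the failed request to kB so it can be compared with meminfo.
request_kb = failed_request_bytes // 1024
# With no swap configured, MemAvailable is all that is left to hand out.
available_kb = meminfo_kb["MemAvailable"] + meminfo_kb["SwapFree"]

print("requested: %d kB" % request_kb)
print("available: %d kB" % available_kb)
print("fits: %s" % (request_kb <= available_kb))
```

The request is for 589824 kB (576 MiB) while only about 250000 kB remain available and there is no swap, which is consistent with the "Cannot allocate memory" error.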
Does anyone have any idea what the problem might be and how I can solve this? I am desperate. :(
// I think the Java Runtime Environment does not have enough memory to continue... But what can I do?
Thank you very much!
-
qwertz about 6 years
Thanks for your objection. So what should I do now? Decrease the Google Cloud machine type to maybe 8 vCPUs, 52 GB? What would make sense so as not to oversubscribe the memory?
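One possible way to approach this is to keep the machine size and lower the heap instead, so that the operating system and the Python processes have room. The sketch below is only an illustration: the helper suggest_driver_memory_gb, the 8 GB operating system reserve, the 25% share for Python processes, and the 4g result cap are all assumptions chosen for the example, not Spark defaults or recommendations from the answer.

```python
def suggest_driver_memory_gb(total_gb, os_reserve_gb=8, python_fraction=0.25):
    """Hypothetical helper: split machine RAM between the JVM heap and the rest.

    os_reserve_gb and python_fraction are illustrative assumptions,
    not Spark defaults; tune them for the actual workload.
    """
    usable = total_gb - os_reserve_gb
    if usable <= 0:
        raise ValueError("machine too small for the requested reserve")
    # Leave python_fraction of the usable RAM for PySpark worker/driver processes.
    return int(usable * (1 - python_fraction))


driver_gb = suggest_driver_memory_gb(200)  # the 200 GB machine from the question
settings = {
    "spark.master": "local[*]",
    "spark.driver.memory": "%dg" % driver_gb,
    "spark.driver.maxResultSize": "4g",  # illustrative cap, well below the heap
}
for key in sorted(settings):
    print("%s=%s" % (key, settings[key]))
```

The resulting pairs could be applied with SparkConf().setAll(settings.items()) before the SparkContext is created; spark.driver.memory only takes effect if it is set before the JVM is launched.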