Spark application - java.lang.OutOfMemoryError: Java heap space


Solution 1

You can always use a profiler tool like VisualVM to monitor memory growth. Hopefully you are using a 64-bit JVM and not a 32-bit JVM: a 32-bit process can only use about 2 GB of memory, so the memory settings would essentially be of no use. Hope this helps.
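
For diagnosis, a minimal sketch (the flags and path below are standard JVM options chosen for illustration, not taken from the question): adding them to spark.driver.extraJavaOptions makes the driver write a heap dump when the OOM occurs, which VisualVM can then open offline.

    spark.driver.extraJavaOptions    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/driver.hprof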

Solution 2

Apart from driver and executor memory, I would suggest trying the following options (a minimal sketch follows the list):

  1. Switch to Kryo Serialization - http://spark.apache.org/docs/latest/tuning.html#data-serialization
  2. Use MEMORY_AND_DISK_SER_2 for RDD persistence.
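
A minimal sketch of both suggestions, assuming a plain RDD pipeline (the app name and input path are placeholders, not from the question):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // 1. Kryo serialization -- must be set before the SparkContext is created.
    val conf = new SparkConf()
      .setAppName("device-kmeans") // hypothetical app name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    // 2. Serialized, disk-spilling, replicated persistence instead of the
    //    default MEMORY_ONLY level.
    val features = sc.textFile("features.txt") // hypothetical input
    features.persist(StorageLevel.MEMORY_AND_DISK_SER_2)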

Also, it would be good if you could post the code.

Solution 3

JVM options are not sufficient for configuring Spark memory; you also need to set spark.driver.memory (for the driver, obviously) and spark.executor.memory (for the workers). Both default to 1 GB. See this thorough guide for more information. Actually, I urge you to read it: there is a lot of stuff there, and getting acquainted with it will definitely pay off later on.
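
For example, in spark-defaults.conf (the values here are the ones the asker mentions in the comments; note that spark.driver.memory must be set before the driver JVM starts, so in client mode it has to come from this file or spark-submit's --driver-memory flag rather than from SparkConf in code):

    spark.driver.memory              16g
    spark.executor.memory            96g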

Comments

  • wdz almost 2 years

    I am using Spark Standalone on a single machine, with 128 GB of memory and 32 cores. The following are the settings I think are relevant to my problem:

    spark.storage.memoryFraction     0.35
    spark.default.parallelism        50
    spark.sql.shuffle.partitions     50
    

    I have a Spark application in which there is a loop over 1000 devices. Each iteration (one device) prepares a feature vector and then calls MLlib's k-means. At the 25th to 30th iteration of the loop (processing the 25th to 30th device), it runs into the error "java.lang.OutOfMemoryError: Java heap space".

    I tried lowering memoryFraction from 0.7 to 0.35, but it didn't help. I also tried raising parallelism/partitions to 200 with no luck. The JVM options are "-Xms25G -Xmx25G -XX:MaxPermSize=512m". My data size is only about 2 GB.
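
    For context, a minimal sketch of the loop structure described above (the Row type, input path, k, and iteration counts are assumptions, not the actual code); if cached RDDs are not unpersisted between iterations, their blocks can accumulate on the heap:

      import org.apache.spark.SparkContext
      import org.apache.spark.mllib.clustering.KMeans
      import org.apache.spark.mllib.linalg.Vectors

      // Hypothetical reconstruction: one k-means run per device.
      case class Row(deviceId: Int, values: Array[Double])

      def runPerDevice(sc: SparkContext): Unit = {
        val rawData = sc.objectFile[Row]("device-rows")  // placeholder input
        for (device <- 0 until 1000) {
          val features = rawData
            .filter(_.deviceId == device)
            .map(r => Vectors.dense(r.values))           // prepare the feature vector
          features.cache()
          val model = KMeans.train(features, 5, 20)      // k = 5, maxIterations = 20
          features.unpersist()                           // release cached blocks per device
        }
      }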

    Here is the stack trace:

    java.lang.OutOfMemoryError: Java heap space
      at java.util.Arrays.copyOf(Arrays.java:2271)
      at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
      at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
      at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
      at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1841)
      at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
      at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
      at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
      at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
      at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:138)
      at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:136)
      at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
      at scala.collection.mutable.HashTable$class.serializeTo(HashTable.scala:125)
      at scala.collection.mutable.HashMap.serializeTo(HashMap.scala:40)
      at scala.collection.mutable.HashMap.writeObject(HashMap.scala:136)
      at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
      at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
      at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
      at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
      at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
      at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
      at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
      at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
      at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    

    At the beginning, the application looks fine, but after it runs for a while and processes more and more devices, the Java heap gradually fills up and memory is not released by the JVM. How can I diagnose and fix such a problem?

  • wdz over 8 years
    Thanks for recommending VisualVM. I'm trying it now to monitor memory usage. BTW, I'm using a 64-bit JVM.
  • wdz over 8 years
    In my settings, I have "spark.driver.memory 16G" and "spark.executor.memory 96G". I actually went through the Spark configuration doc but still couldn't fix my OOM problem. I'm trying VisualVM now to see what happens in the heap.
  • wdz over 8 years
    Thanks for your suggestion! I already use Kryo serialization. Since MEMORY_AND_DISK_SER_2 is significantly slower than MEMORY_ONLY, I am reluctant to use it unless I have no other choice.
  • Sumit over 8 years
    I agree, but the objective of suggesting MEMORY_AND_DISK_SER_2 was first to ensure that your job at least completes. That will verify that your logic is correct, though not optimized; then we can tune it for better performance so that it executes completely in memory.
  • wdz over 8 years
    I'll try the MEMORY_AND_DISK_SER_2 option.
  • TheCodeCache over 2 years
    You can try increasing the memory overhead setting (spark.executor.memoryOverhead), generally ~20% of the executor memory; this solves some of the OOM issues, for sure.
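
    For example, in spark-defaults.conf (value illustrative, roughly 20% of the 96g executor mentioned above; the setting is named spark.executor.memoryOverhead in Spark 2.3+ and spark.yarn.executor.memoryOverhead earlier, and it applies to YARN/Kubernetes deployments rather than standalone mode):

      spark.executor.memoryOverhead    19g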