Spark java.lang.OutOfMemoryError: Java heap space
Solution 1
I have a few suggestions:
- If your nodes are configured to have 6g maximum for Spark (and are leaving a little for other processes), then use 6g rather than 4g: spark.executor.memory=6g. Make sure you're using as much memory as possible by checking the UI (it will say how much memory you're using).
- Try using more partitions; you should have 2 - 4 per CPU. IME increasing the number of partitions is often the easiest way to make a program more stable (and often faster). For huge amounts of data you may need way more than 4 per CPU; I've had to use 8000 partitions in some cases!
- Decrease the fraction of memory reserved for caching, using spark.storage.memoryFraction. If you don't use cache() or persist in your code, this might as well be 0. Its default is 0.6, which means you only get 0.4 * 4g memory for your heap. IME reducing the mem frac often makes OOMs go away. UPDATE: From Spark 1.6 apparently we will no longer need to play with these values, Spark will determine them automatically.
- Similar to the above, but for the shuffle memory fraction. If your job doesn't need much shuffle memory then set it to a lower value (this might cause your shuffles to spill to disk, which can have a catastrophic impact on speed). Sometimes when it's a shuffle operation that's OOMing you need to do the opposite, i.e. set it to something large, like 0.8, or make sure you allow your shuffles to spill to disk (it's the default since 1.0.0).
- Watch out for memory leaks, these are often caused by accidentally closing over objects you don't need in your lambdas. The way to diagnose is to look out for the "task serialized as XXX bytes" in the logs, if XXX is larger than a few k or more than an MB, you may have a memory leak. See https://stackoverflow.com/a/25270600/1586965
- Related to above; use broadcast variables if you really do need large objects.
- If you are caching large RDDs and can sacrifice some access time consider serialising the RDD http://spark.apache.org/docs/latest/tuning.html#serialized-rdd-storage. Or even caching them on disk (which sometimes isn't that bad if using SSDs).
- (Advanced) Related to the above, avoid String and heavily nested structures (like Map and nested case classes). If possible try to only use primitive types and index all non-primitives, especially if you expect a lot of duplicates. Choose WrappedArray over nested structures whenever possible. Or even roll out your own serialisation - YOU will have the most information regarding how to efficiently pack your data into bytes, USE IT!
- (Bit hacky) Again when caching, consider using a Dataset to cache your structure, as it will use more efficient serialisation. This should be regarded as a hack when compared to the previous bullet point. Building your domain knowledge into your algo/serialisation can minimise memory/cache-space by 100x or 1000x, whereas all a Dataset will likely give is 2x - 5x in memory and 10x compressed (parquet) on disk.
http://spark.apache.org/docs/1.2.1/configuration.html
EDIT: (So I can google myself easier) The following is also indicative of this problem:
java.lang.OutOfMemoryError : GC overhead limit exceeded
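For a rough feel for the arithmetic in the bullets above, here is a small sketch (my own illustration of the old pre-1.6 static memory model with its default fractions; usable_heap_gb is a hypothetical helper, not a Spark API):

```python
# Rough arithmetic sketch of the pre-Spark-1.6 static memory model described
# above. The fractions are the old defaults; the numbers are illustrative.

def usable_heap_gb(executor_memory_gb, storage_fraction=0.6, shuffle_fraction=0.2):
    """Heap left for plain task execution after the storage (cache) and
    shuffle fractions are carved out of spark.executor.memory."""
    return executor_memory_gb * (1.0 - storage_fraction - shuffle_fraction)

# With 4g executors and the default fractions, little is left for task objects:
print(usable_heap_gb(4))                        # roughly 0.8 GB
# A job with no cache()/persist() can push the storage fraction to 0:
print(usable_heap_gb(4, storage_fraction=0.0))  # roughly 3.2 GB
```

This is why lowering spark.storage.memoryFraction on a cache-free job frees most of the executor heap.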
Solution 2
To add a use case to this that is often not discussed, I will pose a solution for when submitting a Spark application via spark-submit in local mode.
According to the gitbook Mastering Apache Spark by Jacek Laskowski:
You can run Spark in local mode. In this non-distributed single-JVM deployment mode, Spark spawns all the execution components - driver, executor, backend, and master - in the same JVM. This is the only mode where a driver is used for execution.
Thus, if you are experiencing OOM errors with the heap, it suffices to adjust the driver-memory rather than the executor-memory.
Here is an example:
spark-1.6.1/bin/spark-submit \
  --class "MyClass" \
  --driver-memory 12g \
  --master local[*] \
  target/scala-2.10/simple-project_2.10-1.0.jar
Solution 3
You should configure offHeap memory settings as shown below:
val spark = SparkSession
.builder()
.master("local[*]")
.config("spark.executor.memory", "70g")
.config("spark.driver.memory", "50g")
.config("spark.memory.offHeap.enabled", true)
.config("spark.memory.offHeap.size", "16g")
.appName("sampleCodeForReference")
.getOrCreate()
Give the driver memory and executor memory as per your machine's RAM availability. You can increase the offHeap size if you are still facing the OutOfMemory issue.
Solution 4
You should increase the driver memory. In your $SPARK_HOME/conf folder you should find the file spark-defaults.conf; edit it and set spark.driver.memory 4000m, depending on the memory on your master, I think.
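For instance, the edited spark-defaults.conf could contain a line like the following (the value is illustrative; size it to your master's RAM):

```
spark.driver.memory    4000m
```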
This is what fixed the issue for me, and everything runs smoothly.
Solution 5
Have a look at the start-up scripts; a Java heap size is set there. It looks like you're not setting this before running the Spark worker.
# Set SPARK_MEM if it isn't already set since we also use it for this process
SPARK_MEM=${SPARK_MEM:-512m}
export SPARK_MEM
# Set JAVA_OPTS to be able to load native libraries and to set heap size
JAVA_OPTS="$OUR_JAVA_OPTS"
JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"
You can find the documentation to deploy scripts here.
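The SPARK_MEM line above relies on the shell's ${VAR:-default} expansion: the 512m fallback applies only when SPARK_MEM is unset (or empty), so exporting it before starting the worker is enough to override the heap size. A minimal demonstration of the idiom:

```shell
# ${VAR:-default} keeps an existing value and falls back otherwise.
unset SPARK_MEM
echo "${SPARK_MEM:-512m}"    # prints the 512m fallback

export SPARK_MEM=4g
echo "${SPARK_MEM:-512m}"    # prints 4g, the exported value
```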
Hellen
Updated on July 10, 2022

Comments
-
Hellen almost 2 years
My cluster: 1 master, 11 slaves, each node has 6 GB memory.
My settings:
spark.executor.memory=4g, -Dspark.akka.frameSize=512
Here is the problem:
First, I read some data (2.19 GB) from HDFS to RDD:
val imageBundleRDD = sc.newAPIHadoopFile(...)
Second, do something on this RDD:
val res = imageBundleRDD.map(data => {
  val desPoints = threeDReconstruction(data._2, bg)
  (data._1, desPoints)
})
Last, output to HDFS:
res.saveAsNewAPIHadoopFile(...)
When I run my program it shows:
.....
14/01/15 21:42:27 INFO cluster.ClusterTaskSetManager: Starting task 1.0:24 as TID 33 on executor 9: Salve7.Hadoop (NODE_LOCAL)
14/01/15 21:42:27 INFO cluster.ClusterTaskSetManager: Serialized task 1.0:24 as 30618515 bytes in 210 ms
14/01/15 21:42:27 INFO cluster.ClusterTaskSetManager: Starting task 1.0:36 as TID 34 on executor 2: Salve11.Hadoop (NODE_LOCAL)
14/01/15 21:42:28 INFO cluster.ClusterTaskSetManager: Serialized task 1.0:36 as 30618515 bytes in 449 ms
14/01/15 21:42:28 INFO cluster.ClusterTaskSetManager: Starting task 1.0:32 as TID 35 on executor 7: Salve4.Hadoop (NODE_LOCAL)
Uncaught error from thread [spark-akka.actor.default-dispatcher-3] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[spark]
java.lang.OutOfMemoryError: Java heap space
Are there too many tasks?
PS: Everything is OK when the input data is about 225 MB.
How can I solve this problem?
-
Tombart over 10 years How do you run spark? Is it from the console? Or which deploy scripts do you use?
-
Hellen over 10 years I use sbt to compile and run my app: sbt package then sbt run. I implemented the same program on hadoop a month ago, and I met the same problem of OutOfMemoryError, but in hadoop it could easily be solved by increasing the value of mapred.child.java.opts from Xmx200m to Xmx400m. Does spark have any JVM setting for its tasks? I wonder if spark.executor.memory has the same meaning as mapred.child.java.opts in hadoop. In my program spark.executor.memory has already been set to 4g, much bigger than Xmx400m in hadoop. Thank you~
-
Arnon Rotem-Gal-Oz over 9 years Are the three steps you mention the only ones you do? What's the size of the data generated by (data._1, desPoints)? This should fit in memory, esp. if this data is then shuffled to another stage.
-
RanP over 8 years What is the memory configuration for the driver? Check which server gets the out-of-memory error. Is it the driver or one of the executors?
-
Naramsim over 7 years See all configuration properties here: spark.apache.org/docs/2.1.0/configuration.html
-
Naman over 3 years @hequn8128 It would be great for future reference if you could accept an answer here; I liked the one from samthebest personally, to be honest.
-
Hellen over 10 years Thank you~ I will try later. From the spark UI, it shows the memory of every executor is 4096. So the setting has been enabled, right?
-
Hellen about 10 years Thanks for your suggestions~ If I set spark.executor.memory=6g, spark has the problem: "check your cluster UI to ensure that workers are registered and have sufficient memory". Setting spark.storage.memoryFraction to 0.1 can't solve the problem either. Maybe the problem lies in my code. Thank you!
-
Jacek L. about 10 years @hequn8128, spark executor memory must fit your spark worker memory
-
Myles Baker about 9 years @samthebest This is a fantastic answer. I really appreciate the logging help for finding memory leaks.
-
Umesh K almost 9 years Hi @samthebest, how did you specify 8000 partitions? Since I am using Spark SQL I can only specify partitions using spark.sql.shuffle.partitions; the default value is 200. Should I set it to more? I tried setting it to 1000 but it's not helping, I'm still getting OOM. Are you aware of what the optimal partition value should be? I have 1 TB of skewed data to process and it involves group by hive queries. Please guide.
-
samthebest almost 9 years Hi @user449355, please could you ask a new question? (For fear of starting a long comment thread :)) If you are having issues, likely other people are too, and a question would make it easier to find for all.
-
Seffy over 8 years Saw your answer while I'm facing a similar issue (stackoverflow.com/questions/34762432/…). Looking at the link you provided, it looks like setting Xms/Xmx is not there anymore; can you tell why?
-
Alex Naspo about 8 years
-
makansij almost 8 years To your first point, @samthebest, you should not use ALL the memory for spark.executor.memory because you definitely need some amount of memory for I/O overhead. If you use all of it, it will slow down your program. The exception to this might be Unix, in which case you have swap space.
-
makansij almost 8 years How do you know which one to adjust between SPARK_EXECUTOR_MEMORY & SPARK_DRIVER_MEMORY?
-
makansij almost 8 years i.e. what error would tell you to increase SPARK_EXECUTOR_MEMORY, and what error would tell you to increase SPARK_DRIVER_MEMORY?
-
samthebest almost 8 years @Hunle Yes, one needs to allow for other processes on the node. I'll update my answer.
-
y2k-shubham over 6 years In Spark 2.3.0, spark.storage.memoryFraction & spark.shuffle.memoryFraction are deprecated & used only in legacyMode. Confusingly enough, the alternative to spark.storage.memoryFraction is spark.memory.storageFraction (!), while shuffle now has its own dedicated set of configurations & needs to be enabled on a per-executor basis
-
samthebest over 6 years @y2k-shubham Yes, since 1.6 it's supposed to automagically set these. I actually don't like this as it can be a little harder to know when an OOM is due to a shuffle vs cache vs heap. Sometimes the ST is enough. Does spark have any way it reports how much memory is reserved for each?
-
kennyut over 5 years Adding the offHeap setting helped
-
Abdulhafeth Sartawi over 5 years Setting the driver memory in your code will not work, read spark documentation for this: Spark properties mainly can be divided into two kinds: one is related to deploy, like "spark.driver.memory", "spark.executor.instances", this kind of properties may not be affected when setting programmatically through SparkConf in runtime, or the behavior is depending on which cluster manager and deploy mode you choose, so it would be suggested to set through configuration file or spark-submit command line options.
-
Mikhail_Sam over 5 years THE BEST ANSWER! My problem was that Spark wasn't installed at the master node, I just used PySpark to connect to HDFS and got the same error. Using config solved the problem.
-
sgu over 5 years How can we tell whether an OOM is due to shuffle or not? Also, when OOM occurs, is there a way to see how much memory requested vs. how much is available?
-
merenptah over 5 years You should use --conf spark.driver.memory=18g
-
Yashwanth Kambala over 4 years What percentage should we consider for driver memory in stand-alone mode?
-
David Groomes over 4 years The content at the script linked to by start up scripts has changed unfortunately. No such options exist as of 2019-12-19
-
Pritam Sadhukhan about 4 years I just added the configurations using the spark-submit command to fix the heap size issue. Thanks.
-
fuyi about 4 years @Brian, In local mode, does the driver memory need to be larger than the input data size? Is it possible to specify the number of partitions for the input dataset, so the Spark job can deal with a dataset much larger than the available RAM?
-
whatsinthename almost 4 years Driver memory can't be larger than the input size. Consider you have a 160 GB file to be loaded into your cluster; would you create a driver with 161 GB for that? That's not feasible. It's about how you determine the number of executors, their memory, and the buffer for overhead memory and their OS. You need to calculate all these things by looking at the yarn UI and the cluster memory given to you. For better performance, you also need to consider the executor-cores, which should always be between 3-5 @fuyi
-
bsauce over 2 years Additional suggestion for anyone getting this OOM when using pandas UDFs: ensure that pyarrow is enabled (it is not enabled by default!): spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true") (More info here).