Spark job throwing "java.lang.OutOfMemoryError: GC overhead limit exceeded"


The only solution is to fine-tune the configuration.

In my experience, the following points help with OOM errors:

  • Cache an RDD only if you are going to use it more than once.

    If you still need to cache, first analyze the data and the application against the available cluster resources.

  • If your cluster has enough memory, increase spark.executor.memory as far as the cluster allows.
  • Increase the number of partitions to increase parallelism.
  • Increase the memory dedicated to caching via spark.storage.memoryFraction. If a lot of shuffle memory is involved, split the allocation carefully or avoid caching altogether.
  • Spark's persist(MEMORY_AND_DISK) storage level is available at the cost of additional processing (serializing, writing and reading the data back); CPU usage is usually high in this case. A short sketch combining these settings follows this list.
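
To make these points concrete, here is a minimal Scala sketch that wires the knobs above together. The input path, app name, partition count, memory size and memoryFraction value are illustrative assumptions, not recommendations, and spark.storage.memoryFraction only applies to the legacy (pre-unified) memory manager:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object OomTuningSketch {
      def main(args: Array[String]): Unit = {
        // Illustrative values only -- tune them to what the cluster actually has.
        val conf = new SparkConf()
          .setAppName("oom-tuning-sketch")
          .set("spark.executor.memory", "6g")          // raise only if the cluster has headroom
          .set("spark.storage.memoryFraction", "0.5")  // share of executor memory reserved for caching (Spark 1.x knob)
          .set("spark.default.parallelism", "200")     // more partitions -> more parallelism

        val sc = new SparkContext(conf)

        // Hypothetical input path; repartitioning spreads the file across more, smaller tasks.
        val lines = sc.textFile("hdfs:///data/input.txt").repartition(200)

        // Persist only because two actions below reuse the RDD. MEMORY_AND_DISK spills
        // partitions that don't fit in memory to disk, trading extra CPU and IO for fewer OOMs.
        lines.persist(StorageLevel.MEMORY_AND_DISK)

        val lineCount = lines.count()                          // first action materializes the cache
        val wordCount = lines.flatMap(_.split("\\s+")).count() // the reuse is what justifies caching

        println(s"lines=$lineCount words=$wordCount")
        sc.stop()
      }
    }

Persisting is only worthwhile here because two actions reuse the same RDD; for a single pass, skipping persist avoids the extra serialization and IO entirely.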

Comments

  • diplomaticguru, almost 2 years

    I have a Spark job that throws "java.lang.OutOfMemoryError: GC overhead limit exceeded".

    The job is trying to process a file of size 4.5 GB.

    I've tried the following Spark configuration:

    --num-executors 6  --executor-memory 6G --executor-cores 6 --driver-memory 3G 
    

    I tried increasing the number of cores and executors, which sometimes works, but then it takes over 20 minutes to process the file.

    Could I do something to improve the performance, or to stop the Java heap issue?

  • diplomaticguru, almost 9 years
    I already tried increasing the driver-memory, but no joy. There is no scheduling delay; the job starts running within 5-10 seconds.
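
For reference, and purely as a sketch, the settings suggested in the answer could be folded into the original spark-submit invocation along these lines; the parallelism and memoryFraction values, and the class/jar names, are hypothetical placeholders rather than tuned settings:

    spark-submit \
      --num-executors 6 \
      --executor-memory 6G \
      --executor-cores 6 \
      --driver-memory 3G \
      --conf spark.default.parallelism=200 \
      --conf spark.storage.memoryFraction=0.5 \
      --class com.example.MyJob myjob.jar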