Spark job throwing "java.lang.OutOfMemoryError: GC overhead limit exceeded"
The only real solution is to fine-tune the configuration. In my experience, the following points help with OOM:

- Cache an RDD only if you are going to use it more than once. If you do need to cache, first analyze the data and the application with respect to the available resources.
- If your cluster has enough memory, increase `spark.executor.memory` towards its maximum.
- Increase the number of partitions to increase parallelism.
- Increase the memory dedicated to caching via `spark.storage.memoryFraction`. If a lot of shuffle memory is involved, avoid caching or split the allocation carefully.
- Spark's `persist(MEMORY_AND_DISK)` storage level is available at the cost of additional processing (serializing, writing, and reading back the data); CPU usage is usually high in this case.
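The partition advice above can be made concrete with a quick estimate for the 4.5 GB file from the question. This is only a sketch: the 128 MB target per partition is a common rule of thumb, not a value the question or Spark mandates.

```python
# Rough partition-count estimate for a 4.5 GB input file,
# assuming a (hypothetical) target of ~128 MB of input per partition.
file_size_gb = 4.5
target_partition_mb = 128  # rule-of-thumb target, adjust for your workload

file_size_mb = file_size_gb * 1024
num_partitions = int(-(-file_size_mb // target_partition_mb))  # ceiling division

print(num_partitions)  # 36
```

Smaller partitions mean each task holds less data in memory at once, which directly reduces GC pressure at the cost of more task-scheduling overhead.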
Author: diplomaticguru (updated on June 05, 2022)

Comments
- diplomaticguru, almost 2 years ago:
I have a Spark job that throws "java.lang.OutOfMemoryError: GC overhead limit exceeded".
The job is trying to process a file of size 4.5 GB.
I've tried the following Spark configuration:
--num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G
I tried adding more cores and executors, which sometimes works, but it then takes over 20 minutes to process the file.
Could I do something to improve the performance, or to stop the Java heap issue?
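As a rough sanity check on the configuration above: with 6 cores per executor, up to 6 tasks share one executor's heap concurrently. This sketch ignores Spark's reserved and storage memory and simply assumes an even split, so the real per-task headroom is smaller than shown.

```python
# Back-of-the-envelope memory check for the configuration in the question.
# Hypothetical simplification: the executor heap is split evenly across
# concurrently running tasks, and all internal overheads are ignored.
executor_memory_gb = 6
executor_cores = 6  # up to 6 tasks can run concurrently per executor

per_task_gb = executor_memory_gb / executor_cores
print(per_task_gb)  # 1.0 GB of heap per concurrent task, before overhead
```

With only about 1 GB of heap per concurrent task, reducing `--executor-cores` (or raising `--executor-memory`) gives each task more headroom, which is consistent with the observation that adding resources sometimes makes the job succeed.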
- diplomaticguru, almost 9 years ago: I already tried increasing the driver memory, but no joy. There is no scheduling delay; the job starts running within 5-10 seconds.