Spark: How to set spark.yarn.executor.memoryOverhead property in spark-submit

Solution 1

Here is an example; the values can also be set via SparkConf.

Example:

# --executor-memory sets the memory per executor process; --driver-memory sets
# the memory for the driver process; --driver-cores sets the number of cores
# for the driver process.
./bin/spark-submit \
--class [your class] \
--master yarn \
--deploy-mode cluster \
--num-executors 17 \
--conf spark.yarn.executor.memoryOverhead=4096 \
--executor-memory 35G \
--conf spark.yarn.driver.memoryOverhead=4096 \
--driver-memory 35G \
--executor-cores 5 \
--driver-cores 5 \
--conf spark.default.parallelism=170 \
/path/to/examples.jar
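
The same values can also be supplied through a SparkConf built in code, as noted above. A minimal PySpark sketch, with a placeholder app name (note that driver-side settings such as spark.driver.memory generally only take effect if set before the driver JVM starts, so in client mode they are better left on the spark-submit command line):

from pyspark import SparkConf, SparkContext

# Equivalent settings supplied via SparkConf instead of spark-submit flags.
conf = (
    SparkConf()
        .setMaster('yarn')
        .setAppName('example')  # placeholder app name
        .set('spark.yarn.executor.memoryOverhead', '4096')
        .set('spark.yarn.driver.memoryOverhead', '4096')
        .set('spark.executor.memory', '35g')
        .set('spark.executor.cores', '5')
        .set('spark.default.parallelism', '170')
)
sc = SparkContext(conf=conf)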

Solution 2

spark.yarn.executor.memoryOverhead has now been deprecated:

WARN spark.SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
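
On Spark 2.3+, the spark-submit flags from Solution 1 would therefore use the new keys; the executor and driver overhead lines become:

--conf spark.executor.memoryOverhead=4096 \
--conf spark.driver.memoryOverhead=4096 \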


You can programmatically set spark.executor.memoryOverhead by passing it as a config:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
        .master('yarn')
        .appName('StackOverflow')
        .config('spark.driver.memory', '35g')
        .config('spark.executor.cores', 5)
        .config('spark.executor.memory', '35g')
        .config('spark.dynamicAllocation.enabled', True)
        .config('spark.dynamicAllocation.maxExecutors', 25)
        .config('spark.executor.memoryOverhead', '4096')  # new key (Spark 2.3+)
        .getOrCreate()
)
sc = spark.sparkContext
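
A quick way to sanity-check that the value was applied is to read it back from the active configuration:

# Reads the overhead back from the running SparkContext's configuration.
print(sc.getConf().get('spark.executor.memoryOverhead'))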
Author: Micah Pearce
Updated on June 30, 2022

Comments

  • Micah Pearce, about 2 years ago

    In Spark 2.0, how do you set spark.yarn.executor.memoryOverhead when you run spark-submit?

    I know that for things like spark.executor.cores you can set --executor-cores 2. Is it the same pattern for this property, e.g. --yarn-executor-memoryOverhead 4096?