Boosting spark.yarn.executor.memoryOverhead
Solution 1
After a couple of hours I found the solution to this problem. When creating the cluster, I needed to pass on the following flag as a parameter:
--configurations file://./sparkConfig.json
With the JSON file containing:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.memory": "10G"
    }
  }
]
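A malformed configurations file is an easy way to waste a cluster launch, so it can be worth sanity-checking the JSON before passing it to the CLI. This is just an illustrative snippet (the filename sparkConfig.json comes from the flag above; the check itself is not part of the AWS CLI):

```python
import json

# Write the configuration shown above to sparkConfig.json, then
# re-read it to confirm it is valid JSON with the expected shape.
config = [
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.executor.memory": "10G",
        },
    }
]

with open("sparkConfig.json", "w") as f:
    json.dump(config, f, indent=2)

with open("sparkConfig.json") as f:
    loaded = json.load(f)

assert loaded[0]["Classification"] == "spark-defaults"
print(loaded[0]["Properties"]["spark.executor.memory"])  # 10G
```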
This allows me to increase the memoryOverhead in the next step by using the parameter I initially posted.
Solution 2
If you are logged into an EMR node and want to further alter Spark's default settings without dealing with the AWS CLI tools, you can add a line to the spark-defaults.conf
file. On EMR, Spark's configuration lives under /etc, and the file can be edited directly at /etc/spark/conf/spark-defaults.conf
So in this case we'd append spark.yarn.executor.memoryOverhead
to the end of the spark-defaults file. The end of the file looks similar to this example:
spark.driver.memory 1024M
spark.executor.memory 4305M
spark.default.parallelism 8
spark.logConf true
spark.executorEnv.PYTHONPATH /usr/lib/spark/python
spark.driver.maxResultSize 0
spark.worker.timeout 600
spark.storage.blockManagerSlaveTimeoutMs 600000
spark.executorEnv.PYTHONHASHSEED 0
spark.akka.timeout 600
spark.sql.shuffle.partitions 300
spark.yarn.executor.memoryOverhead 1000
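If you prefer not to open an editor, the same append can be scripted; a minimal sketch, using a local copy of the file for illustration (on a real EMR node the path would be /etc/spark/conf/spark-defaults.conf, which is root-owned, so you would pipe through sudo tee -a there):

```shell
# Sketch: append the overhead setting to a spark-defaults.conf file.
# Local copy used here for illustration; on EMR the real path is
# /etc/spark/conf/spark-defaults.conf (root-owned, so use sudo tee -a).
CONF=./spark-defaults.conf
echo "spark.yarn.executor.memoryOverhead 1000" >> "$CONF"
tail -n 1 "$CONF"
```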
Similarly, the heap size can be controlled with the --executor-memory=xg
flag or the spark.executor.memory property.
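For context on the numbers involved: in Spark on YARN, spark.yarn.executor.memoryOverhead defaults to 10% of executor memory with a 384 MB floor, and the YARN container must fit executor memory plus this overhead. A small sketch of that arithmetic (the function name is mine, not a Spark API):

```python
def yarn_container_mb(executor_memory_mb, overhead_mb=None):
    """Total YARN container size: executor heap plus memory overhead.

    By default, Spark on YARN sets the overhead to
    max(384, 0.10 * executor memory), both in MB.
    """
    if overhead_mb is None:
        overhead_mb = max(384, int(0.10 * executor_memory_mb))
    return executor_memory_mb + overhead_mb

# A 5g executor gets a 512 MB default overhead: 5120 + 512 = 5632 MB,
# i.e. the "5.5 GB" container limit seen in the YARN error message below.
print(yarn_container_mb(5120))  # 5632
```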
Hope this helps...
Comments
- masta-g3 almost 2 years
I'm trying to run a (py)Spark job on EMR that will process a large amount of data. Currently my job is failing with the following error message:
Reason: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
So I googled how to do this, and found that I should pass along the
spark.yarn.executor.memoryOverhead
parameter with the --conf flag. I'm doing it this way:
aws emr add-steps\
 --cluster-id %s\
 --profile EMR\
 --region us-west-2\
 --steps Name=Spark,Jar=command-runner.jar,\
 Args=[\
 /usr/lib/spark/bin/spark-submit,\
 --deploy-mode,client,\
 /home/hadoop/%s,\
 --executor-memory,100g,\
 --num-executors,3,\
 --total-executor-cores,1,\
 --conf,'spark.python.worker.memory=1200m',\
 --conf,'spark.yarn.executor.memoryOverhead=15300',\
 ],ActionOnFailure=CONTINUE" % (cluster_id,script_name)
But when I rerun the job it keeps giving me the same error message, with the
5.5 GB of 5.5 GB physical memory used
, which implies that my memory did not increase. Any hints on what I am doing wrong?
EDIT
Here are details on how I initially create the cluster:
aws emr create-cluster\
 --name "Spark"\
 --release-label emr-4.7.0\
 --applications Name=Spark\
 --bootstrap-action Path=s3://emr-code-matgreen/bootstraps/install_python_modules.sh\
 --ec2-attributes KeyName=EMR2,InstanceProfile=EMR_EC2_DefaultRole\
 --log-uri s3://emr-logs-zerex\
 --instance-type r3.xlarge\
 --instance-count 4\
 --profile EMR\
 --service-role EMR_DefaultRole\
 --region us-west-2
Thanks.
- kmh almost 6 years
fwiw when I set spark.yarn.executor.memoryOverhead using a format like "1000M" I get a
java.lang.NumberFormatException: For input string: "1000M"
. The Spark documentation says this is "The amount of off-heap memory (in megabytes)"... so I leave off the 'M' and just set it as "1000" and it seems to work.