AM Container is running beyond virtual memory limits


Solution 1

Ok, found out. Increase the master memory parameter to more than 750 MB and you will succeed in running the YARN app.
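The question uses the YARN distributed shell example, where the ApplicationMaster memory is set on the client command line. A minimal sketch, assuming the --master_memory option of the distributed shell Client is the parameter meant here (the jar path is a placeholder for your installation):

# Request 1 GB for the ApplicationMaster (i.e. more than 750 MB)
hadoop jar hadoop-yarn-applications-distributedshell-*.jar \
  org.apache.hadoop.yarn.applications.distributedshell.Client \
  --jar hadoop-yarn-applications-distributedshell-*.jar \
  --shell_command ping --shell_args localhost \
  --num_containers 1 --container_memory 128 \
  --master_memory 1024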

Solution 2

From the error message, you can see that you're using more virtual memory than your current limit of 1.0gb. This can be resolved in two ways:

Disable Virtual Memory Limit Checking

YARN will simply ignore the limit; in order to do this, add this to your yarn-site.xml:

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers.</description>
</property>

The default for this setting is true.

Increase Virtual Memory to Physical Memory Ratio

In your yarn-site.xml, change this property to a higher value than is currently set:

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>5</value>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.</description>
</property>

The default is 2.1.

You could also increase the amount of physical memory you allocate to a container.
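For example, for a MapReduce job the per-container physical memory is requested by the framework rather than in yarn-site.xml; a sketch of mapred-site.xml with purely illustrative values (not from the original answer):

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>
  <description>Physical memory requested for the MapReduce ApplicationMaster container.</description>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
  <description>Physical memory requested for each map task container.</description>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
  <description>Physical memory requested for each reduce task container.</description>
</property>

With vmem-pmem-ratio at its default of 2.1, a 2048 MB container may use roughly 4.2 GB of virtual memory before being killed.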

Make sure you don't forget to restart yarn after you change the config.
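With the standard scripts that amounts to something like the following (the sbin path depends on your installation):

$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh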

Solution 3

No need to change the cluster configuration. I found out that just providing the extra parameter

-Dmapreduce.map.memory.mb=4096

to distcp helped for me.
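For example (the source and destination paths below are placeholders):

hadoop distcp -Dmapreduce.map.memory.mb=4096 hdfs://source-nn:8020/src/path hdfs://dest-nn:8020/dst/path

Note that the -D option must come before the source and destination arguments so that the generic options parser picks it up.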

Solution 4

If you are running the Tez framework, you must set the below parameters in tez-site.xml:

tez.am.resource.memory.mb
tez.task.resource.memory.mb
tez.am.java.opts

And in yarn-site.xml:

yarn.nodemanager.resource.memory-mb
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.vmem-check-enabled
yarn.nodemanager.vmem-pmem-ratio

All of these parameters must be set.
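A rough sketch of what that could look like, with purely illustrative values (the Tez memory requests must fit between the YARN minimum and maximum allocation):

In tez-site.xml:

<property>
  <name>tez.am.resource.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>tez.task.resource.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>tez.am.java.opts</name>
  <value>-Xmx1638m</value>
</property>

In yarn-site.xml:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>

A commonly used rule of thumb is to set tez.am.java.opts (the JVM heap) to roughly 80% of tez.am.resource.memory.mb, so that the heap plus JVM overhead stays within the container.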


Comments

  • Jimson James
    Jimson James almost 2 years

    I was playing with the distributed shell application (hadoop-2.0.0-cdh4.1.2). This is the error I'm receiving at the moment.

    13/01/01 17:09:09 INFO distributedshell.Client: Got application report from ASM for, appId=5, clientToken=null, appDiagnostics=Application application_1357039792045_0005 failed 1 times due to AM Container for appattempt_1357039792045_0005_000001 exited with  exitCode: 143 due to: Container [pid=24845,containerID=container_1357039792045_0005_01_000001] is running beyond virtual memory limits. Current usage: 77.8mb of 512.0mb physical memory used; 1.1gb of 1.0gb virtual memory used. Killing container.
    Dump of the process-tree for container_1357039792045_0005_01_000001 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 24849 24845 24845 24845 (java) 165 12 1048494080 19590 /usr/java/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --num_containers 1 --priority 0 --shell_command ping --shell_args localhost --debug
    |- 24845 23394 24845 24845 (bash) 0 0 108654592 315 /bin/bash -c /usr/java/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --num_containers 1 --priority 0 --shell_command ping --shell_args localhost --debug 1>/tmp/logs/application_1357039792045_0005/container_1357039792045_0005_01_000001/AppMaster.stdout 2>/tmp/logs/application_1357039792045_0005/container_1357039792045_0005_01_000001/AppMaster.stderr 
    

    The interesting part is that there seems to be no problem with the setup, since a simple ls or uname command completed successfully and the output was available in container 2's stdout.

    Regarding the setup, yarn.nodemanager.vmem-pmem-ratio is 3 and the total physical memory available is 2 GB, which I think is more than enough for the example to run.

    For the command in question, "ping localhost" generated two replies, as can be seen from containerlogs/container_1357039792045_0005_01_000002/721917/stdout/?start=-4096.

    So, what could be the problem?

  • ykesh
    ykesh over 10 years
    It would be helpful to mention which parameter, and a reference to where you found that.
  • Mata
    Mata over 8 years
    This solution worked exactly as outlined by tmlye. Thanks a ton! Saved a lot of my time. Note - you must restart yarn (stop-yarn.sh/start-yarn.sh).
  • Andrew Logvinov
    Andrew Logvinov over 8 years
    Thanks, you saved my life :) I thought I'd have to change configuration and restart the whole cluster, but increasing memory worked in my case as well. Also running distcp.
  • David Ongaro
    David Ongaro over 8 years
    Glad it helped! I realize that the original question is not about distcp, but I think it's basically the same problem which should have the same solution.
  • Steve S
    Steve S almost 7 years
    Also helped me. Thanks!