What does "Heap Size" mean for Hadoop Namenode?


The namenode web UI renders the value with this JSP snippet:

<h2>Cluster Summary (Heap Size is <%= StringUtils.byteDesc(Runtime.getRuntime().totalMemory()) %>/<%= StringUtils.byteDesc(Runtime.getRuntime().maxMemory()) %>)</h2>

The Runtime class documents these as:

  • totalMemory() Returns the total amount of memory in the Java virtual machine.
  • maxMemory() Returns the maximum amount of memory that the Java virtual machine will attempt to use.
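To see how the two numbers relate, here is a minimal standalone sketch that prints the same pair the UI reports. It uses only the standard Runtime API; the class name and the GB formatting are illustrative, and the real UI formats the values with StringUtils.byteDesc instead.

    // Minimal sketch: print the same two numbers the namenode UI reports,
    // using the same Runtime calls as the JSP snippet above.
    public class HeapReport {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long total = rt.totalMemory(); // heap currently reserved by the JVM
            long max   = rt.maxMemory();   // upper bound, effectively the -Xmx value
            System.out.printf("Heap Size is %s / %s (%.0f%%)%n",
                    toGb(total), toGb(max), 100.0 * total / max);
        }

        // Rough human-readable formatting; the real UI uses StringUtils.byteDesc.
        private static String toGb(long bytes) {
            return String.format("%.2f GB", bytes / (1024.0 * 1024 * 1024));
        }
    }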

Max is going to be the -Xmx value from the service start command. The main factor driving total memory is the number of blocks in your HDFS cluster. The namenode needs roughly 150 bytes for each block, plus about 16 bytes for each additional replica, and all of it must be kept in live memory. With the default replication factor of 3 that is about 182 bytes per block, and with 7,534,776 blocks it works out to roughly 1.3 GB. Add all the other non-file-related memory the namenode uses, and 1.95 GB sounds about right. I would say your HDFS cluster size requires a bigger namenode with more RAM. If possible, increase the namenode's startup -Xmx; if the machine is already maxed out, you'll need a bigger VM or physical box.
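As a sanity check, here is a minimal sketch of that arithmetic. The 150-byte and 16-byte figures are the rule of thumb from above, not exact values, and the class name is just illustrative.

    // Back-of-the-envelope heap estimate for the numbers in this answer.
    public class NamenodeHeapEstimate {
        // ~150 bytes per block plus ~16 bytes per extra replica (182 at replication 3).
        static long bytesPerBlock(int replicationFactor) {
            return 150 + 16L * (replicationFactor - 1);
        }

        public static void main(String[] args) {
            long blocks = 7_534_776L; // block count from the cluster summary
            double gb = blocks * bytesPerBlock(3) / (1024.0 * 1024 * 1024);
            System.out.printf("Block metadata alone: ~%.2f GB%n", gb);
            // ~1.28 GB; the rest of the 1.95 GB heap is other namenode state.
        }
    }

If you do raise the limit, the namenode's -Xmx is typically set through HADOOP_NAMENODE_OPTS in hadoop-env.sh, though the exact location depends on your distribution.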

Read about The Small Files Problem, HDFS-5711.

Author: Bohdan

Updated on June 04, 2022

Comments

  • Bohdan almost 2 years

    I'm trying to understand whether there is something wrong with my Hadoop cluster. When I go to the web UI, the cluster summary says:

    Cluster Summary
    
    XXXXXXX files and directories, XXXXXX blocks = 7534776 total.
    Heap Size is 1.95 GB / 1.95 GB (100%) 
    

    I'm concerned about why this Heap Size metric is at 100%.

    Could someone please explain how namenode heap size impacts cluster performance, and whether this needs to be fixed?