What does "Heap Size" mean for Hadoop Namenode?


The namenode web UI renders the value with this JSP snippet:

<h2>Cluster Summary (Heap Size is <%= StringUtils.byteDesc(Runtime.getRuntime().totalMemory()) %>/<%= StringUtils.byteDesc(Runtime.getRuntime().maxMemory()) %>)</h2>

The Runtime class documents these as:

  • totalMemory() Returns the total amount of memory in the Java virtual machine.
  • maxMemory() Returns the maximum amount of memory that the Java virtual machine will attempt to use.
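To see how the two numbers relate, here is a minimal standalone sketch that prints the same pair the UI reports. It uses only the standard Runtime API; the class name and the GB formatting are illustrative, and the real UI formats the values with StringUtils.byteDesc instead.

    // Minimal sketch: print the same two numbers the namenode UI reports,
    // using the same Runtime calls as the JSP snippet above.
    public class HeapReport {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long total = rt.totalMemory(); // heap currently reserved by the JVM
            long max   = rt.maxMemory();   // upper bound, effectively the -Xmx value
            System.out.printf("Heap Size is %s / %s (%.0f%%)%n",
                    toGb(total), toGb(max), 100.0 * total / max);
        }

        // Rough human-readable formatting; the real UI uses StringUtils.byteDesc.
        private static String toGb(long bytes) {
            return String.format("%.2f GB", bytes / (1024.0 * 1024 * 1024));
        }
    }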

Max is going to be the -Xmx value from the service start command. The main factor driving total memory is the number of blocks in your HDFS cluster. The namenode needs roughly 150 bytes for each block, plus about 16 bytes for each additional replica, and all of it must be kept in live memory. With the default replication factor of 3 that is about 182 bytes per block, and with 7,534,776 blocks it works out to roughly 1.3 GB. Add all the other non-file-related memory the namenode uses, and 1.95 GB sounds about right. I would say your HDFS cluster size requires a bigger namenode with more RAM. If possible, increase the namenode's startup -Xmx; if the machine is already maxed out, you'll need a bigger VM or physical box.
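As a sanity check, here is a minimal sketch of that arithmetic. The 150-byte and 16-byte figures are the rule of thumb from above, not exact values, and the class name is just illustrative.

    // Back-of-the-envelope heap estimate for the numbers in this answer.
    public class NamenodeHeapEstimate {
        // ~150 bytes per block plus ~16 bytes per extra replica (182 at replication 3).
        static long bytesPerBlock(int replicationFactor) {
            return 150 + 16L * (replicationFactor - 1);
        }

        public static void main(String[] args) {
            long blocks = 7_534_776L; // block count from the cluster summary
            double gb = blocks * bytesPerBlock(3) / (1024.0 * 1024 * 1024);
            System.out.printf("Block metadata alone: ~%.2f GB%n", gb);
            // ~1.28 GB; the rest of the 1.95 GB heap is other namenode state.
        }
    }

If you do raise the limit, the namenode's -Xmx is typically set through HADOOP_NAMENODE_OPTS in hadoop-env.sh, though the exact location depends on your distribution.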

Read about The Small Files Problem, HDFS-5711.

Author: Bohdan

Updated on June 04, 2022

Comments

  • Bohdan almost 2 years

    I'm trying to understand whether there is something wrong with my Hadoop cluster. When I go to the web UI, the cluster summary says:

    Cluster Summary
    
    XXXXXXX files and directories, XXXXXX blocks = 7534776 total.
    Heap Size is 1.95 GB / 1.95 GB (100%) 
    

    I'm concerned about why this Heap Size metric is at 100%.

    Could someone please explain how namenode heap size impacts cluster performance, and whether this needs to be fixed?