What is the maximum number of files allowed in an HDFS directory?


Solution 1

The blocks and files are stored in a HashMap in the NameNode's memory, so you are bound by Integer.MAX_VALUE. An individual directory therefore has no limit of its own; the limit applies to the whole FileSystem.
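As a rough illustration of that ceiling, the snippet below just prints Integer.MAX_VALUE, the theoretical upper bound on the total number of namespace objects under this reasoning (a sketch of the argument, not an official HDFS limit check):

public class NamespaceCeiling {
    public static void main(String[] args) {
        // Per the reasoning above, the in-memory maps cap the *whole* namespace
        // (all inodes and blocks combined), not any single directory.
        System.out.println("Theoretical object ceiling: " + Integer.MAX_VALUE); // 2,147,483,647
    }
}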

Solution 2

In modern Apache Hadoop versions, various HDFS limits are controlled by configuration properties with fs-limits in the name, all of which have reasonable default values. This question specifically asked about the number of children in a directory. That is defined by dfs.namenode.fs-limits.max-directory-items, and its default value is 1048576.

Refer to the Apache Hadoop documentation in hdfs-default.xml for the full list of fs-limits configuration properties and their default values. They are copied here for convenience:

<property>
  <name>dfs.namenode.fs-limits.max-component-length</name>
  <value>255</value>
  <description>Defines the maximum number of bytes in UTF-8 encoding in each
      component of a path.  A value of 0 will disable the check.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <value>1048576</value>
  <description>Defines the maximum number of items that a directory may
      contain. Cannot set the property to a value less than 1 or more than
      6400000.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>1048576</value>
  <description>Minimum block size in bytes, enforced by the Namenode at create
      time. This prevents the accidental creation of files with tiny block
      sizes (and thus many blocks), which can degrade
      performance.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-blocks-per-file</name>
  <value>1048576</value>
  <description>Maximum number of blocks per file, enforced by the Namenode on
      write. This prevents the creation of extremely large files which can
      degrade performance.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-xattrs-per-inode</name>
  <value>32</value>
  <description>
    Maximum number of extended attributes per inode.
  </description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-xattr-size</name>
  <value>16384</value>
  <description>
    The maximum combined size of the name and value of an extended attribute
    in bytes. It should be larger than 0, and less than or equal to maximum
    size hard limit which is 32768.
  </description>
</property>

All of these settings use reasonable default values as decided upon by the Apache Hadoop community. It is generally recommended that users do not tune these values except in very unusual circumstances.
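If you want to confirm the value a particular cluster actually uses, a minimal sketch with the Hadoop Configuration API might look like the following (it assumes hadoop-common and the cluster's hdfs-site.xml are on the classpath; the class name is only illustrative):

import org.apache.hadoop.conf.Configuration;

public class DirectoryLimitCheck {
    public static void main(String[] args) {
        // Plain Configuration loads core-default.xml and core-site.xml;
        // add hdfs-site.xml from the classpath explicitly to pick up HDFS overrides.
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml");

        // Fall back to the documented default (1048576) when the property is not set.
        int maxItems = conf.getInt("dfs.namenode.fs-limits.max-directory-items", 1048576);
        System.out.println("dfs.namenode.fs-limits.max-directory-items = " + maxItems);
    }
}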

Solution 3

From http://blog.cloudera.com/blog/2009/02/the-small-files-problem/:

Every file, directory and block in HDFS is represented as an object in the namenode’s memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible.
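As a back-of-the-envelope check of that figure, the sketch below applies the ~150-bytes-per-object rule of thumb to 10 million single-block files (the constant and the scenario come from the quoted post, not from any official sizing guide):

public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        final long BYTES_PER_OBJECT = 150L;  // rule of thumb from the quoted post
        long files = 10_000_000L;            // 10 million files
        long blocks = files;                 // each file small enough to use a single block
        long objects = files + blocks;       // one inode object + one block object per file

        // 20,000,000 objects * 150 bytes ~= 3 GB of NameNode heap
        double gb = objects * BYTES_PER_OBJECT / 1e9;
        System.out.printf("%,d objects -> ~%.1f GB of NameNode heap%n", objects, gb);
    }
}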

Solution 4

This question specifically mentions HDFS, but a related question is how many files can you store on a Hadoop cluster.

That has a different answer if you use MapR's file system. In that case, billions of files can be stored on the cluster without a problem.


Comments

  • Joe Hansen (almost 2 years ago):

    What is the maximum number of files and directories allowed in an HDFS (Hadoop) directory?

  • Praveen Sripati (over 12 years ago):

    But the framework might not really scale to that number due to software and hardware constraints.

  • Ted Dunning (about 5 years ago):

    I would have thought that, as CTO of MapR, my own answer would be authoritative enough. In any case, check here [en.wikipedia.org/wiki/MapR_FS] or here [youtube.com/watch?v=AameZG88t58].

  • Abhinav (over 4 years ago):

    Does it apply to the root directory / itself? Suppose I set the property to 3; does that mean I can't have more than 3 files under /?