What is the most high-performance Linux filesystem for storing a lot of small files (HDD, not SSD)?


Solution 1

Performance

I wrote a small benchmark (source) to find out which file system performs best with hundreds of thousands of small files (a sketch of the procedure follows the list):

  • create 300000 files (512B to 1536B) with data from /dev/urandom
  • rewrite 30000 random files and change the size
  • read 30000 sequential files
  • read 30000 random files
  • delete all files

  • sync and drop cache after every step
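
For reference, a minimal sketch of that procedure in Python (not the original script, which is linked above; the mount point is hypothetical, and the real benchmark averaged several runs):

    import os
    import random
    import subprocess
    import time

    ROOT = "/mnt/test/bench"   # hypothetical mount point of the FS under test
    NUM_FILES = 300000
    SAMPLE = 30000

    def sync_and_drop_caches():
        # flush dirty pages, then drop page/dentry/inode caches (needs root)
        subprocess.run(["sync"], check=True)
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")

    def timed(name, step):
        start = time.time()
        step()
        print(f"{name}: {time.time() - start:4.0f} s")
        sync_and_drop_caches()

    def create():
        os.makedirs(ROOT, exist_ok=True)
        for i in range(NUM_FILES):
            with open(f"{ROOT}/{i}", "wb") as f:
                f.write(os.urandom(random.randint(512, 1536)))

    def rewrite():
        # rewrite 30000 random files with a new random size
        for i in random.sample(range(NUM_FILES), SAMPLE):
            with open(f"{ROOT}/{i}", "wb") as f:
                f.write(os.urandom(random.randint(512, 1536)))

    def read_seq():
        for i in range(SAMPLE):
            with open(f"{ROOT}/{i}", "rb") as f:
                f.read()

    def read_rand():
        for i in random.sample(range(NUM_FILES), SAMPLE):
            with open(f"{ROOT}/{i}", "rb") as f:
                f.read()

    def delete():
        for i in range(NUM_FILES):
            os.remove(f"{ROOT}/{i}")

    for name, step in [("create", create), ("rewrite", rewrite),
                       ("read sq", read_seq), ("read rn", read_rand),
                       ("delete", delete)]:
        timed(name, step)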

Results (average time in seconds, lower = better), using Linux kernel version 3.1.7:

Btrfs:
    create:    53 s
    rewrite:    6 s
    read sq:    4 s
    read rn:  312 s
    delete:   373 s

ext4:
    create:    46 s
    rewrite:   18 s
    read sq:   29 s
    read rn:  272 s
    delete:    12 s

ReiserFS:
    create:    62 s
    rewrite:  321 s
    read sq:    6 s
    read rn:  246 s
    delete:    41 s

XFS:
    create:    68 s
    rewrite:  430 s
    read sq:   37 s
    read rn:  367 s
    delete:    36 s

Conclusion:
While ext4 had good overall performance, ReiserFS was extremely fast at reading sequential files. It turned out that XFS is slow with many small files; you should not use it for this use case.

Fragmentation issue

The only way to keep file systems from spreading files all over the drive is to keep the partition only as big as you really need it, but be careful not to make the partition too small, or you will cause intra-file fragmentation instead. Using LVM (growing the volume in small steps as it fills, as sketched below) can be very helpful.
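
For example, with the filesystem on a logical volume you can start small and grow it in steps. A rough sketch, assuming ext4 on LVM; the volume group, LV, and mount point names are hypothetical, and the commands need root:

    import os
    import subprocess

    LV = "/dev/vg0/smallfiles"   # hypothetical logical volume
    MOUNT = "/srv/smallfiles"    # hypothetical mount point
    GROW_STEP = "5G"             # grow in small steps to keep the FS tight

    def usage_fraction(path):
        st = os.statvfs(path)
        return 1.0 - st.f_bavail / st.f_blocks

    # once the FS is over 80% full, grow the LV and the ext4 FS on it
    if usage_fraction(MOUNT) > 0.8:
        subprocess.run(["lvextend", "-L", f"+{GROW_STEP}", LV], check=True)
        subprocess.run(["resize2fs", LV], check=True)  # ext4 grows online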

Further reading

The Arch Wiki has some great articles dealing with file system performance:

https://wiki.archlinux.org/index.php/Beginner%27s_Guide#Filesystem_types

https://wiki.archlinux.org/index.php/Maximizing_Performance#Storage_devices

Solution 2

I am using ReiserFS for this task; it is specifically designed for handling a lot of small files. There is an easy-to-read text about it at the funtoo wiki.

ReiserFS also has a host of features aimed specifically at improving small file performance. Unlike ext2, ReiserFS doesn't allocate storage space in fixed one k or four k blocks. Instead, it can allocate the exact size it needs.
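
To see why this matters at these file sizes, here is a quick back-of-the-envelope calculation in Python (using the file count and ~1 KB average file size from the question in the comments below):

    # slack space wasted by fixed 4K blocks for ~1 KB files, versus a
    # filesystem (like ReiserFS with tail packing) that allocates exact sizes
    files = 210158           # file count from the question
    avg_size = 1024          # bytes, average file size
    block = 4096             # bytes, typical ext2/ext4 block size

    needed = files * avg_size            # actual data
    used = files * block                 # one 4K block per file
    print(f"data: {needed / 2**20:.0f} MiB, "
          f"allocated: {used / 2**20:.0f} MiB, "
          f"slack: {(used - needed) / used:.0%}")
    # -> about 75% of the allocated space is slack for 1 KB files in 4K blocks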

Solution 3

ext4 performance drops off after 1-2 million files in a directory. See this page http://genomewiki.ucsc.edu/index.php/File_system_performance created by Hiram Clawson at UCSC.
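
A common mitigation (my suggestion, not from the linked page) is to shard files into a two-level directory tree keyed by a hash of the file name, so that no single directory grows into the millions of entries:

    import hashlib
    import os

    ROOT = "/srv/data"   # hypothetical storage root

    def shard_path(name):
        # spread files over 256*256 subdirectories based on a hash of the
        # name, keeping each directory far below the ext4 slowdown point
        h = hashlib.md5(name.encode()).hexdigest()
        return os.path.join(ROOT, h[:2], h[2:4], name)

    path = shard_path("example.txt")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(b"payload")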


Comments

  • Admin
    Admin over 1 year

    I have a directory tree that contains many small files, and a small number of larger files. The average size of a file is about 1 kilobyte. There are 210158 files and directories in the tree (this number was obtained by running find | wc -l).

    A small percentage of files gets added/deleted/rewritten several times per week. This applies to the small files, as well as to the (small number of) larger files.

    The filesystems that I tried (ext4, btrfs) have some problems with the positioning of files on disk. Over a longer span of time, the physical positions of files on the disk (rotating media, not a solid state disk) become more randomly distributed. The negative consequence of this random distribution is that the filesystem gets slower (for example, 4 times slower than a fresh filesystem).

    Is there a Linux filesystem (or a method of filesystem maintenance) that does not suffer from this performance degradation and is able to maintain a stable performance profile on a rotating media? The filesystem may run on Fuse, but it needs to be reliable.

    • BrettRobi
      BrettRobi over 12 years
      If you know which files are going to be big/not changing very often, and which are going to be small/frequently changing, you might want to create two filesystems with different options, each more suited to its scenario. If you need them to be accessible as if they were part of the same structure, you can do some tricks with mount and symlinks.
    • Nikhil Mulley
      Nikhil Mulley over 12 years
      I am quite surprised to hear that btrfs (with its copy-on-write feature) has been sluggish for you over a period of time. I am curious to see your results; they might help point performance tuning with it in a new direction.
    • Nikhil Mulley
      Nikhil Mulley over 12 years
      There is a new animal online: ZFS on Linux, available in native and FUSE implementations, in case you want to have a look.
    • phemmer
      phemmer over 12 years
      I tried ZFS on Linux once; it was quite unstable. It managed to completely lock up the filesystem quite often. The box would work, but any access to the FS would hang.
    • Nikhil Mulley
      Nikhil Mulley over 12 years
      @Patrick Yeah, I see those solutions are still immature, and it will be some time before they perform well natively.
    • psusi
      psusi over 12 years
      What does it matter whether they are spread out or not? Even if they are stored one after the other, if you are accessing a small subset of the files randomly, you will still get a random IO pattern. For sequential access, like tarring the whole thing up, from what I have seen, btrfs handles this best. ext4 is bad at it because it stores the file names in hash order, which is essentially random, so even if the file data is all in order, tar reads it in a random order. btrfs does a very good job of keeping them in order. Running a btrfs fi defrag every now and again helps too. (See the inode-order read sketch after the comments.)
    • Admin
      Admin over 12 years
      @psusi It seems it was a mistake to use the word "fragmentation" in my question. I just replaced it with "positioning of files on disk". I apologize.
    • Admin
      Admin over 9 years
      XFS has improved greatly in the area of small files over the last 5 years; I suspect the numbers above would be very different for XFS in newer Linux distros now.
    • ctrl-alt-delor
      ctrl-alt-delor over 5 years
      I was reading (a few months back) that COW file systems fragment more than non-COW ones. Apparently this is a property of COW. Therefore, if using COW, one should run a defragmenter; if COW is not needed, COW file systems should be avoided.
  • Nils
    Nils over 12 years
    There are stability issues with ReiserFS as well, which is why RH and SuSE have dropped that FS. In principle (as a B-tree-based FS), btrfs should be comparable.
  • phemmer
    phemmer over 12 years
    You should specify what version of the kernel you're basing that comparison on. XFS got some very significant speed improvements in one of the recent kernels (I think it was 2.6.31, but don't quote me on that).
  • phemmer
    phemmer over 12 years
    On another note, having a UPS is vital with any filesystem not running in synchronous IO mode. ext4, ReiserFS, and btrfs would all suffer data loss without a proper shutdown.
  • taffer
    taffer over 12 years
    @Patrick Kernel: look here. Even though XFS performance has been improved, it is still slow with small files (DBench uses a file size of 5B). XFS was made by SGI for huge partitions and files > 4 MB, not for small 1 KB files.
  • taffer
    taffer over 12 years
    @Patrick UPS: There is a big difference in how much data you will lose. While XFS will show data that might be corrupted as \0, even whole directories, ext4 will have added garbage to a directory or file (unless in data=journal mode). btrfs, in contrast, uses transactional writes, so you will at least end up with the last non-corrupted version of a file in case of a power outage. Also note that many people do not have a UPS outside datacenters.
  • psusi
    psusi over 12 years
    btrfs internally does your LVM trick. It allocates smaller chunks of the disk and places files in those chunks, then only allocates another chunk of the disk when the existing chunks fill up.
  • psusi
    psusi over 12 years
    @taffer, ext4 will not have added garbage to a directory or file after a power failure in the default data=ordered mode. Even in data=writeback, directories will not be corrupted.
  • taffer
    taffer over 12 years
    @psusi btrfs: sure, when you use btrfs, you don't need LVM at all.
  • taffer
    taffer over 12 years
    @psusi ext4: yes, my fault; the garbage part is only true for data=writeback and doesn't affect directories at all. Nevertheless, IIRC, files might be incomplete in data=ordered mode when a power outage happens while data is being written.
  • psusi
    psusi over 12 years
    That's true of any filesystem. That is why applications use things like fsync(). (See the durable-write sketch after the comments.)
  • taffer
    taffer over 12 years
    @psusi As far as I know, that's not true of transactions in btrfs.
  • psusi
    psusi over 12 years
    @taffer, it is. The transactions have the same effect as the journal does in other filesystems: they protect the FS metadata. In theory they could be used by applications in the way you describe, but there is currently no API that allows applications to open and close transactions.
  • taffer
    taffer over 12 years
    @psusi OK, thanks for the clarification. Do you know of any small-file benchmarks comparing xfs, ext4 and reiserfs?
  • psusi
    psusi over 12 years
    @taffer, nope, but I do know that linear access (like tar) to many small files (like a Maildir) on ext4 is slow as all hell, because ext4 stores the file names in hash order to speed up locating an individual file, and that effectively randomizes the order of file names relative to where they are stored on disk. I had tar take 30 minutes to archive a Maildir on ext4 that took 3 minutes on btrfs. See askubuntu.com/q/29831/8500
  • taffer
    taffer over 12 years
    @psusi Maybe you are interested in the benchmark I made. I also took into account the linear access you mentioned. Unfortunately I didn't have the time to benchmark btrfs and jfs as well.
  • psusi
    psusi over 12 years
    Nice, but those times seem pretty fast. Are you sure you dropped cache, or is this on an SSD or something?
  • taffer
    taffer over 12 years
    @psusi Yes, echo 3 > /proc/sys/vm/drop_caches after every step should have done it. I was using a 7200 rpm HDD with a Xeon W3550 processor. There is a link to the source of the benchmark in my answer, so you can try it yourself and improve it if you like.
  • phemmer
    phemmer almost 10 years
    @Levit I think you're misreading that report. The report very clearly shows that XFS performs very well for random IO. But that aside, the report does not address the type of scenario in this question: lots of files. Random IO is one thing, but large numbers of files are where ext* falls on its face.
  • Levite
    Levite almost 10 years
    Sorry, I don't want to start a discussion here, but if you look through the benchmarks you see that pattern, or just look at the closing statement on page 10 of the article, which states: for large file installations (as a VM hosting system) or direct I/O, go with XFS ... which is where it really shines. I am not saying it is extremely terrible with small files, but it is surely not ideal for them!
  • phemmer
    phemmer almost 10 years
    @Levit See page 4 & page 7. You should always look at the data someone uses to arrive at their conclusion, and never trust the conclusion by itself.
  • Levite
    Levite almost 10 years
    The only place XFS is really better there is the random read/write operations (it still seems strange that a truly random read pattern on a mechanical disk can reach 10 MB/s; that looks to me like an optimization that would not fly in the real world, IMHO), whereas page 7 shows just what I said earlier: XFS is really good at handling big files! Look at pages 3 and 5; especially on page 3 you see it handling small files clearly not as well as ext! I really don't have anything against XFS, but from what you find just about everywhere, it is not the best option for many small files, is all I am saying!
  • invot
    invot almost 9 years
    @taffer You write: "keep the partition only as big as you really need it". I think this is misleading. You cannot prevent copy-on-write filesystems (btrfs/ZFS) from fragmenting a bit over time. A rule of thumb says that filling them permanently beyond 50% guarantees that you will encounter serious fragmentation issues sooner or later. YMMV; some see this at 70%, some already at 25%. And even if it is caused by something different, the same is more or less true for ext-type FS as well. (But I fully agree with what you write about XFS.)
  • Jonas Schäfer
    Jonas Schäfer almost 7 years
    When you use LVM to gradually grow the FS, you shift the fragmentation issue one layer lower, which may be worse than having the FS fragment (and know about it).
  • Jody Bruchon
    Jody Bruchon almost 6 years
    This info is 6 years old as of this comment. XFS now includes metadata CRCs, free inode B+ trees (finobt) for better performance on "aged" filesystems, and file types are now stored in the directory by default (ftype=1) which greatly improves performance in certain scenarios with lots of (usually small) files. Do not fully trust random benchmarks online, especially outdated ones like above. While XFS is not the best choice for a filesystem that will ONLY store tons of very small files, it is by far the best filesystem choice all-around.
  • Jody Bruchon
    Jody Bruchon almost 6 years
    XFS v5 has been released since these comments were left. There have been significant improvements in performance. Also, most filesystems will generally have poor performance in the case of a file that is extended randomly over time due to fragmentation.
  • taffer
    taffer over 5 years
    @JodyLeeBruchon Who says that? XFS performance is still a mixed bag: recent benchmarks by Phoronix show that other filesystems are faster with small I/O. If you want to know what the best filesystem for your application is, you have to benchmark your workload on your hardware and see what works best for you. Don't believe in silver bullets.
  • Jody Bruchon
    Jody Bruchon over 5 years
    @taffer Your "recent benchmark" is from April 2015, over three years old, and uses XFS with only default options. This pre-dates xfsprogs 3.2.3, which makes XFS v5 the default, with all the benefits it brings. It also wasn't formatted with -m finobt=1, which is a game-changer for XFS performance with small files and heavy metadata updates. No, there are no silver bullets, but basing your opinion on old benchmarks is not wise, especially when major performance-changing features were ignored, unavailable, or disabled.
  • FGiorlando
    FGiorlando over 5 years
    The linked file dl.dropbox.com/u/40969346/stackoverflow/bench.py is a dead link. It'd be great if you could post the benchmark code, thanks!
  • taffer
    taffer over 5 years
    @FGiorlando I updated the link
  • FGiorlando
    FGiorlando over 5 years
    thanks for the link update, very useful script!
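
Regarding psusi's point about ext4's hash-ordered directories: one way to get mostly-sequential reads anyway is to sort the directory entries by inode number before reading them, which on most filesystems roughly tracks on-disk order. A minimal sketch (my addition, not from the discussion above):

    import os

    def read_all_in_inode_order(directory):
        # os.scandir returns names in hash order on ext4; sorting by inode
        # number approximates the on-disk layout and reduces seeking
        entries = [e for e in os.scandir(directory) if e.is_file()]
        entries.sort(key=lambda e: e.stat().st_ino)
        for e in entries:
            with open(e.path, "rb") as f:
                f.read()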
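
And regarding psusi's remark that applications use fsync(): a typical durable small-file write looks roughly like the sketch below (the standard write-then-rename pattern; the function and path names are mine, not from the thread):

    import os

    def durable_write(path, data):
        # write to a temp file, fsync it, rename it into place, then fsync
        # the directory so the rename itself survives a power failure
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp, path)
        dirfd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
        try:
            os.fsync(dirfd)
        finally:
            os.close(dirfd)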