Bad NTFS performance


Solution 1

NTFS has this thing called a Master File Table. It sounds really cool when you read about it.

You can see that ext3 performs all right up to about 95% disk use, while the existence of the MFT means that NTFS doesn't really want you to use more than 90% of your disk. But I'll assume that's not your problem, and that your problem is the many operations on many small files.

One of the differences here is what happens when you create a small file. If a file is smaller than a block size, it is not written to its own block but rather is stored in the MFT. This is nice if the file stays exactly the way it was when created. In practice, though, it means that when svn touches a file to create it and then appends to it, removes from it, or modifies it by not enough to move it to its own block, the operation is pretty slow. Just reading lots of small files also puts some stress on the MFT, where they all reside with several files per block. Why would NTFS do this? It's preemptively avoiding fragmentation and using the blocks more effectively, and in general that's a good thing.
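If you want to see that effect yourself, here is a rough sketch (plain Python, with arbitrary file counts and payload sizes of my own choosing) that replays the create-then-modify pattern described above against a temporary directory. Run it once on an NTFS volume and once on ext3 and compare; it's only illustrative, not a proper benchmark.

```python
# Rough sketch: time the "create a tiny file, then modify it a little" pattern
# that svn-style tools produce. File count and sizes are arbitrary.
import os
import tempfile
import time

def small_file_churn(directory, count=5000, payload=b"x" * 200):
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i:05d}")
        with open(path, "wb") as f:   # create a file small enough to be MFT-resident
            f.write(payload)
        with open(path, "ab") as f:   # then grow it slightly, roughly what svn
            f.write(b"y" * 50)        # does per working-copy entry
    return time.perf_counter() - start

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        elapsed = small_file_churn(d)
        print(f"5000 tiny create+append cycles took {elapsed:.2f}s in {d}")
```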

In ext2 and ext3, by contrast, file blocks are stored next to the directory metadata for the directory they're in (when possible, i.e. if your disk is unfragmented and you have about 20% space free). This means that as svn opens up directories, a number of blocks get cached basically for free in that 16 MB cache on your drive, and then again in the kernel's cache. Those blocks might include the files under .svn and the revision files from your last update, which is handy, since those are likely some of the files svn will look at next. NTFS doesn't get to do this; although large parts of the MFT should be cached in the system, they might not be the parts you will want next.
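The read side of the same story can be sketched too: walk a tree and read every tiny file, which is roughly the access pattern of an svn status or update. Whether the layout lets the drive cache and readahead help you is exactly the difference described above, so treat the timing as illustrative only (the size cutoff below is an arbitrary choice of mine).

```python
# Minimal sketch: walk a working copy and read each small file,
# approximating svn's traversal of working files plus .svn metadata.
import os
import time

def walk_and_read(root, max_size=4096):
    start = time.perf_counter()
    files = bytes_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) <= max_size:  # only the tiny files
                    with open(path, "rb") as f:
                        bytes_read += len(f.read())
                    files += 1
            except OSError:
                pass  # skip files that vanish or are unreadable mid-walk
    return files, bytes_read, time.perf_counter() - start

if __name__ == "__main__":
    n, b, t = walk_and_read(".")  # point this at a checked-out tree
    print(f"read {n} small files ({b} bytes) in {t:.2f}s")
```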

Solution 2

Well, your particular problem is because

  1. Subversion itself comes from the UNIX world, so the Windows version assumes similar performance characteristics.
  2. NTFS performance really isn't great with gazillions of small files.

What you are seeing is simply an artifact of something designed for a particular operating system, with performance assumptions made on that operating system. This usually breaks down badly when taken to other systems. Another example is forking vs. threading: on UNIX-likes the traditional way of parallelizing something is just to spawn another process, while on Windows, where processes take at least five times longer to start, that is a really bad idea.
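To put a rough number on the forking-vs-threading point, here is a small cross-platform sketch using Python's threading and multiprocessing modules. Process creation goes through fork on UNIX-likes and spawn on Windows, so the gap it shows between the two columns should be much wider on Windows; the loop counts are arbitrary and the numbers are only illustrative.

```python
# Compare the cost of starting a thread vs. starting a whole process.
import multiprocessing
import threading
import time

def noop():
    pass

def time_threads(n=20):
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=noop)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n

def time_processes(n=20):
    start = time.perf_counter()
    for _ in range(n):
        p = multiprocessing.Process(target=noop)
        p.start()
        p.join()
    return (time.perf_counter() - start) / n

if __name__ == "__main__":  # guard required on Windows, where processes are spawned, not forked
    print(f"thread start+join:  {time_threads() * 1000:.2f} ms")
    print(f"process start+join: {time_processes() * 1000:.2f} ms")
```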

In general, you can't take the performance characteristics of a particular OS for granted on any other one with a vastly different architecture. Also, don't forget that NTFS has many file system features that were absent from the UNIX file systems widely in use at that point, such as journaling and ACLs. Those things come at a cost.


Some day, when I have lots of free time, I plan to write an SVN filesystem module which takes advantage of features you have on NTFS, such as transaction support (which should eliminate the "touching millions of small files" issue) and alternate data streams (which should eliminate the need for the separate .svn directory). It'd be a nice thing to have, but I doubt the SVN devs will get around to implementing such things in the foreseeable future.
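For what it's worth, the alternate-data-streams half of that idea is easy to sketch: on NTFS the ordinary file APIs accept a "file:streamname" path, so per-file metadata could ride along with the working file instead of living in a separate .svn directory. The stream name below is purely hypothetical (it is not anything SVN actually uses), and this only works on NTFS volumes under Windows.

```python
# Sketch of stashing metadata in an NTFS alternate data stream.
import sys

def write_ads(path, stream, data):
    with open(f"{path}:{stream}", "wb") as f:  # CreateFile accepts "file:stream" on NTFS
        f.write(data)

def read_ads(path, stream):
    with open(f"{path}:{stream}", "rb") as f:
        return f.read()

if __name__ == "__main__":
    if not sys.platform.startswith("win"):
        sys.exit("alternate data streams require NTFS on Windows")
    with open("example.txt", "w") as f:
        f.write("working copy contents\n")
    # "svn-base" is an illustrative stream name, not real SVN behaviour.
    write_ads("example.txt", "svn-base", b"pristine copy of the file\n")
    print(read_ads("example.txt", "svn-base"))
```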

Side note: A single update on a large SVN repository I am using took around 250,000 file operations. A tiny voice tells me that this is really a lot for 24 changed files ...

Solution 3

Here's Microsoft's info on how NTFS works. It may be overkill for what you're looking for, but studying it may shed some light on which scenarios give NTFS problems.

Comments

  • JesperE
    JesperE almost 2 years

    Why is it that NTFS performance is so lousy compared to, for example, Linux/ext3? Most often I see this when checking out (large) source trees from Subversion. Checkout takes around 10-15 minutes on NTFS, while the corresponding checkout on Linux (on almost identical hardware) is an order of magnitude faster (1-1.5 minutes).

    Maybe this is specific to handling lots of small files, and NTFS is better when it comes to large files, but why should that be? Wouldn't improving NTFS performance for small files be hugely beneficial for Windows performance in general?

    EDIT: This is not meant as an inflammatory "NTFS sucks compared to ext3" question; I'm genuinely interested in why NTFS performs badly in certain cases. Is it just bad design (which I doubt), or are there other issues which come into play?

    • ChrisInEdmonton
      ChrisInEdmonton almost 15 years
      Perhaps this could be reworded so that you are asking how to improve the performance of NTFS when dealing with lots of small files, rather than asking why NTFS sucks compared to ext3?
    • Sasha Chedygov
      Sasha Chedygov almost 15 years
      Agree with @Chris, this question is kind of pointless as-is.
    • JesperE
      JesperE almost 15 years
      Well, I'm genuinely interested in why NTFS is performing badly. If the answer is "do X to make it faster", then great, but I'd settle for understanding the problem.
    • Sasha Chedygov
      Sasha Chedygov almost 15 years
      Ah, okay, sorry for misunderstanding you.
    • dlamblin
      dlamblin almost 15 years
      BTW when you were using SVN on a Windows machine, did that machine have a virus scanner with real-time protection enabled? That could be bad.
    • JesperE
      JesperE almost 15 years
      I've benchmarked both ways, and while the virus scanner did have a measurable impact, it didn't explain the bad performance of NTFS.
    • phuclv
      phuclv almost 11 years
      in some cases maybe exFAT is better because it doesn't have many of NTFS's heavyweight features, such as permissions, compression, etc.
  • JesperE
    JesperE almost 15 years
    But why is NTFS performance bad when dealing with gazillions of small files? Did that have to be sacrificed in order to get something else?
  • ChrisInEdmonton
    ChrisInEdmonton almost 15 years
    You are correct that this is where small files live, but I'm not sure why this should put stress on the MFT. Wouldn't it make it far easier to read these files, as you are all but guaranteed to pull lots of these files into cache when you pull any of them?
  • dlamblin
    dlamblin almost 15 years
    @ChrisInEdmonton It's the updates to the MFT that stress it; because you're not touching blocks where neighboring space is available, you end up moving things around and also invalidating the cached parts of the MFT. I'll grant you that on paper the MFT should be a very fast way of handling small files. It just doesn't bear out in practice.
  • DanielSmedegaardBuus
    DanielSmedegaardBuus over 4 years
    What is any of this based on?