Why is rm slow on an external storage drive (USB-connected, type fuseblk) with 50Gb of files?

18,430

Solution 1

Ultimately, no matter what you do, rm has to run unlink on every single file that you want to remove (even if you call rm -r on the parent directory). If there are a lot of files to remove, this can take a long time.

There are two particularly time-consuming steps when you run rm -r:

  1. readdir, to list the entries of each directory, followed by
  2. one call to unlink for every entry found.

Finding all the files, and then going through every single file to remove it, can take a really, really long time.
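To get a feel for how much work that is, you can count the entries before deleting. This is only a sketch on a small throwaway tree (the directory and file names are made up for illustration). Each non-directory needs one unlink-type call and each directory one rmdir-type call.

```shell
# Throwaway tree standing in for the directory you want to delete.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
touch "$dir/f1" "$dir/f2" "$dir/sub/f3"

# Every non-directory entry costs one unlink-type call,
# every directory (including the top one) one rmdir-type call.
files=$(find "$dir" ! -type d | wc -l)
dirs=$(find "$dir" -type d | wc -l)
echo "files to unlink: $files, directories to remove: $dirs"

rm -r "$dir"
```

On a real backup tree the first number can easily run into the hundreds of thousands, which is where the time goes.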

If you find this "unusable" because the directory is unavailable while it is being removed, consider renaming (moving) the parent directory before removing it. A rename within the same filesystem is a single quick operation, so the original name is freed up for the program to use again right away, and the slow removal of the renamed directory no longer gets in the way.
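A minimal sketch of that rename-then-delete idea, using hypothetical names inside a temporary directory rather than your real backup paths:

```shell
# Stand-in for the backup location; the names are illustrative only.
work=$(mktemp -d)
mkdir -p "$work/backup"
touch "$work/backup/a" "$work/backup/b"

# Renaming within one filesystem does not touch the files inside.
mv "$work/backup" "$work/backup.deleting"

# The original name is free again for the next backup run.
mkdir "$work/backup"

# The slow part runs in the background on the renamed directory.
rm -rf "$work/backup.deleting" &
wait    # only here so the sketch finishes cleanly; a real script need not wait
```

This does not make the delete itself any faster. It only moves it out of the way.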

Assuming that the file system really is NTFS (it's unclear from your question), NTFS is generally quite slow at deleting large swathes of files. You might consider using a more suitable filesystem for your purposes (the more recent ext filesystems have pretty good delete performance, if you don't have any other particular needs). FUSE itself is also not particularly fast, in general. You might consider seeing if you can do this in some way that does not use FUSE.
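Since fuseblk alone does not tell you what is on the disk, it may be worth checking before drawing conclusions. The sketch below assumes GNU stat and, optionally, findmnt and blkid from util-linux. The default target of / is only a placeholder for your drive's mount point.

```shell
# Placeholder: pass the mount point of the external drive as the first argument.
target=${1:-/}

# Type as the kernel sees the mount (prints "fuseblk" for any FUSE block mount).
fstype=$(stat -f -c %T "$target")
echo "mounted as: $fstype"

# Look up the backing device and ask blkid what is actually on it.
if command -v findmnt >/dev/null 2>&1; then
    dev=$(findmnt -n -o SOURCE --target "$target")
    echo "device: $dev"
    # blkid may need root to read the device; ignore failures in this sketch.
    command -v blkid >/dev/null 2>&1 && blkid -o value -s TYPE "$dev" || true
fi
```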

Solution 2

Why is rm so slow? I have no idea. But I do know a faster way:

mkdir blank
rsync -a --delete blank/ test/

Update: An answer on Server Fault has some explanations. It looks like rsync deletes the files in an order that keeps the filesystem's directory tree balanced, so it never needs rebalancing. rm just deletes the files as it finds them, which causes a lot of rebalancing as they are removed.
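If you would rather measure than take this on faith, here is a rough timing sketch. The file count is arbitrary, the trees are empty files in a temporary directory (almost certainly not on your external drive), and now_ms relies on GNU date, so treat any numbers it prints as illustrative only.

```shell
n=2000                          # arbitrary number of test files
base=$(mktemp -d)

# Create a directory holding $n empty files.
make_tree() { mkdir -p "$1"; ( cd "$1" && seq 1 "$n" | xargs touch ); }
# Current time in milliseconds (needs GNU date for %N).
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

make_tree "$base/a"
t0=$(now_ms); rm -rf "$base/a"; t1=$(now_ms)
echo "rm -rf:         $((t1 - t0)) ms"

if command -v rsync >/dev/null 2>&1; then
    make_tree "$base/b"
    mkdir "$base/blank"
    t0=$(now_ms); rsync -a --delete "$base/blank/" "$base/b/"; t1=$(now_ms)
    echo "rsync --delete: $((t1 - t0)) ms"
    # rsync empties the directory but leaves the directory itself behind.
    remaining=$(find "$base/b" -mindepth 1 | wc -l)
fi
```

To get meaningful results, point base at a scratch directory on the slow drive and use a file count close to the real one.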

Solution 3

I once had a problem similar to yours. The "wa" (I/O wait) value in your top output is high, so you could use

iostat -x 1

to check whether your disk utilization is high. If it is, your disk is quite busy. Check whether some other processes are writing to the disk continuously.

For simplicity, use

vmstat 1

to check whether b is high or r < b. Either one indicates that something is wrong. In your situation, I think disk I/O is the root cause.
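If neither tool is installed, the same "wa" figure can be approximated straight from /proc/stat. This is a sketch for Linux only. It assumes the usual field order on the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal) and samples over one second.

```shell
# Print "total_ticks iowait_ticks" from the aggregate cpu line.
read_cpu() {
    awk '/^cpu /{ t = 0; for (i = 2; i <= 9; i++) t += $i; print t, $6 }' /proc/stat
}

s=$(read_cpu); t0=${s% *}; w0=${s#* }
sleep 1
s=$(read_cpu); t1=${s% *}; w1=${s#* }

# Share of the elapsed CPU ticks that were spent waiting on I/O.
dt=$((t1 - t0))
if [ "$dt" -gt 0 ]; then wa=$(( 100 * (w1 - w0) / dt )); else wa=0; fi
echo "iowait over the last second: ${wa}%"
```

A value that stays high while rm is running, like the 40.6%wa in the question, points at the disk rather than the CPU.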

Author: Benubird

Updated on September 18, 2022

Comments

  • Benubird
    Benubird almost 2 years

    I have been trying to use rsnapshot for making backups, but I'm finding it unusable. While it is able to diff a directory (50gb) and duplicate it (hardlinking every file) in a few minutes, and I can cp the whole directory in about half an hour, it takes well over an hour to delete it. Even directly using rm -rfv, I find it can take up to half a second to rm a single file, whereas the cp and link commands complete instantly.

    Why is rm so slow? Is there any faster way to recursively remove hardlinks? It doesn't make sense to me that copying a file should take less time than removing it.

    The filesystem I am working on is an external storage drive, connected via usb and type fuseblk (which I think means it's ntfs). My computer is running ubuntu linux.

    Output from top:

    Cpu(s):  3.0%us,  1.5%sy,  0.0%ni, 54.8%id, 40.6%wa,  0.0%hi,  0.1%si,  0.0%st
    Mem:   8063700k total,  3602416k used,  4461284k free,   557604k buffers
    
    • clerksx
      clerksx over 10 years
      Being mounted as fuseblk doesn't mean the drive is NTFS, it just means that it is mounted as a FUSE block device. That could be almost anything.
    • Benubird
      Benubird over 10 years
      @ChrisDown True, but I know it's either NTFS or ext3, and I'm pretty sure if it was ext3 it would be mounted as such by mount with no arguments.
    • smci
      smci over 6 years
      It depends on how many files are in the directory (you didn't say how many). NTFS in particular slows down once a directory holds more than about 3K files. Pretty much every other filesystem performs much better. See the many other posts on SO/SE about the effect of the number of files on filesystem performance.
  • peterph
    peterph over 10 years
    +1 Really a lot depends on the exact file system - many tend to perform really well for some operations while being sluggish with others (often this is for file creation vs. removal vs. data access).
  • MattBianco
    MattBianco about 10 years
    Have you benchmarked this and compared to rm -rf? rsync still has to unlink() all the files in test/, and that's probably what takes the time.
  • rjmunro
    rjmunro about 10 years
    I haven't formally benchmarked it, but I did try it after reading someone else's benchmarks, and the difference was substantial. I can't find that post any more, but this answer on serverfault has an explanation and source for an even faster delete program.
  • MattBianco
    MattBianco about 10 years
    But the fastest method must be unlink(2) on the directory (and remembering to do an fsck later)...
  • Dominik George
    Dominik George over 8 years
    A fact's a fact. Just timed it, and it is almost twice as fast. After reading GNU coreutils rm code, it doesn't even make me wonder…
  • telcoM
    telcoM about 3 years
    I agree that flush is probably the reason for the slowness. But it is there to minimize damage in case the user just unplugs the USB stick while the system is running, without properly unmounting it first. You should at least warn that if the disk is mounted the way you suggest, proper unmounting before unplugging will be absolutely required.