Why is rm slow on an external storage drive (USB-connected, type fuseblk) with 50 GB of files?
Solution 1
Ultimately, no matter what you do, rm has to run unlink on every single file that you want to remove (even if you call rm -r on the parent directory). If there are a lot of files to remove, this can take a long time.
There are two particularly time-consuming processes when you run rm -r:
- readdir, followed by
- a number of calls to unlink.
Finding all the files, and then going through every single file to remove it, can take a really, really long time.
If you find this "unusable" because it renders the directory unusable for some time, consider moving the parent directory before removing it. This will free up that name for the program to use again, without the time being too much of an inconvenience.
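The move-then-delete idea can be sketched roughly as follows. This is only an illustration under assumptions: the scratch tree, the backup.0 name, and the .deleting suffix are hypothetical stand-ins for the real directory, and the rename has to stay on the same filesystem to be cheap.

```shell
#!/bin/sh
# Hypothetical paths: a scratch tree stands in for the real backup directory.
set -eu
work=$(mktemp -d)
target="$work/backup.0"
mkdir -p "$target/sub"
: > "$target/a"
: > "$target/sub/b"

# A rename within one filesystem is a single cheap operation,
# so the original name is freed immediately.
doomed="$target.deleting.$$"
mv "$target" "$doomed"

# The name can be reused right away ...
mkdir "$target"

# ... while the slow per-file unlink work runs in the background.
rm -rf "$doomed" &

# Waiting is only here so the example finishes deterministically;
# a real script could carry on with other work instead.
wait
```

The total deletion time is unchanged; the only gain is that the original name is available again while the old tree is still being removed.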
Assuming that the file system really is NTFS (it's unclear from your question), NTFS is generally quite slow at deleting large swathes of files. You might consider using a more suitable filesystem for your purposes (the more recent ext filesystems have pretty good delete performance, if you don't have any other particular needs). FUSE itself is also not particularly fast, in general. You might consider seeing if you can do this in some way that does not use FUSE.
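Since fuseblk only says that a FUSE driver is handling a block device, it may help to ask the partition itself what it contains. A minimal sketch, assuming GNU coreutils df and util-linux lsblk are installed; the "/" default is just a placeholder for the drive's real mount point:

```shell
#!/bin/sh
# Pass the drive's mount point as the first argument; "/" is only a placeholder.
mnt=${1:-/}

# What the kernel reports for the mount (this is where "fuseblk" shows up).
src=$(df --output=source "$mnt" | tail -n 1)
mounted_as=$(df --output=fstype "$mnt" | tail -n 1)
echo "mounted from $src as $mounted_as"

# What is actually on the partition, for example ntfs or exfat behind fuseblk.
# Skipped when the source is not a block device (network or virtual mounts).
if [ -b "$src" ]; then
    lsblk -no FSTYPE "$src"
fi
```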
Solution 2
Why is rm so slow? I have no idea. But I do know a faster way:
mkdir blank
rsync -a --delete blank/ test/
Update: This answer on Serverfault has some explanations. It looks like rsync is deleting the files in a particular order that causes the filesystem tree to remain balanced, and not ever need rebalancing. rm will just delete the files and cause a lot of rebalancing as they are removed. There is some information about rebalancing here.
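Whether this actually beats rm -rf on a particular drive is easy to measure rather than guess. A rough sketch, assuming a POSIX shell; the scratch paths, the elapsed helper, and the file count of 2000 are made up for illustration, and the rsync half is skipped if rsync is not installed. Set TMPDIR to a directory on the drive under test so that mktemp creates the scratch tree there.

```shell
#!/bin/sh
# Scratch benchmark; every path here is hypothetical and created fresh.
set -eu
work=$(mktemp -d)
mkdir "$work/a" "$work/b" "$work/blank"

# Create the same number of empty files in both trees.
i=0
while [ "$i" -lt 2000 ]; do
    : > "$work/a/f$i"
    : > "$work/b/f$i"
    i=$((i + 1))
done

# Hypothetical helper: run a command and print how many whole seconds it took.
elapsed() {
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo "$((end - start))s: $*"
}

# Plain recursive delete.
elapsed rm -rf "$work/a"

# Sync an empty directory over the tree, which deletes its contents
# but leaves the (now empty) directory itself in place.
if command -v rsync >/dev/null 2>&1; then
    elapsed rsync -a --delete "$work/blank/" "$work/b/"
fi
```

With only whole-second resolution and 2000 empty files, the numbers mean little on a fast local disk; raise the count until both runs take long enough to compare, and remove the scratch directory afterwards with rm -rf "$work".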
Solution 3
I once had a similar problem. Your "wa" (I/O wait) figure is high, so you could use
iostat -x 1
to check whether disk utilization (the %util column) is high. If it is, the disk is very busy; check whether some other process is writing to it continuously.
For simplicity, use
vmstat 1
to check whether b is high or r < b; either indicates that something is wrong. In your situation, I think disk I/O is the underlying cause.
Benubird
Updated on September 18, 2022
Comments
-
Benubird almost 2 years
I have been trying to use rsnapshot for making backups, but I'm finding it unusable. While it is able to diff a directory (50 GB) and duplicate it (hardlinking every file) in a few minutes, and I can cp the whole directory in about half an hour, it takes well over an hour to delete it. Even directly using rm -rfv, I find it can take up to half a second to rm a single file, whereas the cp and link commands complete instantly.
Why is rm so slow? Is there any faster way to recursively remove hardlinks? It doesn't make sense to me that copying a file should take less time than removing it.
The filesystem I am working on is an external storage drive, connected via usb and type fuseblk (which I think means it's ntfs). My computer is running ubuntu linux.
Output from top:
Cpu(s): 3.0%us, 1.5%sy, 0.0%ni, 54.8%id, 40.6%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8063700k total, 3602416k used, 4461284k free, 557604k buffers
-
clerksx over 10 years
Being mounted as fuseblk doesn't mean the drive is NTFS; it just means that it is mounted as a FUSE block device. That could be almost anything.
-
Benubird over 10 years
@ChrisDown True, but I know it's either NTFS or ext3, and I'm pretty sure if it was ext3 it would be mounted as such by mount with no arguments.
-
smci over 6 years
It depends how many files are in the directory (you didn't say how many), and in particular NTFS slows down with only >3K files in a directory. Pretty much every other filesystem is much more performant. See the many other posts on SO/SE about the effect of the number of files on filesystem performance.
-
peterph over 10 years
+1 Really a lot depends on the exact file system - many tend to perform really well for some operations while being sluggish with others (often this is for file creation vs. removal vs. data access).
-
MattBianco about 10 years
Have you benchmarked this and compared to rm -rf? rsync still has to unlink() all the files in test/, and that's probably what takes the time.
-
rjmunro about 10 years
I haven't formally benchmarked it, but I did try it after reading someone else's benchmarks, and the difference was substantial. I can't find that post any more, but this answer on serverfault has an explanation and source for an even faster delete program.
-
MattBianco about 10 years
But the fastest method must be unlink(2) on the directory (and remembering to do an fsck later)...
-
Dominik George over 8 years
A fact's a fact. Just timed it, and it is almost twice as fast. After reading GNU coreutils rm code, it doesn't even make me wonder…
-
telcoM about 3 years
I agree that flush is probably the reason for the slowness. But it is there to minimize damage in case the user just unplugs the USB stick while the system is running without properly unmounting it first. You should at least warn that if the disk is mounted the way you suggest, proper unmounting before unplugging will be absolutely required.