Why is ls command slow to interrupt on NFS directory with lots of files?

Solution 1

When you run ls *, the first thing that happens is that the shell obtains a listing of the current directory. If the directory is huge and the server is slow, this could take a while.

Once the shell has obtained the list of names of files in the current directory, it sorts that list (which is very quick compared to any network interaction), then calls ls. The ls command looks up each file in turn and retrieves its metadata (stat call) to check whether it is a directory; if a file is a directory, ls lists its contents rather than the directory itself.
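The two phases described above can be approximated in a few lines of Python. This is only a sketch of the idea, not the actual code of any shell or of ls; the names expand_star and classify_like_ls are made up for illustration, and real shells sort according to the locale rather than by code point.

```python
import os
import stat
import tempfile


def expand_star(directory):
    """Roughly what the shell does for `*`: read the whole directory,
    skip dotfiles, and sort the names (here by code point, an assumption)."""
    names = [n for n in os.listdir(directory) if not n.startswith(".")]
    return sorted(names)


def classify_like_ls(directory, names):
    """Roughly what ls does with the expanded arguments: one stat per name
    to find out which ones are directories."""
    files, dirs = [], []
    for name in names:
        # On NFS each of these lookups may need its own round trip to the server.
        st = os.stat(os.path.join(directory, name))
        if stat.S_ISDIR(st.st_mode):
            dirs.append(name)
        else:
            files.append(name)
    return files, dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "subdir"))
        open(os.path.join(d, "notes.txt"), "w").close()
        names = expand_star(d)
        print(names, classify_like_ls(d, names))
```

The point of splitting it this way is that the first function finishes completely before the second one (the stand-in for ls) even starts, which matches the order of events in the answer.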

When you press Ctrl+C, this won't interrupt the current NFS operation. You can't interrupt a filesystem operation just anywhere, as this could leave the system in an inconsistent state. Even reading a file might update its access time. Most of the time, an elementary read or write operation on a file is instantaneous by human standards, but NFS is an exception, especially with large amounts of data over a slow network.

Thus pressing Ctrl+C will only take effect:

  • if the shell is currently generating the list of file names, when that list is fully retrieved;
  • if ls has been started and is currently obtaining metadata about a file, when the server supplies the metadata;
  • if ls has been started and is currently listing a directory, when the list is fully retrieved, or at least after a chunk of it.

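It's not the fault of the ls command (which may not even have been started yet). It's not even the shell's fault: it's NFS's fault. Each of these steps is a round trip to the server, and the process can't react to Ctrl+C until the pending one returns.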

Solution 2

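Hypothesis: traversing a directory over NFS speculatively loads more data at once than you would expect. That causes far too much I/O on the server side, so a single NFS call takes more than 20 seconds. Mounting with the intr option might allow Ctrl-C to interrupt the in-flight call (note that, according to the nfs(5) man page, Linux kernels after 2.6.25 ignore intr, and only SIGKILL can interrupt a pending NFS operation there).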

Google found a list of NFS calls which includes READDIRPLUS. It is basically a readdir followed by a stat for each file. The list also mentions getacl. In other words, there is one NFS call that does roughly what ls -l does (for a certain number of files), instead of the client sending an individual stat request for each file.
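As a purely local analogy (not NFS code), Python has the same two styles of directory access: os.listdir plus a separate lstat per name, and os.scandir, where each entry can arrive with type information already attached. The function names below are invented for this sketch. Whether the kernel actually issues READDIRPLUS on an NFS mount is decided by the NFS client, not by these calls.

```python
import os
import stat


def names_then_stat(directory):
    """READDIR-style: fetch the names first, then ask about each name separately."""
    result = {}
    for name in os.listdir(directory):
        st = os.lstat(os.path.join(directory, name))  # one extra lookup per name
        result[name] = stat.S_ISDIR(st.st_mode)
    return result


def names_with_attributes(directory):
    """READDIRPLUS-style: entries come back with some attributes already attached,
    so is_dir() can often be answered without another lookup."""
    with os.scandir(directory) as entries:
        return {e.name: e.is_dir(follow_symlinks=False) for e in entries}
```

Both functions return the same mapping of name to "is a directory"; they differ only in how many separate requests are needed to build it.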

Googling for nfs readdirplus suggests this is quite plausible. The first result is a (paywalled) bug report:

RHEL6: NFSv3 READDIRPLUS drastically slows down globbing over a NFS directory leading to performance problems

  • On RHEL 6.2 with a standard mount, a glob in a directory containing over 3000 directories takes 218 seconds (nearly four minutes).
  • With the same directory mounted using the nordirplus option, which disables the use of readdirplus calls, a glob on the same directory takes only 1.7 seconds.

So that seems like an interesting line of investigation.
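To check whether this applies to a particular mount, one option is to time a glob on the directory before and after remounting with nordirplus. A minimal sketch of such a timing helper follows; time_glob is a hypothetical name, and client-side caching can make a second run look much faster than the first, so the numbers need care.

```python
import glob
import os
import time


def time_glob(directory):
    """Expand `*` inside `directory` and return (number of matches, seconds taken)."""
    # glob.escape protects against special characters in the directory name itself.
    pattern = os.path.join(glob.escape(directory), "*")
    start = time.perf_counter()
    matches = glob.glob(pattern)  # like the shell, this skips dotfiles
    elapsed = time.perf_counter() - start
    return len(matches), elapsed


if __name__ == "__main__":
    count, seconds = time_glob(".")
    print(f"{count} entries matched in {seconds:.3f} s")
```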


EDIT: note that your example ls * is also a use of globbing. ls . would avoid the glob. If that makes a difference to your specific question, it could be because the globbing takes place in the shell, instead of ls, which might affect how the Ctrl-C is processed. I doubt it matters, but it just struck me as an interesting question.

Author: steve

Updated on September 18, 2022

Comments

  • steve
    steve almost 2 years

    I noticed in directories with lots of files on NFS if I do

    ls *
    

    then hit Ctrl-C, it can take a while (say 20 seconds) before ls comes back.

    Why? Can the ls command be fixed to be more responsive?

    • Admin
      Admin almost 10 years
      ls * is going to cause ls to look at every file to see whether it's a directory. That can take some time over NFS. If you just want to list the contents of the current directory, type ls. (No *).
  • user4476006
    user4476006 almost 7 years
    How is prepending ls with a backslash going to bypass the alias? Using /bin/ls would work, though.
  • user4476006
    user4476006 almost 7 years
    Live and learn. Thanks for clearing that up.
  • theferrit32
    theferrit32 almost 5 years
    This fixed it. ls was aliased to ls --color=auto, and it hung for like 30 minutes until I cancelled it. After disabling the alias it executed in 5 seconds. Seems adding colors adds a lot of overhead.
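A likely reason for the difference reported in the last comment: to choose a colour, a lister has to know each entry's type and permission bits, which means a metadata lookup per file, while a plain listing only needs the names. The sketch below illustrates that cost with invented function names and a simplified set of categories; it is not how GNU ls is implemented.

```python
import os
import stat


def plain_listing(directory):
    """Names only: one directory read, no per-file metadata lookups."""
    return sorted(os.listdir(directory)), 0


def colored_listing(directory):
    """Classify each entry by its mode, which costs one lstat per name.
    Returns the (name, kind) pairs and the number of lookups performed."""
    lookups = 0
    out = []
    for name in sorted(os.listdir(directory)):
        mode = os.lstat(os.path.join(directory, name)).st_mode
        lookups += 1
        if stat.S_ISDIR(mode):
            kind = "dir"
        elif stat.S_ISLNK(mode):
            kind = "link"
        elif mode & 0o111:  # any execute bit set
            kind = "exec"
        else:
            kind = "file"
        out.append((name, kind))
    return out, lookups
```

With many thousands of files on a slow NFS server, the per-name lookups in the second function are where the extra time would go.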