How to determine how many files are within a directory without counting?


Solution 1

The size of the directory (as seen with ls -ld /var/lib/php/sessions) can give an indication. If it's small, there aren't many files. If it's large, there may be many entries in there, or there may have been many in the past.

Listing the content, as long as you don't stat individual files, shouldn't take much longer than reading a file of the same size.

What may be happening is that ls is aliased to ls -F or ls --color. Those options cause an lstat() system call to be performed on every file, for instance to find out whether it is a regular file or a directory.

You'll also want to make sure that you list dot files, that you leave the file list unsorted (ls -f does both), and that no alias gets in the way (the command builtin bypasses aliases). For that, run:

command ls -f /var/lib/php/sessions | wc -l

Provided not too many filenames contain newline characters, that should give you a good estimate (ls -f also lists . and .., so the count is two higher than the number of files).

$ ls -lhd 1
drwxr-xr-x 2 chazelas chazelas 69M Aug 15 20:02 1/
$ time ls -f 1 | wc -l
3218992
ls -f 1  0.68s user 1.20s system 99% cpu 1.881 total
wc -l  0.00s user 0.18s system 9% cpu 1.880 total
$ time ls -F 1 | wc -l
<still running...>
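For comparison, here is a minimal Python sketch of the same idea as command ls -f ... | wc -l: it reads the directory entries in the order the file system returns them, without sorting them and without an lstat() per entry. It assumes Linux (or another POSIX system), where os.scandir() takes the names straight from readdir(); the function name count_entries is hypothetical. Unlike ls -f, it does not report . and .., and file names containing newlines are counted correctly because entries are counted rather than output lines.

```python
import os
import sys


def count_entries(path):
    """Count directory entries without sorting them or stat()ing each one.

    os.scandir() streams entries from readdir(), so the whole listing is
    never held in memory. Dot files are included; "." and ".." are not.
    """
    count = 0
    with os.scandir(path) as entries:
        for _ in entries:  # only the name is read here, no per-entry lstat()
            count += 1
    return count


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(count_entries(target))
```

Usage: python3 count_entries.py /var/lib/php/sessions. Calling os.lstat() on every entry inside the loop would reproduce the extra per-file system calls that make ls -F or ls --color so slow on a huge directory.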

You can also deduce the number of files there by subtracting the number of unique files elsewhere in the file system from the number of used inodes in the output of df -i.

For instance, if the file system is mounted on /var, with GNU find:

find /var -xdev -path /var/lib/php/sessions -prune -o \
  -printf '%i\n' | sort -u | wc -l

This counts the files that are not in /var/lib/php/sessions. Subtracting that from the IUsed field in the output of df -i /var gives the number of files linked to /var/lib/php/sessions that are not linked anywhere else. That is only an approximation, because in a typical ext file system some special inodes are not linked to any directory. The method is not fool-proof either: /var/lib/php/sessions could in theory contain a billion entries for the same file (in practice the maximum number of links to a file is much lower than that on most file systems), and they would all count as a single inode.
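A rough Python sketch of this subtraction, assuming Linux and that the directory lives on the file system mounted at the given mount point. os.statvfs() returns the counters df -i uses (IUsed is the total number of inodes minus the free ones), and a walk that skips the session directory and stays on one device plays the role of the find ... | sort -u | wc -l pipeline. The function names are hypothetical, and the result carries the same caveats as the shell version.

```python
import os
import stat


def used_inodes(mount_point):
    """Same figure as the IUsed column of `df -i`: total inodes minus free inodes."""
    st = os.statvfs(mount_point)
    return st.f_files - st.f_ffree


# Rough Python counterpart of:
#   find ROOT -xdev -path EXCLUDED -prune -o -printf '%i\n' | sort -u | wc -l
def unique_inodes_outside(root, excluded):
    root_dev = os.lstat(root).st_dev
    excluded = os.path.abspath(excluded)
    seen = {os.lstat(root).st_ino}  # find also prints the starting point
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if os.path.abspath(entry.path) == excluded:
                        continue  # like -prune: skip the directory and its content
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except FileNotFoundError:
                        continue  # file vanished during the walk
                    seen.add(st.st_ino)  # a set de-duplicates hard links, like sort -u
                    # like -xdev: never descend into another file system
                    if stat.S_ISDIR(st.st_mode) and st.st_dev == root_dev:
                        stack.append(entry.path)
        except OSError:
            continue  # unreadable or vanished directory: skip it
    return len(seen)


def estimate_entries(mount_point, directory):
    """Approximate count of inodes linked only from `directory`."""
    return used_inodes(mount_point) - unique_inodes_outside(mount_point, directory)
```

Usage: estimate_entries("/var", "/var/lib/php/sessions"). It still has to lstat() every file outside the big directory, and on a busy server session files come and go between the two measurements, which adds some noise to the estimate.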

Note that while reading the directory content should be relatively fast, removing the files can be painfully slow.

When removing files, rm -r first lists the directory content and then calls unlink() for every file. For each of those calls, the system has to look up the file in that huge directory, which can be very expensive if the directory is not hashed.
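The two phases described above can be sketched in Python for a flat directory such as a session store. This is a simplified, hypothetical imitation (remove_all_files is not a real tool, and unlike rm -r it neither recurses nor removes the directory itself); it does not make deletion faster, since every unlink still makes the kernel look the name up again.

```python
import os


def remove_all_files(path):
    """Delete every non-directory entry of `path` in two phases.

    Phase 1 reads the names (unsorted, no per-entry stat on most file systems).
    Phase 2 calls unlink() once per name; each call is a fresh lookup of that
    name in the directory, which is the costly part for a huge directory.
    """
    with os.scandir(path) as entries:
        names = [e.name for e in entries if not e.is_dir(follow_symlinks=False)]
    removed = 0
    for name in names:
        try:
            os.unlink(os.path.join(path, name))
            removed += 1
        except FileNotFoundError:
            pass  # already gone, e.g. removed by PHP's session garbage collection
    return removed
```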

Solution 2

I guess you could use ls -1 /var/lib/php/sessions | wc -l over SSH.

Note that it is ls -(one) and wc -(letter l).

Solution 3

In ext2/3/4 you can make some approximation based on the size of the directory itself. This is not very accurate, but allows an educated guess without "counting".

A directory starts at 4096 bytes by default. As it gets more entries, it grows in size (but it never shrinks again). After some testing, it appears that each entry takes 12 bytes and that the directory grows by 4096 bytes each time it reaches the limit. This means that if the current directory size is 16384 bytes, the directory has held between 1,024 and 1,365 entries at some point.
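The arithmetic of this estimate as a small Python sketch. The function names are hypothetical; the 12-byte entry and 4096-byte block are the empirical figures above. For reference, an ext2/3/4 directory entry occupies an 8-byte header plus the name rounded up to a multiple of 4 bytes, so 12 bytes only holds for names of at most 4 characters; passing a larger entry_bytes for longer names gives a lower, more realistic range. Directories using an htree index (dir_index) also spend some blocks on the index, so treat the result as rough.

```python
import os


def ext_dirent_size(name_len):
    """On-disk size of one ext2/3/4 directory entry: 8-byte header plus the
    name, rounded up to a multiple of 4 (12 bytes for names up to 4 chars)."""
    return (8 + name_len + 3) & ~3


def entry_range(dir_size, entry_bytes=12, block_size=4096):
    """Range of entry counts that could have grown a directory to dir_size,
    assuming fixed-size entries and growth in whole blocks."""
    low = max(0, (dir_size - block_size) // entry_bytes)  # fits in the previous size
    high = dir_size // entry_bytes                        # fits in the current size
    return low, high


def estimate_from_directory(path, entry_bytes=12):
    # st_size is the same number `ls -ld` prints; no listing is needed.
    return entry_range(os.stat(path).st_size, entry_bytes)
```

For example, entry_range(16384) returns (1024, 1365), matching the figures above, while estimate_from_directory(path, entry_bytes=ext_dirent_size(40)) would model names about 40 characters long.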

There is also a trick to viewing files in these directories immediately. When you run ls, the entire directory is read and then sorted. You can disable the sorting using ls -1U.


Author: Pangamma

Updated on September 18, 2022

Comments

  • Pangamma
    Pangamma over 1 year

    I've been having a fairly serious issue on a high-traffic web server. PHP pages are slowing down considerably, and it only seems to be an issue on pages where sessions are accessed, or where a certain table within a database is referenced. In the '/var/log/messages' log file, I see hundreds of thousands of the following error: 'kernel: EXT4-fs warning (device dm-0): ext4_dx_add_entry: Directory index full!'

    I suspect there is a bottleneck in '/var/lib/php/sessions' because I cannot open the folder in Filezilla, and cannot count the number of files/sub-directories with grep. While it is quite possibly a case of hard drive corruption, I'd like to verify a hunch of mine first by checking the number of files inside of this directory.

    How would you go about finding the number of files within a folder without actually counting the files in said folder?

    • Admin
      Admin over 10 years
      In PHP or in general?
    • Admin
      Admin over 10 years
      In general. Assume you have ssh access to the machine.
    • Gilles 'SO- stop being evil'
      Gilles 'SO- stop being evil' over 10 years
      What does df -i say?
  • Pangamma
    Pangamma over 10 years
    Tried it. It has been going for about a minute now. No results yet. Any ideas?
  • Adi Menon
    Adi Menon over 10 years
    It shouldn't take more time than a simple ls... Can you do ls /var/lib/php/sessions ?
  • jordanm
    jordanm over 10 years
    @mash I have seen directories with hundreds of thousands of files take 10 minutes to ls.
  • Pangamma
    Pangamma over 10 years
    No, I cannot. Also, according to htop, the command is slowly taking up more and more of the system's RAM. Not sure what to do about the session directory. I am beginning to wonder if they are supposed to be stored in hashed directories on large servers. Looking into it now while the commands you suggested are active in the background.
  • Adi Menon
    Adi Menon over 10 years
    @jordanm I never experienced that with a simple ls, only with recursive commands like ls -laR; good to know :)
  • Adi Menon
    Adi Menon over 10 years
    @Pangamma I think it's normal that it's taking more RAM as it runs, since it's a piped command. It keeps the result in memory before doing the second part of the command (wc -l).
  • manatwork
    manatwork over 10 years
    Try ls -1U to disable sorting.
  • Stéphane Chazelas
    Stéphane Chazelas over 10 years
    The portable and canonical way to do that is with ls -f
  • Stéphane Chazelas
    Stéphane Chazelas over 10 years
      Well, a filename that is 200 characters long will typically use more than 12 bytes.
  • jordanm
    jordanm over 10 years
    @StephaneChazelas good note, it is a pretty inaccurate way to calculate.
  • Tulains Córdova
    Tulains Córdova over 10 years
    What does ls -1 do ?
  • Stéphane Chazelas
    Stéphane Chazelas over 10 years
    @user1598390, ls -1 lists the files in one column. It is only necessary when the output goes to a terminal. When it doesn't, as when it's piped to wc, the output will be in one column anyway.