Find directories with lots of files in them


Solution 1

Check /lost+found in case there was a disk problem and a lot of junk ended up being detected as separate files, possibly wrongly.
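
For example, a quick way to check (just a sketch; reading /lost+found usually needs root) is:

# Count whatever fsck may have dumped into lost+found, if anything
sudo ls -A /lost+found | wc -l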

Check iostat to see if some application is still producing files like crazy.
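
A minimal way to watch for that (assuming iostat from the sysstat package is installed) is something like:

# Extended device statistics every 5 seconds; sustained high w/s on the
# root device suggests something is still busily creating or writing files
iostat -dx 5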

find / -xdev -type d -size +100k will tell you if there's a directory that uses more than 100kB of disk space. That would be a directory that contains a lot of files, or contained a lot of files in the past. You may want to adjust the size figure.
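
Concretely, that looks like the following; the second command is just my illustration of lowering the threshold and listing the directories' actual sizes:

# Directories whose own size exceeds 100 KiB: lots of entries now, or in the past
find / -xdev -type d -size +100k

# Lower the threshold and show each directory's size so the worst ones stand out
find / -xdev -type d -size +50k -exec ls -ld {} +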

I don't think there's a combination of options to GNU du to make it count 1 per directory entry. You can do this by producing the list of files with find and doing a little bit of counting in awk. Here is a du for inodes. It's minimally tested and doesn't try to cope with file names containing newlines.

#!/bin/sh
# A "du for inodes": counts directory entries per directory, depth-first.
find "$@" -xdev -depth | awk '{
    # The depth of a path is the number of "/" characters it contains
    depth = $0; gsub(/[^\/]/, "", depth); depth = length(depth);
    if (depth < previous_depth) {
       # A non-empty directory: its predecessor was one of its files
       total[depth] += total[previous_depth];
       print total[previous_depth] + 1, $0;
       total[previous_depth] = 0;
    }
    ++total[depth];
    previous_depth = depth;
}
END { print total[0], "total"; }'

Usage: du-inodes /. Prints a list of non-empty directories with the total count of entries in them and their subdirectories recursively. Redirect the output to a file and review it at your leisure. sort -k1nr <root.du-inodes | head will tell you the biggest offenders.
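
Put together, a typical run might look like this (assuming the script above is saved as du-inodes and made executable; root.du-inodes is just the file name used above):

# Count entries per directory across the root filesystem, then rank the offenders
sudo ./du-inodes / >root.du-inodes
sort -k1nr <root.du-inodes | head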

Solution 2

You can check with this script:

#!/bin/bash

if [ $# -ne 1 ];then
  echo "Usage: `basename $0` DIRECTORY"
  exit 1
fi

echo "Wait a moment if you want a good top of the bushy folders..."

find "$@" -type d -print0 2>/dev/null | while IFS= read -r -d '' file; do 
    echo -e `ls -A "$file" 2>/dev/null | wc -l` "files in:\t $file"
done | sort -nr | head | awk '{print NR".", "\t", $0}'

exit 0

This prints the top 10 subdirectories by file count. If you want a top x instead, replace head with head -n x, where x is a natural number greater than 0, as in the example below.
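
For instance, to get a top 20 instead (a hypothetical tweak; the rest of the script stays the same), the final pipeline would read:

done | sort -nr | head -n 20 | awk '{print NR".", "\t", $0}'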

To be 100% sure of the results, run the script with root privileges:

sudo top-bushy-folders DIRECTORY

Solution 3

Often faster than find, if your locate database is up to date:

# locate '' | sed 's|/[^/]*$|/|g' | sort | uniq -c | sort -n | tee filesperdirectory.txt | tail

This dumps the entire locate database, strips off everything past the last '/' in each path, then sort and "uniq -c" count the number of files/directories per directory, and the final "sort -n" piped through tail shows the ten directories with the most things in them.
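
If the database might be stale, refresh it first (assuming the usual updatedb that ships with mlocate/plocate), then re-run the pipeline above:

# Rebuild the locate database so the counts reflect the filesystem as it is now
sudo updatedb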

Solution 4

This is a bit of an old thread, but it's interesting, so I'll suggest my solutions.

The first uses a few piped commands and finds directories with over 1000 files inside:

find / -type d | awk '{print "echo -n \""$0"\" ---- ; ls -1 \""$0"\" | wc -l"}' | bash | awk -F "----" '{if ($2>1000) print $1}'

The second is simple. It just finds directories that are larger than 4096 bytes. Normally an empty directory takes up 4096 bytes on ext4 and 6 bytes on XFS:

find / -type d -size +4096c

You can adjust the value, of course, but I believe it should work in most cases.
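
To pick a sensible threshold for your own filesystem, one quick sanity check (my addition, assuming GNU stat) is to see what a freshly created empty directory reports:

# An empty directory's reported size depends on the filesystem (e.g. 4096 on ext4)
mkdir /tmp/empty-dir-test
stat -c '%s bytes' /tmp/empty-dir-test
rmdir /tmp/empty-dir-test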




Comments

  • Oli
    Oli over 1 year

    So a client of mine got an email from Linode today saying their server was causing Linode's backup service to blow up. Why? Too many files. I laughed and then ran:

    # df -ih
    Filesystem     Inodes IUsed IFree IUse% Mounted on
    /dev/xvda        2.5M  2.4M   91K   97% /
    

    Crap. 2.4 million inodes in use. What the hell has been going on?!

    I've looked for the obvious suspects (/var/{log,cache} and the directory where all the sites are hosted from) but I'm not finding anything really suspicious. Somewhere on this beast I'm certain there's a directory that contains a couple of million files.

    For context, one of my busy servers uses 200k inodes and my desktop (an old install with over 4TB of used storage) is only just over a million. There is a problem.

    So my question is, how do I find where the problem is? Is there a du for inodes?

  • Radu Rădeanu
    Radu Rădeanu almost 11 years
    The script gives errors: awk: line 2: find: regular expression compile failed (bad class -- [], [^] or [) [^ awk: line 2: syntax error at or near ] `/tmp/tmpw99dhs': Permission denied
  • h3.
    h3. almost 11 years
    @RaduRădeanu Ah, I see, I used a gawk peculiarity that doesn't work in other versions. I've added a backslash which I think is necessary as per POSIX.
  • Max Beikirch
    Max Beikirch about 9 years
    +1: using the locate database is a very nice idea!
  • Sandeep
    Sandeep over 5 years
    When you can't use locate for whatever reason, run a find /path/to/parent -xdev > filelist first, then direct sed to read input from that list.
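    A sketch of that variant, reusing the sed expression from Solution 3 (the path is only a placeholder):

    find /path/to/parent -xdev > filelist
    sed 's|/[^/]*$|/|g' filelist | sort | uniq -c | sort -n | tail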
  • Dipin
    Dipin almost 5 years
    2019: raised 10: read: Illegal option -d ... scrubbed the -d flag from read hoping nothing bad will happen. Will let you know when it finishes running ...
  • Guasqueño
    Guasqueño over 2 years
    If the file system is close to 100% full, you don't want to create another temp file that makes the situation worse. That's why I like Gilles's answer better, as it does not create a new file.