Determine Location of Inode Usage


Solution 1

Don't expect this to run quickly...

cd to a directory where you suspect there might be a subdirectory with lots of inodes. If this script takes a huge amount of time, you've likely found where in the filesystem to look. /var is a good start...

Otherwise, if you change to the top directory in that filesystem and run this and wait for it to finish, you'll find the directory with all the inodes.

find . -type d | 
while 
  read line  
do 
  echo "$( find "$line" -maxdepth 1 | wc -l) $line"  
done | 
sort -rn | less

I'm not worried about the cost of sorting. I ran a test, and sorting the unsorted output of that pipeline across 350,000 directories took 8 seconds. The initial find took . The real cost is opening all those directories in the while loop (the loop itself takes 22 seconds). (The test data was a subdirectory tree with 350,000 directories, one of which had a million files; the rest had between 1 and 15 directories.)

Various people pointed out that ls is not great at this because it sorts its output. I had tried echo, but that is also not great. Someone else pointed out that stat gives this info (the number of directory entries) but that it isn't portable. It turns out that find -maxdepth is really fast at opening directories and counts dotfiles too, so... here it is... points for everyone!
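
For reference, here is a rough side-by-side of the counting approaches mentioned above (the directory path is just a placeholder, not from the original answer):

dir=/var/spool   # example path; point this at the directory you are probing
ls "$dir" | wc -l                 # sorts its output first; slow on huge directories, skips dotfiles
echo "$dir"/* | wc -w             # no sort, but skips dotfiles and miscounts names containing spaces
find "$dir" -maxdepth 1 | wc -l   # no sort, counts dotfiles too (the total includes "$dir" itself)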

Solution 2

If the issue is one directory with too many files, here is a simple solution:

# Let's find which partition is out of inodes:
$ df -hi
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda3               2.4M    2.4M       0  100% /
...

# Okay, now we know the mount point with no free inodes,
# let's find a directory with too many files:
$ find / -xdev -size +100k -type d

The idea behind the find line is that the size of a directory is proportional to the number of files directly inside that directory. So, here we look for directories with a huge number of files directly inside them.

If you don't want to guess a number, and prefer to list all suspect directories ordered by "size", that's easy too:

# Remove the "sort" command if you want incremental output
find / -xdev -size +10k -type d -printf '%s %p\n' | sort -n
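
Once a suspect directory turns up, you can confirm its entry count directly (the path here is just an example):

# Count the entries directly inside a suspect directory (no sorting involved)
find /path/to/suspect/dir -maxdepth 1 | wc -l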

Solution 3

Grrr, commenting requires 50 rep. So this answer is actually a comment on chris's answer.

Since the questioner probably doesn't care about all the directories, only the worst ones, using sort is likely very expensive overkill.

find . -type d | 
while 
  read line  
do 
  echo "$(ls "$line" | wc -l) $line"  
done | 
perl -a -ne'next unless $F[0]>=$max; print; $max=$F[0]'  | less

This isn't as complete as your version, but it prints a line only when its count exceeds the previous maximum, greatly reducing the amount of noise printed and saving the expense of the sort.

The downside is that if you have two very large directories, and the first happens to have one more inode than the second, you'll never see the second.

A more complete solution would be to write a smarter perl script that keeps track of the top 10 values seen, and prints those out at the end. But that's too long for a quick serverfault answer.
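
For what it's worth, here is a minimal sketch of that top-10 idea, reusing the while loop from above -- treat it as a starting point, not a polished script:

find . -type d |
while
  read line
do
  echo "$(ls "$line" | wc -l) $line"
done |
perl -ane '
  push @top, [$F[0], $_];                                            # remember the count and the whole line
  @top = (sort { $b->[0] <=> $a->[0] } @top)[0 .. 9] if @top > 10;   # keep only the 10 largest counts
  END { print map { $_->[1] } sort { $b->[0] <=> $a->[0] } @top }    # print them once the input is exhausted
'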

Also, some mildly smarter perl scripting would let you skip the while loop entirely: on most platforms, ls sorts its results, and that can be very expensive for large directories. The ls sort is not necessary here, since all we care about is the count.
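
As a sketch of what that might look like, perl can read the directories itself instead of forking ls once per directory (again, only a sketch):

find . -type d | perl -lne '
  opendir(my $dh, $_) or next;     # skip directories we cannot open
  my $count = () = readdir $dh;    # readdir does not sort, unlike ls; count includes . and ..
  closedir $dh;
  print "$count $_";
'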

Solution 4

You can use this little snippet:

find | cut -d/ -f2 | uniq -c | sort -n

It prints how many files and directories each top-level subdirectory of the current folder contains (counting recursively), with the largest offenders at the bottom. It will help you find directories that have lots of files.
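
If you want to aim the same idea at a whole mount point rather than the current directory, something like this should work (a small variation, not part of the original snippet):

# Same pipeline, run from a mount point and restricted to that filesystem
cd /var && find . -xdev | cut -d/ -f2 | uniq -c | sort -n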

Solution 5

I'd brute force this one: run tripwire on the entire device for a baseline, then run a check some time later and the offending directory will stick out like a sore thumb.

Comments

  • Dave Forgac
    Dave Forgac over 1 year

    I recently installed Munin on a development web server to keep track of system usage. I've noticed that the system's inode usage is climbing by about 7-8% per day even though the disk usage has barely increased at all. I'm guessing something is writing a ton of tiny files but I can't find what / where.

    I know how to find disk space usage but I can't seem to find a way to summarize inode usage.

    Is there a good way to determine inode usage by directory so I can locate the source of the usage?

  • chris
    chris almost 15 years
    True about the ls -- in situations like this I worry more about it being clear what I'm doing and not so much about the performance. I'm pretty sure that you can use echo $line/* | wc -w in place of ls $line | wc -l and you avoid the ls sorting issue.
  • chris
    chris almost 15 years
    I just ran a test on a directory with a million files and ls took 22 seconds and echo * took 12 seconds. (For the record, echo * in shell won't hit the arg limit because echo in 99% of shells in active use is a built-in)
  • chris
    chris almost 15 years
    @mike G: You're 100% correct about this not being the fastest way to do this sort of thing. In my mind, the correct way to optimize this is to redirect to stderr when starting and finishing the "count directory entries" part of the script. That way, when you hit a directory with a million entries it will say "processing directory spool/postfix/maildrop" and then not say instantly "finished" and boom -- look in spool/postfix/maildrop and you'll see lots of files.
  • egorgry
    egorgry almost 15 years
    lol. I voted you up one. Thanks for the explanation. inode usage has always been confusing.
  • Mike G.
    Mike G. almost 15 years
    Thanks. Now I'm afraid to convert this into a comment on your node, in case I lose the karma when I delete this answer :)
  • egorgry
    egorgry almost 15 years
    ls -i prints the inode NUMBER for an entry, not the inode COUNT. Try it with a file in your directory - you'll (probably) see an equally high number, but it's not the count of inodes, it's just the inode # your directory entry points to.
  • chris
    chris almost 15 years
    The inodes could also be taken by really deep directories, which this won't find. There are a number of weird edge cases in this, but the most common situation is a directory full of files with normal names.
  • chris
    chris almost 15 years
    That would likely take a billion years. A quicker thing to do is to run lsof | grep DIR and look in each of those directories for lots of new files.
  • Geoff Fritz
    Geoff Fritz almost 15 years
    Ok, how about this: find / | sort > /tmp/find1.txt ; find / | sort > /tmp/find2.txt ; diff /tmp/find1.txt /tmp/find2.txt
  • Dave Forgac
    Dave Forgac almost 15 years
    I also wasn't worried about the cost of sorting as this is a one-time or at least fairly infrequent task.
  • mpez0
    mpez0 about 14 years
    ls -f won't sort the results. Sorting of directory results leads to a common problem with NFS and large directories. If the time to read and sort the directory (on the server) exceeds the NFS timeout, the directory and subdirectories are unusable.
  • ptman
    ptman over 11 years
    This worked brilliantly.