find the single largest file

Solution 1

I don't know of any way to determine the largest file other than scanning the directory tree in question and collecting the file sizes. If you know that the file you are after is above a certain size, you can instruct find to skip files below that threshold.

$ find . -type f -size +50M ....

This skips any file that is not larger than 50 MiB (find's M suffix means mebibytes). If you know these files are always in a specific location, you can point find at that area instead of scanning the entire disk.

NOTE: This is the method I typically employ, since you generally shouldn't be getting random large files outside of /var-type directories.

As for du, you can tell it to output sizes in a human-readable format using the -h switch. The sort command knows how to sort these as well, again using its -h switch.

Example

$ find /home/saml/apps -type f -size +50M -print0 | \
    du -h --files0-from=- | sort -h | tail -1
1.4G    /home/saml/apps/MeVisLabSDK2.2.1_gcc-64.bin

The above find returns the list of files that are > 50MB using a null (\0) character as the separator. The du command takes this list and knows to split on nulls using the --files0-from=- switch. This output is then sorted by its human formatted sizes.

Without the tail -1:

$ find /home/saml/apps -type f -size +50M -print0 | \
    du -h --files0-from=- | sort -h
55M /home/saml/apps/MeVisLabSDK/Packages/MeVis/ThirdParty/lib/libQtXmlPatternsMLAB.so.4.6.2.debug
55M /home/saml/apps/MeVisLabSDK/Packages/MeVis/ThirdParty/Sources/Qt4/qt/lib/libQtXmlPatternsMLAB.so.4.6.2.debug
56M /home/saml/apps/MeVisLabSDK/Packages/FMEwork/ThirdParty/lib/libitkvnl-4.0_d.a
66M /home/saml/apps/MeVisLabSDK/Packages/FMEwork/Release/lib/libMLDcmtkAccessories_d.so
79M /home/saml/apps/MeVisLabSDK/Packages/FMEwork/Release/lib/libMLDcmtkMLConverters_d.so
94M /home/saml/apps/MeVisLabSDK/Packages/MeVis/ThirdParty/lib/libQtGuiMLAB.so.4.6.2.debug
94M /home/saml/apps/MeVisLabSDK/Packages/MeVis/ThirdParty/Sources/Qt4/qt/lib/libQtGuiMLAB.so.4.6.2.debug
112M    /home/saml/apps/ParaView-3.14.1-Linux-64bit.tar.gz
204M    /home/saml/apps/Slicer-4.1.1-linux-amd64.tar.gz
283M    /home/saml/apps/MeVisLabSDK/Packages/FMEwork/Release/lib/libMLDcmtkIODWrappers_d.so
1.4G    /home/saml/apps/MeVisLabSDK2.2.1_gcc-64.bin
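
One caveat with sorting on -h output is that the sizes are rounded, so two files that both display as, say, 94M cannot be ordered reliably. Below is a minimal sketch, assuming GNU find, du and sort, that compares exact kibibyte counts instead. The function name largest_by_du and its two arguments are illustrative and not part of the original answer.

```shell
# Hypothetical helper: print the regular file with the largest disk usage
# under a directory, comparing exact KiB counts instead of rounded -h values.
# $1 = directory to scan (default: current directory)
# $2 = find -size expression used as a lower bound (default: +50M)
largest_by_du() {
    find "${1:-.}" -type f -size "${2:-+50M}" -print0 |
        du -k --files0-from=- |   # disk usage in KiB, names read NUL-separated
        sort -n |                 # plain numeric sort on the KiB column
        tail -n 1                 # keep only the largest entry
}

# Example call (path taken from the example above):
# largest_by_du /home/saml/apps +50M
```

The result is printed in KiB rather than in human-readable form, so nothing is lost to rounding before the comparison is made.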

Solution 2

You need to traverse the whole directory tree and check the size of each file in order to find the largest one.

In zsh, there's an easy way to sort files by size, thanks to the o glob qualifier:

print -rl -- **/*(D.oL)

To see just the largest file:

echo **/*(D.oL[-1])

To see the 10 largest files:

print -rl -- **/*(D.oL[-10,-1])

You can also use ls -S to sort the files by size. For example, the command below shows the 10 largest files. In bash, you need to run shopt -s globstar first to enable recursive globbing with **; in ksh93, run set -o globstar first; in zsh this works out of the box. This only works if there aren't so many files that the combined length of their names exceeds the command line length limit.

ls -Sd **/* | head -n 10

If there are lots of files, collecting the information can take a very long time, so you should traverse the filesystem only once and save the output to a text file. Since you're interested in individual files, use the -S option of GNU du in addition to -a; this way, the figure shown for a directory doesn't include the size of files in subdirectories, only files directly in that directory, which reduces the noise.

du -Sak >du
sort -k1n du | tail -n 2

If you only want the size of files, you can use GNU find's -printf action.

find -type f -printf '%s\t%P\n' | sort -k1n >file-sizes.txt
tail file-sizes.txt

Note that if you have file names that contain newlines, this will mess up automated processing. Most GNU utilities have a way to use null bytes (which cannot appear in file names) instead, e.g. du -0, sort -z, \0 instead of \n, etc.
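
As a rough illustration of the null-byte approach, here is a sketch that keeps every stage of the pipeline NUL-delimited. It assumes GNU find and sort, plus a coreutils version whose tail supports -z (that last point is an assumption about your system). The function name largest_file_nul is made up for this example.

```shell
# Hypothetical helper: print "<size in bytes><TAB><path>" for the largest
# regular file under a directory, safe for file names containing newlines.
# $1 = directory to scan (default: current directory)
largest_file_nul() {
    find "${1:-.}" -type f -printf '%s\t%p\0' |  # one NUL-terminated record per file
        sort -z -n |                             # numeric sort of NUL-separated records
        tail -z -n 1 |                           # keep the last (largest) record
        tr '\0' '\n'                             # make the terminator printable
}

# Example call:
# largest_file_nul /path/to/share
```

Only the final record is converted back to newline-terminated text, so a newline inside a file name cannot split a record during sorting.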

Author: Avinash

Updated on September 18, 2022

Comments

  • Avinash
    Avinash over 1 year

    We host a 4 TB share. How can we efficiently find the largest file on it?

    Usually we use:

    du -ak | sort -k1 -bn | tail -1
    

    but it is not easy to scan through a share of such a huge size and then sort the results as well.

    Any suggestions for finding only the single largest file in the share?

    Also, du -ak returns the size of the current directory (like ". 123455"). How do I avoid that?

  • erik
    erik almost 10 years
    +1 for the hint to du -h | sort -h or the SI prefixed variant preferred by me over the binary prefixed one: du --si | sort -h.
  • Stéphane Chazelas
    Stéphane Chazelas almost 10 years
    @erik, you don't want to use -h here as you lose precision. You could end up with 20 files of size 200M, not knowing which one is the largest, and thus return the wrong result. You want to convert to "human readable" after you've sorted your list.
  • Stéphane Chazelas
    Stéphane Chazelas almost 10 years
    Note that %s gives the file size (and oL sorts on file size), while du gives the disk usage; these are different things. -k1n is the same as -n.
  • erik
    erik almost 10 years
    @StephaneChazelas, well, ok. And how could I achieve that without writing a complex script? Is there a “convert-to-human-readable” command?
  • erik
    erik almost 10 years
    Ok, I’ve found it. But my distro (Fedora 17) is too old to have numfmt included, as I have a version lower than coreutils-8.21, which introduced this command line utility.
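
For readers whose coreutils does include numfmt, the "sort first, convert afterwards" idea from the comments above might look like the sketch below. It assumes GNU find and numfmt. The function name largest_human is illustrative only, and file names containing newlines are not handled.

```shell
# Hypothetical helper: find the largest regular file by exact byte size,
# then convert only the size column of the winning line to IEC units.
# $1 = directory to scan (default: current directory)
largest_human() {
    tab=$(printf '\t')
    find "${1:-.}" -type f -printf '%s\t%p\n' |  # exact size in bytes, then path
        sort -n |                                # sort on the exact byte counts
        tail -n 1 |                              # keep the largest file
        numfmt --to=iec --field=1 --delimiter="$tab"
}

# Example call:
# largest_human /path/to/share
```

Because the comparison happens on byte counts, rounding only affects how the final size is displayed, not which file is chosen.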