Why is disk usage greater than the size of all files on it?

Solution 1

Files on disk have two sizes: the "apparent size" and the "size on disk". Several reasons can cause a large discrepancy:

  • A large number of files results in a large amount of overhead because of internal fragmentation. For example, ext4 has a default block size of 4 KiB; a file smaller than that always consumes 4 KiB, and larger files are rounded up to the next multiple of the block size.
  • Directories are also files, and the same rule applies to them. Moreover, if you create a large number of files in a directory and remove them again later, the space used by the directory file can't be reclaimed (recreating the directory helps).
  • Sparse files are special files that appear to be large but don't actually consume that space. This is common in virtualization for virtual disk images; they appear large, but the real size can be a lot smaller. Many utilities (and file managers) are incapable of showing the actual disk usage.
  • Hard links. The contents of a file exist on disk once, while multiple references can point to them. Some file managers count the size once for every reference.

I suggest using a disk usage tool that is known to list both sizes to see whether this is the issue. Try ncdu in a terminal and press a to toggle between apparent size and disk usage.
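
If ncdu is not installed, du can report both figures as well. The following is a minimal sketch of the sparse-file case, assuming GNU coreutils (du, truncate) and a filesystem that supports sparse files; the file name sparse.img is made up for illustration:

```shell
# Work in a scratch directory so nothing existing is touched.
tmpdir=$(mktemp -d)

# truncate sets the file length without writing data, which yields a
# sparse file on filesystems that support holes (ext4 does).
truncate -s 100M "$tmpdir/sparse.img"

# Apparent size: the length a program sees when reading the file.
apparent=$(du --apparent-size --block-size=1 "$tmpdir/sparse.img" | cut -f1)

# Disk usage: the space actually allocated for the file.
on_disk=$(du --block-size=1 "$tmpdir/sparse.img" | cut -f1)

echo "apparent size: $apparent bytes"
echo "disk usage:    $on_disk bytes"

rm -r "$tmpdir"
```

On a filesystem with sparse-file support, the disk usage should come out far below the apparent size of 100 MiB.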


A short demo on internal fragmentation due to a 4KiB block size filesystem using du:

$ sudo tune2fs -l /dev/path-to-device | grep "Block size"
Block size:               4096

$ echo blaataaap > myfile                      # creates a 10-byte file

$ du --block-size=1 myfile                     # prints the usage on disk (filesystem)
4096   myfile

$ du --apparent-size --block-size=1 myfile     # prints the apparent size, i.e.
10     myfile                                  # content length when seeking

$ ls -al
-rw-rw-r-- 1 gert gert 10 Jan 1 23:24 myfile   # ls uses apparent sizes

This means that this 10-byte file takes up 4086 bytes more on disk than a listing suggests; it suffers from internal fragmentation.
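
The rounding itself is plain integer arithmetic. A rough sketch, assuming the 4096-byte block size reported by tune2fs above; the function name on_disk_size is made up, and the calculation ignores filesystem metadata and features such as inline data that can change the result for very small files:

```shell
# Block size in bytes, as reported by tune2fs in the demo.
block_size=4096

# Round an apparent size (in bytes) up to the next multiple of the block size.
on_disk_size() {
    size=$1
    echo $(( (size + block_size - 1) / block_size * block_size ))
}

on_disk_size 10      # the 10-byte file from the demo
on_disk_size 4096    # exactly one block
on_disk_size 4097    # one byte over a block boundary, so two blocks
```

The three calls print 4096, 4096 and 8192, which matches the du output for the 10-byte file.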


A short demo on hard links and on disk usage being reported incorrectly when listing files (ls in this case):

$ dd if=/dev/zero of=1MBfile bs=1M count=1 # create a 1MB file
$ ln 1MBfile a_hard_link                   # create a hard link to it

$ ls -alht                                 # ls will report 2MB
total 2.1M
drwxrwxr-x  2 gert gert 4.0K Jan  2 11:21 .
-rw-rw-r--  2 gert gert 1.0M Jan  2 11:21 1MBfile
-rw-rw-r--  2 gert gert 1.0M Jan  2 11:21 a_hard_link

$ du -B 1024 .                             # du reports 1028K total for directory
1028    .

$ du -B 1024 a_hard_link                   # and 1024K for each file individually
1024    a_hard_link
$ du -B 1024 1MBfile
1024    1MBfile
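
To confirm that two names really refer to the same data, you can compare inode numbers and the link count. A small sketch, assuming GNU stat and du, run in a scratch directory with the same file names as in the demo:

```shell
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/1MBfile" bs=1M count=1 2>/dev/null
ln "$tmpdir/1MBfile" "$tmpdir/a_hard_link"

# %i prints the inode number, %h the number of hard links (GNU stat).
inode_a=$(stat -c %i "$tmpdir/1MBfile")
inode_b=$(stat -c %i "$tmpdir/a_hard_link")
links=$(stat -c %h "$tmpdir/1MBfile")

echo "inodes: $inode_a $inode_b, link count: $links"

# When both names are passed in one run, du counts the shared inode once.
du -c -B 1024 "$tmpdir/1MBfile" "$tmpdir/a_hard_link"

rm -r "$tmpdir"
```

Both names report the same inode and a link count of 2, which is why du counts the data only once while a naive per-name sum counts it twice.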

Solution 2

This happens because the total disk usage is never equal to the sum of the sizes of all files on the disk (which is what Nautilus shows when you select all files).

The reason is that the file system itself occupies some space on the partition. Most likely, if you wiped all the data stored on that HDD, disk usage would still be about 150GB. That space is reserved for the file system; it is required because the file system needs to store the data about files somewhere. ext4 pre-allocates this overhead space before any files are created, as opposed to, for example, ext3, where that space grows as more files are added to the partition.

If you consider these 150GB a problem, note that they are just 5% of your total HDD size. If you need more than 95% of your hard drive, you probably need to buy a larger one instead of worrying about the 150GB that are out of your reach.
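
The figures from the question can be checked with a quick calculation. This is only a sketch: it takes the 3TB drive as 3000GB, which is an assumption, and uses awk because shell arithmetic is integer-only:

```shell
used=471.4       # GB, from the drive's properties screen
selected=321.0   # GB, reported by Nautilus for the selected files
total=3000       # GB, assuming the 3TB drive is 3000GB

# Difference between reported usage and the sum of the selected files.
diff=$(awk -v u="$used" -v s="$selected" 'BEGIN { printf "%.1f", u - s }')

# That difference as a percentage of the whole drive.
pct=$(awk -v d="$diff" -v t="$total" 'BEGIN { printf "%.1f", d / t * 100 }')

echo "unaccounted: ${diff}GB (${pct}% of the drive)"
```

This gives 150.4GB, or about 5.0% of the drive, in line with the 5% mentioned above.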

Also, keep in mind that with ext4 this space is not wasted. Data fragmentation is not a problem here, and the cost of this advantage is the extra occupied space. There are ways to force ext4 to use much less of this space, but that's not recommended: because of the increased chance of fragmentation and other optimisations that can no longer happen, it will very likely make your machine noticeably slower, as data access won't be as smooth.

Author: oshirowanen

Updated on September 18, 2022

Comments

  • oshirowanen over 1 year

    I have a 3TB HDD. In the properties screen of the HDD, it says that I have used 471.4GB, but when I select all the files in nautilus, it says that 321.0GB is selected. If I only have 321.0GB of files in the HDD, why is it using 471.4GB?

    The HDD's partitioning uses GUID and the file system in use is ext4. When I select the HDD in the Disk Utility app, I get a warning saying:

    WARNING: The partition is misaligned by 3072 bytes.
    This may result in very poor performance.  Repartitioning is suggested.
    

    Has that got anything to do with the missing 150.4GB?

    • Flimm over 11 years
      You're using the unit Tb, when you almost certainly mean TB, or perhaps TiB. See this question for the difference between these units.