What is a file hole and how can it be used?

Solution 1

Files with holes are usually referred to as sparse files.

They are useful when a program needs to address a wide range of offsets but is unlikely to touch all of the potential blocks. Virtualization products use this to store virtual disks. Say you configure a virtual machine with a 20 GB disk that will not fill up with data quickly. It is much faster to create a 20 GB sparse file that initially uses only a couple of disk blocks, and then let the VM create a file system and store files at its own pace.
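The virtual disk case can be sketched in Python. This is only a minimal illustration, assuming a Unix-like system and a file system that supports sparse files; the helper names `create_sparse_file` and `logical_and_allocated` are made up for this example, and a 20 MiB file stands in for the 20 GB disk.

```python
import os
import tempfile


def create_sparse_file(path, size):
    """Create a file of `size` bytes without writing any data blocks."""
    with open(path, "wb") as f:
        # Extending a file with truncate() leaves the new range as a hole
        # on file systems that support sparse files.
        f.truncate(size)


def logical_and_allocated(path):
    """Return (logical size, bytes actually allocated on disk)."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")
        create_sparse_file(image, 20 * 1024 * 1024)
        size, allocated = logical_and_allocated(image)
        print(f"logical size: {size} bytes, allocated: {allocated} bytes")
```

On a file system with sparse file support, the allocated figure should be far smaller than the logical size; on one without it, the two will be roughly equal.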

A large sparse file can also have its disk usage reduced when some of its blocks are blanked (i.e. filled with null bytes). Instead of actually writing zeroes to those blocks, a sparse-file-aware program can remove them from the file (i.e. punch holes in the file) with the very same effect, because unallocated blocks return zeroes when read by a program.
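Hole punching can be tried from Python by calling the Linux `fallocate()` system call through `ctypes`. This is a hedged sketch, not a portable recipe: it assumes 64-bit Linux with glibc, the flag values are the ones defined in `<linux/falloc.h>`, and `punch_hole` is a hypothetical helper that simply reports failure when the platform or file system does not support the operation.

```python
import ctypes
import os
import tempfile

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_hole(fd, offset, length):
    """Try to deallocate [offset, offset + length); return True on success."""
    try:
        # CDLL(None) exposes the symbols of the running process, libc included.
        libc = ctypes.CDLL(None, use_errno=True)
        fallocate = libc.fallocate
    except (OSError, AttributeError, TypeError):
        return False  # no fallocate() available on this platform
    # Assumption: off_t is 64 bits wide (true on 64-bit Linux).
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_longlong, ctypes.c_longlong]
    fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    # A non-zero return means the file system refused (e.g. EOPNOTSUPP).
    return fallocate(fd, mode, offset, length) == 0


if __name__ == "__main__":
    block = 1024 * 1024
    with tempfile.TemporaryFile() as f:
        f.write(b"\xff" * (4 * block))  # four blocks of real data
        f.flush()
        before = os.fstat(f.fileno()).st_blocks * 512
        ok = punch_hole(f.fileno(), block, 2 * block)
        after = os.fstat(f.fileno()).st_blocks * 512
        print(f"punched: {ok}, allocated before: {before}, after: {after}")
```

When the call succeeds, the file keeps its logical size, the punched range reads back as zeroes, and the allocated byte count should drop.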

Sparse files are the opposite of preallocation: they are a form of what is called thin provisioning, which might also be called disk overcommitment. This allows creating more "virtual disk space" than the actual hardware supports, and adding more disks to grow the file system only when necessary.

Solution 2

Holes are "useful" in the sense that they reduce disk space use (they make more space available). They aren't usable in any other sense. The existence of holes as part of a filesystem representation is "useful" when one has sparse files that contain large blocks of zeroes.
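A small Python sketch of a file that contains a large run of zeroes it never wrote. It assumes a Unix-like system; `write_at` and `first_data_offset` are illustrative names, and `os.SEEK_DATA` is only available on platforms that provide it, so the second helper returns `None` when it cannot answer.

```python
import os
import tempfile


def write_at(path, offset, payload):
    """Write `payload` at `offset`, leaving everything before it unwritten."""
    with open(path, "wb") as f:
        f.seek(offset)  # seeking past the end and then writing leaves a gap
        f.write(payload)


def first_data_offset(path):
    """Return the offset of the first data region, or None if unsupported."""
    if not hasattr(os, "SEEK_DATA"):
        return None
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_DATA)
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        write_at(path, 1024 * 1024, b"data")
        with open(path, "rb") as f:
            head = f.read(16)
        print("first bytes are all zero:", head == bytes(16))
        print("first data region starts at:", first_data_offset(path))
```

The program reading the file cannot tell from the bytes alone whether the leading zeroes are stored or are a hole. Only the file system's bookkeeping differs, which is what `SEEK_DATA` exposes where it is supported.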

Holes don't have anything to do with pre-allocation. Pre-allocation makes space available on the disk for data in a file before the file actually has that data. Holes are a representation of data ... specifically of blocks consisting solely of zeroes.
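The contrast between the two can be sketched in Python. This assumes a Unix-like system where `os.posix_fallocate` is available (it is not on every platform); the helper names are invented for the example, and the exact block counts depend on the file system.

```python
import os
import tempfile


def allocated_bytes(path):
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    return os.stat(path).st_blocks * 512


def make_sparse(path, size):
    with open(path, "wb") as f:
        f.truncate(size)  # sets the length, allocates no data blocks


def make_preallocated(path, size):
    with open(path, "wb") as f:
        # Asks the file system to reserve real blocks for the whole range.
        os.posix_fallocate(f.fileno(), 0, size)


if __name__ == "__main__":
    size = 8 * 1024 * 1024
    with tempfile.TemporaryDirectory() as tmp:
        sparse = os.path.join(tmp, "sparse.img")
        prealloc = os.path.join(tmp, "prealloc.img")
        make_sparse(sparse, size)
        make_preallocated(prealloc, size)
        for name in (sparse, prealloc):
            print(os.path.basename(name),
                  "size:", os.path.getsize(name),
                  "allocated:", allocated_bytes(name))
```

Both files report the same logical size and both read back as zeroes, but only the preallocated one takes space away from other files.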

Author: Jimm

Updated on September 15, 2022

Comments

  • Jimm
    Jimm over 1 year

    To my understanding, holes are perhaps maintained as metadata in the inode, while the actual disk is not filled with zeroes.

    1. Can someone explain, with real-life usage examples, where holes in a file can be useful?

    2. Are holes the same as soft preallocation? From a disk usage perspective, even though the actual disk space is not used, that space is also not available to other processes.

    • itisravi
      itisravi over 11 years
      The question was closed before I finished typing my answer, so here goes: The real advantage of holes (in a VM scenario) is when you actually delete data from the virtual disk. Suppose you've used up the 20 GB of the VM's disk space and you decide to delete some data. Without sparse file support, in spite of the deletion, the 20 GB still remain occupied on the underlying physical hard disk. But if the filesystem supports holes, then the VM can 'punch' a hole corresponding to the files deleted, thereby freeing up physical disk space. Hole punching is supported by fallocate() on some filesystems.
  • Jimm
    Jimm over 11 years
    I still don't get the advantage of sparse files in the context of a VM. Why not simply grow the file as needed? For example, if the user requested up to 20 GB of space for the VM, preallocate 1 GB. At some threshold of actual usage, preallocate more.
  • jlliagre
    jlliagre over 11 years
    Well, the fact that the files may become fragmented and lead to an unexpected disk-full situation is precisely due to the fact that their space was not allocated. If they were preallocated, there would be no fragmentation. Sparse files are definitely overcommitment, and that is the opposite of what you mention in your question ("space is also not available for other processes"). With sparse files, space is available for other files; there is no reservation at all.
  • jlliagre
    jlliagre over 11 years
    You are still missing the point. Growing as needed is precisely what sparse files provide.
  • Jimm
    Jimm over 11 years
    You can increase the size of an existing non-sparse file at any time, as long as there is disk space. So what does creating a sparse file buy me? Would it reserve block addresses? It sounds like it does not reserve anything, so I am wondering what the purpose of creating one is.
  • jlliagre
    jlliagre over 11 years
    Yes, you kind of reserve addresses, but this reservation takes (almost) no disk space. The advantage is that the virtualized OS immediately sees a large disk, can create properly dimensioned partitions on it, and can then lay out file systems in these partitions in a very economical manner. Should you choose the non-sparse file way, you would not be able to have more than one growable partition (without using volume management, if available), and enlarging the file systems would add pointless administrative burden.