Converting sparse file to non-sparse in place

On the face of it, it's a simple dd:

dd if=sparsefile of=sparsefile conv=notrunc bs=1M

That reads the entire file and writes its entire contents back in place, including the data that was already allocated.

To write only the holes, you first have to determine where they are. You can do that using either filefrag or hdparm:

filefrag:

# filefrag -e sparsefile
Filesystem type is: 58465342
File size of sparsefile is 10737418240 (2621440 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0.. 1048575:  187357696.. 188406271: 1048576:            
   1:  1572864.. 2621439:  200704128.. 201752703: 1048576:  188406272: last,eof
sparsefile: 2 extents found

hdparm:

# hdparm --fibmap sparsefile

sparsefile:
 filesystem blocksize 4096, begins at LBA 0; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
           0 1498861568 1507250175    8388608
  6442450944 1605633024 1614021631    8388608

This example file is, as you say, 10G in size with a 2G hole. It has two extents, the first covering blocks 0-1048575 and the second blocks 1572864-2621439, which means that the hole is 1048576-1572863, i.e. it ends just before block 1572864 (in 4k sized blocks, as shown by filefrag). The info shown by hdparm is the same, just displayed differently: the first extent covers 8388608 512-byte sectors starting from byte 0, so it's bytes 0-4294967295, and the second extent starts at byte 6442450944, so the hole is bytes 4294967296-6442450943.

Note that you may be shown considerably more extents if there is any fragmentation. Unfortunately, neither command shows the holes directly, and I don't know of one that does, so you have to deduce them from the logical offsets shown.
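The block-to-byte arithmetic above can be double-checked with shell arithmetic. This is only a sketch: the variable names are made up for illustration, and the 4096-byte block size is the one filefrag reported for this example file.

```shell
# Values taken from the filefrag output above (units: 4096-byte blocks).
block_size=4096
first_extent_end=1048575      # last block of extent 0
second_extent_start=1572864   # first block of extent 1

# The hole starts right after extent 0 and ends right before extent 1.
hole_start_block=$((first_extent_end + 1))
hole_blocks=$((second_extent_start - hole_start_block))

# The same hole in bytes; the end should match hdparm's second byte_offset.
hole_start_byte=$((hole_start_block * block_size))
hole_bytes=$((hole_blocks * block_size))
hole_end_byte=$((hole_start_byte + hole_bytes))

echo "hole: $hole_blocks blocks starting at block $hole_start_block"
echo "hole: $hole_bytes bytes starting at byte $hole_start_byte"
```

The computed end of the hole, 6442450944, is the byte_offset of the second extent in the hdparm output, so both tools agree.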

Now, filling that hole (blocks 1048576 up to, but not including, 1572864) with dd as shown above can be done by adding appropriate (identical) seek/skip values and a count. Note that bs= was changed to 4k to match the block size used by filefrag above. (For bs=1M, you'd have to convert the seek/skip/count values to 1M sized blocks.)

dd if=sparsefile of=sparsefile conv=notrunc \
   bs=4k seek=1048576 skip=1048576 count=$((-1048576+1572864))
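For the bs=1M variant, the seek/skip/count values have to be rescaled from 4k blocks to 1M blocks. A minimal sketch of that conversion follows. The variable names are illustrative only, and it assumes both ends of the hole fall on 1M boundaries, which is true for this example but not in general.

```shell
# Hole boundaries in 4k blocks, as deduced from filefrag.
hole_start=1048576
hole_end=1572864                      # first block after the hole
blocks_per_mib=$((1048576 / 4096))    # 256 blocks of 4k per 1M

# Only exact when both boundaries are multiples of 256 blocks.
seek_1m=$((hole_start / blocks_per_mib))
count_1m=$(((hole_end - hole_start) / blocks_per_mib))

# Print the resulting command instead of running it.
echo "dd if=sparsefile of=sparsefile conv=notrunc bs=1M" \
     "seek=$seek_1m skip=$seek_1m count=$count_1m"
```

If a boundary is not 1M-aligned, rounding the start down and the end up should be harmless here, since dd reads from the same file and writes the same bytes back to the same offsets.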

While you could fill holes with /dev/zero instead of reading the hole of the file itself (which will also just yield zeroes), it is safer to read from the sparsefile anyway so you won't corrupt your data in case you got an offset wrong.

In newer versions of GNU dd, you may stick to a larger blocksize and specify all values in bytes:

dd if=sparsefile of=sparsefile conv=notrunc bs=1M \
   iflag=skip_bytes,count_bytes oflag=seek_bytes \
   seek=4294967296 skip=4294967296 count=$((-4294967296+6442450944))

filefrag after running that:

# sync
# filefrag -e sparsefile 
Filesystem type is: 58465342
File size of sparsefile is 10737418240 (2621440 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0.. 1572863:  187357696.. 188930559: 1572864:            
   1:  1572864.. 2621439:  200704128.. 201752703: 1048576:  188930560: last,eof
sparsefile: 2 extents found

Due to fragmentation, it's still two extents. However, the logical offsets show that this time, there is no hole, so the file is no longer sparse.
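As a coarser cross-check than reading extent lists, the allocated size can be compared with the apparent size. The helper below is a hypothetical sketch (is_fully_allocated is not a standard tool). It only compares totals, so it cannot tell where a hole is, and space that is merely preallocated would count as allocated too.

```shell
# is_fully_allocated BLOCKS BLOCK_UNIT SIZE
# Succeeds when BLOCKS * BLOCK_UNIT covers at least SIZE bytes.
is_fully_allocated() {
    [ $(( $1 * $2 )) -ge "$3" ]
}

# Intended use with GNU stat
# (%b = allocated blocks, %B = bytes per block, %s = file size):
#   is_fully_allocated $(stat --format='%b %B %s' sparsefile) && echo "no holes left"

# The example file with 8G allocated (before) and 10G allocated (after):
is_fully_allocated 16777216 512 10737418240 || echo "before: still sparse"
is_fully_allocated 20971520 512 10737418240 && echo "after: fully allocated"
```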

Naturally, this dd solution is the very manual approach to things. If you need this on a regular basis, it would be easy to write a small program that fills such gaps. If it already exists as a standard tool, I haven't heard of it yet.
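Such a gap-filling helper could look roughly like the sketch below. It is not an existing tool: the function names holes_from_filefrag and fill_holes are invented here, the parser relies on the filefrag -e layout shown above, and a 4096-byte block size is assumed. It only finds gaps before or between extents, so a hole at the very end of the file would be missed.

```shell
# Read `filefrag -e` output on stdin and print one "start count" line
# (in filesystem blocks) for every gap before or between the extents.
holes_from_filefrag() {
    awk '
        BEGIN { expected = 0 }
        # Extent lines look like: "   1:  1572864.. 2621439:  ..."
        $1 ~ /^[0-9]+:$/ && $2 ~ /^[0-9]+\.\.$/ {
            start = $2; sub(/\.\.$/, "", start); start += 0
            end = $3;   sub(/:$/, "", end);      end += 0
            if (start > expected)
                print expected, start - expected
            expected = end + 1
        }
    '
}

# Fill every gap by reading it from the file itself and writing the
# zeroes back in place. Assumes 4096-byte blocks; check the filefrag
# header ("blocks of 4096 bytes") before relying on this.
fill_holes() {
    sync
    filefrag -e "$1" | holes_from_filefrag |
    while read -r start count; do
        dd if="$1" of="$1" conv=notrunc bs=4096 \
           seek="$start" skip="$start" count="$count"
    done
}
```

Fed the first filefrag listing above, holes_from_filefrag prints "1048576 524288", the same seek and count used in the manual dd command. Running fill_holes sparsefile would then issue one dd per gap.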

There is a tool after all. fallocate seems to work, after a fashion:

fallocate -l $(stat --format="%s" sparsefile) sparsefile

However, at least in the case of XFS, while it does allocate physical space for this file, it does not actually zero it out. filefrag shows such extents as allocated, but unwritten.

   2:        3..      15:    7628851..   7628863:     13:    7629020: unwritten

This is not good enough if the intent is to be able to read the correct data directly from the block device. It only reserves the storage space needed for future writes.

Ivan

Updated on September 18, 2022

Comments

Ivan almost 2 years

On Linux, given a sparse file, how to make it non-sparse, in place?
It could be copied with cp --sparse=never ..., but if the file is say 10G and the hole is 2G (that is the allocated space is 8G), how to make the filesystem allocate the remaining 2G without copying the original 8G to a new file?
Stéphane Chazelas over 9 years

Or cat sparsefile 1<> sparsefile. You may be able to use fallocate on Linux to avoid having to write those NUL bytes if all you want is the space to be allocated.
frostschutz over 9 years

@StéphaneChazelas, thanks, forgot about fallocate. It has --dig-holes but no --fill-holes. However, it seems to work well enough when you specify the size. I will edit my answer.
Ivan over 9 years

On NFS or ext3 fallocate is not supported.
Stéphane Chazelas over 9 years

Newer versions of fallocate have a -z option, which can be used in Linux 3.14 and above on ext4 and xfs (you'd need to run it with -o and -l for all the sparse sections, I suppose).
frostschutz over 9 years

@StéphaneChazelas, yup, but this -z does not keep your data if you happen to get an offset wrong, so I'll stick to dd there...
frostschutz over 9 years

@Ivan, it may be better to just drop the sparse property when you untar then; less likely to produce a file with many fragments, which you usually get if you have a filesystem as a sparse file with holes all over the place. Then again, some virtualization solutions deliberately punch holes in files (TRIM support inside the VM) to save storage space on the host. Personally, I prefer giving real block devices to VMs instead of file containers.
frostschutz over 9 years

@Ivan, also if it's a sparse filesystem image, you could just mount it and zero the free space inside of it, instead of using the complicated dd method above. It's the kind of detail you should have mentioned in the question... ;)
Kornelis over 2 years

@StéphaneChazelas I tried cat with 1<> and got "cat: sparsefile: input file is output file"
Stéphane Chazelas over 2 years

@BrianMinton, yes, with the GNU implementation of cat, you'd need <file cat|cat 1<> file to work around that safeguard.