Converting sparse file to non-sparse in place
On the face of it, it's a simple dd:
dd if=sparsefile of=sparsefile conv=notrunc bs=1M
That reads the entire file, and writes the entire contents back to it.
In order to write only the holes themselves, you first have to determine where those holes are. You can do that using either filefrag or hdparm:
filefrag:
# filefrag -e sparsefile
Filesystem type is: 58465342
File size of sparsefile is 10737418240 (2621440 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 1048575: 187357696.. 188406271: 1048576:
1: 1572864.. 2621439: 200704128.. 201752703: 1048576: 188406272: last,eof
sparsefile: 2 extents found
hdparm:
# hdparm --fibmap sparsefile
sparsefile:
filesystem blocksize 4096, begins at LBA 0; assuming 512 byte sectors.
byte_offset begin_LBA end_LBA sectors
0 1498861568 1507250175 8388608
6442450944 1605633024 1614021631 8388608
This example file is, as you say, 10G in size with a 2G hole. It has two extents, the first covering blocks 0-1048575, the second 1572864-2621439, which means that the hole runs from block 1048576 up to (but not including) block 1572864 (in 4k sized blocks, as shown by filefrag). The info shown by hdparm is the same, just displayed differently: the first extent covers 8388608 512-byte sectors starting from byte offset 0, so it spans bytes 0-4294967295, and the hole runs from byte 4294967296 up to (but not including) byte 6442450944.
Note that you may be shown considerably more extents if there is any fragmentation. Unfortunately, neither command shows the holes directly, and I don't know of one that does, so you have to deduce them from the logical offsets shown.
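Deducing the holes from the logical offsets can be scripted. The following is only a sketch, not an existing tool: list_holes is a made-up function name, and it assumes the `filefrag -e` output format shown above (a "File size of ..." line followed by one line per extent). It prints one line per gap, in blocks and in bytes.

```shell
# list_holes (hypothetical helper): read `filefrag -e` output on stdin and
# print one line per hole: first_block block_count first_byte byte_count
list_holes() {
    awk '
        # "File size of NAME is N (M blocks of B bytes)": remember M and B.
        /^File size of / && match($0, /\([0-9]+ blocks? of [0-9]+ bytes\)/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), part, " ")
            total = part[1] + 0; bs = part[4] + 0
            next
        }
        # Extent lines look like "0: 0.. 1048575: 187357696.. ...".
        /^ *[0-9]+: +[0-9]+\.\. *[0-9]+:/ {
            gsub(/[:.]/, " ")        # fields are now: ext start end ...
            if ($2 + 0 > next_block) report(next_block, $2 - next_block)
            next_block = $3 + 1
        }
        # A hole at the very end of the file has no extent after it.
        END { if (total > next_block) report(next_block, total - next_block) }
        # %.0f instead of %d, since some older awk builds clip %d to 32 bits.
        function report(start, count) {
            printf "%.0f %.0f %.0f %.0f\n", start, count, start * bs, count * bs
        }
    '
}
```

It would be used as `filefrag -e sparsefile | list_holes`. For the example output above it reports a single hole of 524288 blocks starting at block 1048576, which matches the byte offsets derived from hdparm.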
Now, filling that 1048576-1572864 hole with dd as shown above can be done by adding appropriate (identical) seek/skip values and a count. Note that bs= was adapted to match the 4k blocks used by filefrag above. (For bs=1M, you'd have to adapt the seek/skip/count values to reflect 1M sized blocks.)
dd if=sparsefile of=sparsefile conv=notrunc \
bs=4k seek=1048576 skip=1048576 count=$((-1048576+1572864))
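For comparison, here is how the same hole would be expressed in 1M blocks. This is plain shell arithmetic on the numbers above; the variable names are made up for the illustration, and the conversion is only exact when the hole starts and ends on a 1M boundary, as it does here.

```shell
blk=4096               # filefrag block size in bytes
hole_start=1048576     # first block of the hole
hole_end=1572864       # first block of the following extent
bs=$((1024 * 1024))    # target dd block size: 1M

# Refuse to convert if the hole is not aligned to the larger block size.
if [ $(( hole_start * blk % bs )) -ne 0 ] || [ $(( hole_end * blk % bs )) -ne 0 ]; then
    echo "hole is not 1M-aligned, keep bs=4k" >&2
else
    skip=$(( hole_start * blk / bs ))
    count=$(( (hole_end - hole_start) * blk / bs ))
    # Only prints the command; nothing is written.
    echo "dd if=sparsefile of=sparsefile conv=notrunc bs=1M seek=$skip skip=$skip count=$count"
fi
```

With these numbers it prints seek=4096 skip=4096 count=2048.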
While you could fill holes with /dev/zero instead of reading the hole from the file itself (which will also just yield zeroes), it is safer to read from sparsefile anyway, so you won't corrupt your data in case you got an offset wrong.
In newer versions of GNU dd, you may stick to a larger blocksize and specify all values in bytes:
dd if=sparsefile of=sparsefile conv=notrunc bs=1M \
iflag=skip_bytes,count_bytes oflag=seek_bytes \
seek=4294967296 skip=4294967296 count=$((-4294967296+6442450944))
filefrag after running that:
# sync
# filefrag -e sparsefile
Filesystem type is: 58465342
File size of sparsefile is 10737418240 (2621440 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 1572863: 187357696.. 188930559: 1572864:
1: 1572864.. 2621439: 200704128.. 201752703: 1048576: 188930560: last,eof
sparsefile: 2 extents found
Due to fragmentation, it's still two extents. However, the logical offsets show that this time, there is no hole, so the file is no longer sparse.
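A coarser check than reading the extents is to compare the allocated size with the apparent size. The sketch below assumes GNU stat; is_sparse is a made-up helper name. Note that it cannot tell written blocks from blocks that are merely reserved, and compressing or deduplicating filesystems may skew the numbers.

```shell
# is_sparse FILE (hypothetical helper): succeed if FILE occupies fewer
# bytes on disk than its apparent size.
is_sparse() {
    # %s = size in bytes, %b = allocated blocks, %B = bytes per %b block
    info=$(stat --format='%s %b %B' "$1") || return 2
    set -- $info
    [ $(( $2 * $3 )) -lt "$1" ]
}
```

For example: `sync; is_sparse sparsefile && echo "still sparse"`.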
Naturally, this dd solution is the very manual approach to things. If you need this on a regular basis, it would be easy to write a small program that fills such gaps. If it already exists as a standard tool, I haven't heard of it yet.
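Such a small program could look roughly like this. It is a sketch under assumptions, not a standard tool: fill_hole and fill_holes are made-up names, and the hole list (start block and block count per line, in 4k blocks) is expected on stdin, however you obtained it. Like the dd command above, it reads from the file itself, so a wrong offset only rewrites data with itself.

```shell
# fill_hole FILE START COUNT [BLOCKSIZE] (hypothetical helper)
# Rewrite COUNT blocks of FILE in place, starting at block START.
fill_hole() {
    dd if="$1" of="$1" conv=notrunc bs="${4:-4096}" \
       seek="$2" skip="$2" count="$3" 2>/dev/null
}

# fill_holes FILE (hypothetical helper)
# stdin: one "START COUNT" pair per line; extra fields are ignored.
fill_holes() {
    while read -r start count rest; do
        fill_hole "$1" "$start" "$count" || return
    done
}

# Demo on a scratch file: one data block, a 4-block hole, then more data.
f=$(mktemp)
printf 'head' | dd of="$f" bs=4096 seek=0 conv=notrunc 2>/dev/null
printf 'tail' | dd of="$f" bs=4096 seek=5 conv=notrunc 2>/dev/null
sum_before=$(cksum < "$f")
echo "1 4" | fill_holes "$f"
sum_after=$(cksum < "$f")
rm -f "$f"
```

The checksums before and after should be identical, since only zeroes are read from the hole and written back.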
There is a tool after all: fallocate seems to work, after a fashion:
fallocate -l $(stat --format="%s" sparsefile) sparsefile
However, at least in the case of XFS, while it does allocate physical space for this file, it does not actually zero it out. filefrag shows such extents as allocated, but unwritten.
2: 3.. 15: 7628851.. 7628863: 13: 7629020: unwritten
This is not good enough if the intent is to be able to read the correct data directly from the block device. It only reserves the storage space needed for future writes.
Ivan
Updated on September 18, 2022
Comments
-
Ivan almost 2 years
On Linux, given a sparse file, how to make it non-sparse, in place?
It could be copied with cp --sparse=never ..., but if the file is, say, 10G and the hole is 2G (that is, the allocated space is 8G), how to make the filesystem allocate the remaining 2G without copying the original 8G to a new file?
-
Stéphane Chazelas over 9 years
Or cat sparsefile 1<> sparsefile. You may be able to use fallocate on Linux to avoid having to write those NUL bytes if all you want is the space to be allocated.
-
frostschutz over 9 years
@StéphaneChazelas, thanks, forgot about fallocate. It has --dig-holes but no --fill-holes. However, it seems to work well enough when you specify the size. I will edit my answer.
-
Ivan over 9 years
On NFS or ext3, fallocate is not supported.
-
Stéphane Chazelas over 9 years
Newer versions of fallocate have a -z option which can be used in Linux 3.14 and above on ext4 and xfs (you'd need to run it with -o and -l for all the sparse sections, I suppose).
-
frostschutz over 9 years
@StéphaneChazelas, yup, but this -z does not keep your data if you happen to get an offset wrong, so I'll stick to dd there...
-
frostschutz over 9 years
@Ivan, it may be better to just drop the sparse property when you untar then; less likely to produce a file with many fragments, which you usually get if you have a filesystem as a sparse file with holes all over the place. Then again, some virtualization solutions deliberately punch holes in files (TRIM support inside the VM) to save storage space on the host. Personally, I prefer giving real block devices to VMs instead of file containers.
-
frostschutz over 9 years
@Ivan, also if it's a sparse filesystem image, you could just mount it and zero the free space inside of it, instead of using the complicated dd method above. It's the kind of detail you should have mentioned in the question... ;)
-
Kornelis over 2 years
@StéphaneChazelas I tried cat with 1<> and got "cat: sparsefile: input file is output file"
-
Stéphane Chazelas over 2 years
@BrianMinton, yes, with the GNU implementation of cat, you'd need <file cat | cat 1<> file to work around that safeguard.