Can a file that was originally sparse and then expanded be made sparse again?
Solution 1
Edit 2015: as of util-linux 2.25, the fallocate utility on Linux has a -d/--dig-holes option for that.
fallocate -d the-file
This digs a hole for every block full of zeros in the file.
On older systems, you can do it by hand:
Linux has a FALLOC_FL_PUNCH_HOLE flag for the fallocate system call that can do this. I found a script on GitHub with an example:
Using FALLOC_FL_PUNCH_HOLE from Python
I modified it a bit to do what you asked -- punch holes in regions of files that are filled with zeros. Here it is:
Using FALLOC_FL_PUNCH_HOLE from Python to punch holes in files
usage: punch.py [-h] [-v VERBOSE] FILE [FILE ...]

Punch out the empty areas in a file, making it sparse

positional arguments:
  FILE                  file(s) to modify in-place

optional arguments:
  -h, --help            show this help message and exit
  -v VERBOSE, --verbose VERBOSE
                        be verbose
Example:
# create a file with some data, a hole, and some more data
$ dd if=/dev/urandom of=test1 bs=4096 count=1 seek=0
$ dd if=/dev/urandom of=test1 bs=4096 count=1 seek=2
# see that it has holes
$ du --block-size=1 --apparent-size test1
12288 test1
$ du --block-size=1 test1
8192 test1
# copy it, ignoring the hole
$ cat test1 > test2
$ du --block-size=1 --apparent-size test2
12288 test2
$ du --block-size=1 test2
12288 test2
# punch holes again
$ ./punch.py test2
$ du --block-size=1 --apparent-size test2
12288 test2
$ du --block-size=1 test2
8192 test2
# verify
$ cmp test1 test2 && echo "files are the same"
files are the same
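The two du invocations above compare a file's apparent size with the space actually allocated for it. As a rough sketch, the same comparison can be made from Python with os.stat. The function names disk_usage and is_sparse are made up for this illustration, and the check is only a heuristic: allocated space can also be smaller than the apparent size on filesystems that compress data or store tiny files inline.

```python
import os


def disk_usage(path):
    """Return (apparent_size, allocated_bytes) for path."""
    st = os.stat(path)
    # On Linux, st_blocks is counted in 512-byte units,
    # regardless of the filesystem's own block size.
    return st.st_size, st.st_blocks * 512


def is_sparse(path):
    """Heuristic: fewer bytes allocated than the file's apparent size."""
    apparent, allocated = disk_usage(path)
    return allocated < apparent
```

For the test1 file above, disk_usage("test1") would be expected to report the same two figures that du printed.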
Note that punch.py only finds blocks of 4096 bytes to punch out, so it might not make a file exactly as sparse as it was when you started. It could be made smarter, of course. Also, it's only lightly tested, so be careful and make backups before trusting it!
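The script itself is not reproduced here, so what follows is not punch.py but a minimal sketch of the same idea, under a few assumptions: 64-bit Linux with glibc (so off_t is 64 bits), a filesystem that supports hole punching, and the 4096-byte block size mentioned above. The names zero_runs, punch_hole and make_sparse are invented for this illustration.

```python
import ctypes
import ctypes.util
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def zero_runs(fileobj, block_size=4096):
    """Yield (offset, length) for each maximal run of all-zero full blocks."""
    zero_block = bytes(block_size)
    offset = 0
    run_start = None
    while True:
        block = fileobj.read(block_size)
        if not block:
            break
        # A short trailing block never equals zero_block, so it is left alone.
        if block == zero_block:
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            yield run_start, offset - run_start
            run_start = None
        offset += len(block)
    if run_start is not None:
        yield run_start, offset - run_start


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) without changing the file size."""
    # Assumption: 64-bit Linux/glibc, where off_t is a 64-bit integer.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def make_sparse(path, block_size=4096):
    """Punch a hole over every run of zero-filled blocks in path, in place."""
    with open(path, "r+b") as f:
        runs = list(zero_runs(f, block_size))
        for offset, length in runs:
            punch_hole(f.fileno(), offset, length)
    return runs
```

Calling make_sparse("test2") on the copy from the example should punch one hole over the middle block. As with the original script, try it on a backup first.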
Solution 2
If you want to make a file sparse, you can do that directly with dd:
dd if=./zeropadded.iso of=./isnowsparse.iso conv=sparse
From the dd(1) manual:
sparse If one or more output blocks would consist solely of
NUL bytes, try to seek the output file by the required
space instead of filling them with NULs, resulting in a
sparse file.
So, note that it will seek ahead only if the entire block consists of NUL bytes. For maximum sparseness use bs=1.
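To make the manual's description concrete, here is a small sketch of the technique: copy block by block and seek over blocks that are entirely NUL instead of writing them. It illustrates the idea and is not how dd is actually implemented. The name sparse_copy and the 4096-byte default are assumptions made for the example.

```python
import os


def sparse_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst, seeking over full blocks that contain only NUL bytes."""
    zero_block = bytes(block_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if block == zero_block:
                # Skip ahead instead of writing; the filesystem leaves a hole.
                dst.seek(block_size, os.SEEK_CUR)
            else:
                dst.write(block)
        # If the copy ended with a seek, extend the file to its full length.
        dst.truncate(dst.tell())
```

Whether the skipped regions really end up unallocated depends on the destination filesystem. The copy's contents are identical either way.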
Solution 3
I've had good luck with this:
cd whatever
rsync -avxWSHAXI . .
The -I forces rsync to update all files, regardless of whether it thinks they've changed or not; the -S causes the new files to be sparsified. -a makes it happen recursively, so you can sparsify whole directory trees in one command.
It's not as good as a bespoke tool that hunts out zero-filled regions and punches them out with FALLOC_FL_PUNCH_HOLE, but it's better than having to duplicate entire directory trees.
Solution 4
Short of tar-ing it up with a -S flag (assuming GNU tar), and re-executing the scp... no. No utility I'm aware of would have a way of knowing where the "holes" were.
user25849
Updated on September 18, 2022
Comments
-
user25849 almost 2 years
I know that copying or transferring what was originally a sparse file without using a utility that understands sparse files will cause the 'holes' to be filled out. Is there a method or utility to turn what was once a sparse file back to sparse?
For example, create a sparse file:
% dd if=/dev/zero of=TEST bs=1 count=0 seek=1G
# do some op that pads out the holes
% scp TEST localhost:~/TEST2
% ls -lhs TEST*
0 -rw-rw-r--. 1 tony tony 1.0G Oct 16 13:35 TEST
1.1G -rw-rw-r--. 1 tony tony 1.0G Oct 16 13:37 TEST2
Is there some way to:
% resparse TEST2
to get:
0 -rw-rw-r--. 1 tony tony 1.0G Oct 16 13:35 TEST
0G -rw-rw-r--. 1 tony tony 1.0G Oct 16 13:37 TEST2
-
user25849 over 11 years
Sorry, I had to pretty up the original ques...
-
user25849 over 11 years
The only thing I've seen that can do this is GNU cp, as in:
% cp --sparse=always formerly-sparse-file newly-sparse-file
The drawback is that it will not do it in place.
-
Gilles 'SO- stop being evil' over 11 years
If you want to copy a sparse file and let the copy be sparse, use rsync -aS.
-
user25849 over 11 years
GNU cp will resparse a file. From the man page: Specify --sparse=always to create a sparse DEST file whenever the SOURCE file contains a long enough sequence of zero bytes.
-
tink over 11 years
Awesome. Learn something every day - when was that flag introduced? Pays to read man-pages of "well known" programs once in a while ;D
-
Hallaghan over 10 years
Any block size less than bs=512 doesn't really make sense, as disks are block devices (bs=4096 on newer drives).
-
vipin almost 10 years
I like this the best because it doesn't require rewriting the whole file again.
-
maxschlepzig over 7 years
Looks like this is equivalent to cp --sparse=always zeropadded.iso isnowsparse.iso