How do I convert a Linux disk image into a sparse file?

linux filesystems mount compression


Solution 1

First of all, a file only becomes sparse transparently if you seek over a region, not if you write zeroes to it.

To make this clearer, take the example from Wikipedia:

dd if=/dev/zero of=sparse-file bs=1k count=0 seek=5120

does not write any zeroes: it opens the output file, seeks (jumps over) 5 MB, and then writes zero blocks of zeroes (i.e. nothing at all). This command (not from Wikipedia)

dd if=/dev/zero of=sparse-file bs=1k count=5120

will write 5MB of zeroes and will not create a sparse file!

As a consequence, a file that is already non-sparse will not magically become sparse later.
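
To see the difference for yourself, here is a minimal sketch that creates both files in a scratch directory and compares the apparent size with the allocated blocks. It assumes GNU coreutils (`stat -c`) and a filesystem that supports sparse files; the `workdir` and `dense-file` names are only for illustration, and the exact block counts will vary by filesystem.

```shell
# Compare a file made by seeking with one made by writing zeroes.
workdir=$(mktemp -d)

# Seek past 5 MiB without writing anything: only the file size is set.
dd if=/dev/zero of="$workdir/sparse-file" bs=1k count=0 seek=5120 2>/dev/null

# Actually write 5 MiB of zero bytes.
dd if=/dev/zero of="$workdir/dense-file" bs=1k count=5120 2>/dev/null

# %s = apparent size in bytes, %b = number of allocated blocks.
sparse_size=$(stat -c %s "$workdir/sparse-file")
dense_size=$(stat -c %s "$workdir/dense-file")
sparse_blocks=$(stat -c %b "$workdir/sparse-file")
dense_blocks=$(stat -c %b "$workdir/dense-file")

echo "sparse-file: $sparse_size bytes apparent, $sparse_blocks blocks allocated"
echo "dense-file:  $dense_size bytes apparent, $dense_blocks blocks allocated"

rm -rf "$workdir"
```

Both files should report the same apparent size, while the sparse one should normally have far fewer blocks allocated.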

Second, to make a file with lots of zeroes sparse, you have to copy it with cp:

cp --sparse=always original sparsefile

or you can use tar's or rsync's --sparse option as well.
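
A small sketch of how you might check the result of such a copy, assuming GNU cp and stat. The test file (1 MiB of random data followed by 3 MiB of written zeroes) and the `workdir` name are made up for the example; the point is that the contents stay identical and only the allocation changes.

```shell
workdir=$(mktemp -d)

# Build a 4 MiB test file: 1 MiB of random data, then 3 MiB of real (written) zeroes.
dd if=/dev/urandom of="$workdir/original" bs=1M count=1 2>/dev/null
dd if=/dev/zero bs=1M count=3 2>/dev/null >> "$workdir/original"

# Copy it, letting cp turn runs of zero bytes into holes.
cp --sparse=always "$workdir/original" "$workdir/sparsefile"

# The contents must be identical; only the allocation should differ.
if cmp -s "$workdir/original" "$workdir/sparsefile"; then same=yes; else same=no; fi
orig_size=$(stat -c %s "$workdir/original")
copy_size=$(stat -c %s "$workdir/sparsefile")
orig_blocks=$(stat -c %b "$workdir/original")
copy_blocks=$(stat -c %b "$workdir/sparsefile")

echo "identical: $same"
echo "apparent size: $orig_size -> $copy_size bytes"
echo "allocated blocks: $orig_blocks -> $copy_blocks"

rm -rf "$workdir"
```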

Solution 2

Perhaps the easiest way to sparsify a file in place would be to use the fallocate utility as follows:

fallocate -v --dig-holes {file_name}

fallocate(1) is provided by the util-linux package on Debian.
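
Since this modifies the file in place, it may be worth confirming that the contents are unchanged afterwards. The following is only a sketch on a throwaway test file (the `workdir` and `test.img` names are hypothetical); it assumes sha1sum and GNU stat are available, and it tolerates filesystems where hole punching is not supported.

```shell
workdir=$(mktemp -d)
img="$workdir/test.img"

# 1 MiB of random data followed by 3 MiB of written zeroes.
dd if=/dev/urandom of="$img" bs=1M count=1 2>/dev/null
dd if=/dev/zero bs=1M count=3 2>/dev/null >> "$img"

before_sum=$(sha1sum < "$img")
before_blocks=$(stat -c %b "$img")

# Punch holes in place; this fails on filesystems without hole-punching support.
if fallocate --dig-holes "$img"; then
    dug=yes
else
    dug=no
    echo "fallocate --dig-holes did not succeed here" >&2
fi

after_sum=$(sha1sum < "$img")
after_blocks=$(stat -c %b "$img")

echo "dug=$dug, allocated blocks: $before_blocks -> $after_blocks"
[ "$before_sum" = "$after_sum" ] && echo "checksum unchanged"

rm -rf "$workdir"
```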

Solution 3

Editing my answer for completeness:

Balloon empty FS space with zeroes (WARNING: this changes your disk image):

losetup --partscan --find --show disk.img

Assume it gives /dev/loop1 as the disk and there is only one partition, otherwise we need to repeat this for every partition with mountable FS in it (ignore swap partition etc.).

mkdir -p /mnt/tmp
mount /dev/loop1p1 /mnt/tmp
dd if=/dev/zero of=/mnt/tmp/tempfile

Let it run until it fails with ENOSPC (no space left on device).

/bin/rm -f /mnt/tmp/tempfile
umount /mnt/tmp
losetup -d /dev/loop1

Copy into a sparse image:

'dd' has an option to convert a file with zeroes to a sparse file:

dd if=disk.img of=disk-sparse.img conv=sparse
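
If you want to try the conv=sparse step without touching a real image, here is a rough sketch on a small stand-in file. It assumes GNU dd (conv=sparse is not available in every dd implementation); the stand-in disk.img is just random data followed by zeroes, and `workdir` is a scratch directory created for the example.

```shell
workdir=$(mktemp -d)

# Stand-in for disk.img: 1 MiB of random data, then 3 MiB of written zeroes.
dd if=/dev/urandom of="$workdir/disk.img" bs=1M count=1 2>/dev/null
dd if=/dev/zero bs=1M count=3 2>/dev/null >> "$workdir/disk.img"

# conv=sparse makes dd seek over output blocks that are entirely zero
# instead of writing them.
dd if="$workdir/disk.img" of="$workdir/disk-sparse.img" bs=1M conv=sparse 2>/dev/null

# Verify that the copy has the same contents and apparent size.
if cmp -s "$workdir/disk.img" "$workdir/disk-sparse.img"; then same=yes; else same=no; fi
sparse_size=$(stat -c %s "$workdir/disk-sparse.img")
orig_blocks=$(stat -c %b "$workdir/disk.img")
sparse_blocks=$(stat -c %b "$workdir/disk-sparse.img")

echo "identical: $same, apparent size: $sparse_size bytes"
echo "allocated blocks: $orig_blocks -> $sparse_blocks"

rm -rf "$workdir"
```

The block size matters here: dd only skips output blocks that consist entirely of zeroes, so a smaller bs can find more holes at the cost of speed.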

Solution 4

Do you mean that your ddrescue-created image is, say, 50 GB, and in reality something much smaller would suffice?

If that's the case, couldn't you just first create a new image with dd:

dd if=/dev/zero of=some_image.img bs=1M count=20000

and then create a filesystem in it:

mkfsofyourchoice some_image.img

then just mount the image, and copy everything from the old image to new one? Would that work for you?

Solution 5

PartImage can create disk images that only store the used blocks of a filesystem, thus drastically reducing the required space by ignoring unused blocks. I don't think you can directly mount the resulting images, but going:

image -> partimage -> image -> cp --sparse=always

should produce what you want (it might even be possible to skip the last step; I haven't tried).

user2468807

Updated on September 17, 2022

Comments

user2468807 almost 2 years

I faced an interview and was asked the following question :

Given n stairs, in how many ways can you climb them if you take either 1 or 2 steps at a time?

I think recursion might be useful?.. Is there any other method?
- Maroun about 11 years
  
  Indeed. Recursion is a good approach for this problem. As you know, every recursive method can be written as a non-recursive one. (For this specific problem, this can be achieved by some temp variables and loops - Think about it).
- sigpwned about 11 years
  
  I don't think you've provided enough information to get a good answer. Also, this is not really a "programming" question. You might find better answers on a different Stack Exchange site, like math.stackexchange.com. For future reference, you're also much more likely to get a positive reaction to your question if you use proper spelling and grammar. If you want people to take the time to answer your question thoughtfully, you should take the time to ask your question thoughtfully.
- Dave about 11 years
  
  The more interesting problem is that this looks solvable with a single equation. Consider factorials and triangular numbers.
- Lion about 11 years
  
  How is this related to C? The problem is language agnostic. It has nothing to do with any particular programming language.
hotei almost 14 years

Interesting - a downvote yet I notice there's no refutation of what I wrote. If it's accurate but unhelpful that's not a reason to downvote. If it's not accurate and not helpful then it does deserve it.
mihi almost 14 years

why reinvent the wheel? cp --sparse=always does the work fine
endolith almost 14 years

According to Wikipedia, writing zeros with dd will create a sparse file. Can you explain what "seeking" means?
hotei almost 14 years

@mihi: That's a good idea. I didn't know about the sparse option as it's not available in BSD flavors (freebsd.org/cgi/…) and I have never had the requirement to look at a Linux man page for cp (until today).
karthik almost 14 years

What about cat then? There is nothing in the man page about sparse files, so I assume cat /dev/zero > zero.file is perfectly OK to fill empty space with zeros?
mihi almost 14 years

@endolith: Updated my answer to make clear what the difference is to use dd for writing zeroes or for seeking.
mihi almost 14 years

@Ludwig Weinzierl: Yes, that cat command will fill your entire disk (or at least the amount not reserved for root or by quotas) with "real" zeroes, and create no sparse files.
hotei almost 14 years

@mihi: Any dd command with a count of 0 (zero) is guaranteed to do nothing by virtue of the count=0. Has nothing to do with sparse etc. I like where you're going with this but you need a better example.
mihi almost 14 years

@hotei: Even if you give count=0, it will still honour the seek option before writing zero bytes. And the example is from Wikipedia. A seek beyond the end of the file will create a sparse file, regardless of whether you write after the seek or not.
endolith about 13 years

tar or rsync with sparse is still making a copy of the file, right? so you need space for two copies of it.
mihi about 13 years

@endolith you will need extra space, yes. but since you can compress the tarball, you will only need space for the original file and a compressed version of the sparse file.
endolith about 11 years

Just used this to reduce a file from 21G to 86M. :D
Dave about 11 years

Elegant. But this whole question probably belongs on math exchange.
Dave about 11 years

This is amazingly inefficient considering this has already been shown to be a fibonacci sequence. It can be solved with a single equation: floor( pow( 0.5 + SQRT5 * 0.5, (double) n ) / SQRT5 + 0.5 ) (with SQRT5 defined appropriately)
Perkins over 7 years

Unfortunately the images created by partimage are not mountable without expanding them out again, making them suitable only for archival purposes.
Perkins over 7 years

One way to have your compressed images and mount them too is to simply store them on a filesystem that supports native compression. Makes data recovery awful if you have a drive crash, but that's what backups are for, right?
Ruslan about 7 years

For some reason, fallocate --dig-holes resulted in 103GiB file from 299GiB original, while cp --sparse=always gave me 93GiB — all with the same SHA1 sum (sizes checked via du -B1G vs du --apparent-size -B1G). So fallocate seems to give inferior results.
endolith almost 6 years

as of 2012 git.savannah.gnu.org/cgit/coreutils.git/commit/…
Lam Das almost 6 years

Yes, this option is not from the time when OP asked. This was more of "leave a bread crumb for other searchers"...:-)
mihi almost 6 years

Depending on the filesystem type, zerofree may be faster than mounting and writing zeroes to the filesystem, and it makes the disk image grow less if it already contained lots of zeroes.
Soruk about 3 years

Under newer kernels, you can now fstrim a filesystem in a file, which appears to make the file sparse too, and is likely to be much quicker than writing zeros then copying the file as sparse. (Again, a bread crumb for other searchers.)