Backup and Restore Using dd and gzip
Solution 1
Writing to a Windows CIFS share SMB1
The word from Microsoft is: "In Windows NTFS file systems, files are not made sparse by default. The application or user needs to explicitly mark the file sparse via the FSCTL_SET_SPARSE control code." Unfortunately, Linux doesn't mark these files as sparse over SMB1. Reportedly, if you first make the file sparse on the Windows side (for example with Cygwin: dd if=/dev/zero of=BigFile bs=1M count=1 seek=150000), you can then continue to write it as sparse from Linux. I believe reads will still be unoptimized.
Experiments
With RHEL6 coreutils-8.4, cp --sparse=always local_file /mnt/cifs/file_on_cifs doesn't write a sparse file. When reading a CIFS file, it reads the zeroed areas too (no fiemap optimization). So on RHEL6 both backup and restore transfer the entire file over the network; better to gzip it.
The situation is the same with coreutils-8.25 on Ubuntu 14.x.
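To repeat this kind of experiment, it helps to compare a file's apparent size with the space actually allocated for it. The following is a minimal sketch, assuming GNU coreutils (stat -c, truncate, cp --sparse=always). The helper name sparse_report and the scratch files are made up for illustration, and the copy here goes to a local temporary directory rather than a CIFS mount. To test a share, point the copy destination at the mount instead.

```shell
#!/bin/sh
# Hypothetical helper: print apparent size and allocated size, in bytes.
# GNU stat format: %s = apparent size, %b = allocated blocks, %B = bytes per block.
sparse_report() {
    apparent=$(stat -c '%s' "$1")
    blocks=$(stat -c '%b' "$1")
    blksize=$(stat -c '%B' "$1")
    echo "$1 apparent=$apparent allocated=$((blocks * blksize))"
}

workdir=$(mktemp -d)

# 16 MiB scratch file: a 15 MiB hole followed by 1 MiB of real data.
truncate -s 16M "$workdir/src.img"
dd if=/dev/urandom of="$workdir/src.img" bs=1M count=1 seek=15 conv=notrunc 2>/dev/null

# Copy while asking cp to turn runs of zeros into holes.
cp --sparse=always "$workdir/src.img" "$workdir/copy.img"

sparse_report "$workdir/src.img"
sparse_report "$workdir/copy.img"
```

If the destination kept the file sparse, its allocated value stays well below its apparent value. If the two are roughly equal, the zeros were written out in full.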
Writing to a Windows CIFS share SMB2/SMB3
There is a 2014 patch, "Add sparse file support to SMB2/SMB3 mounts", so hopefully sparse files are supported on mounted shares from Windows 8.1 and other platforms.
Writing to a Linux CIFS share
When you mount a Samba share from a Linux server on a Linux client, you can write sparse files even over SMB1. There is no read optimization.
Solution 2
You can use ddrescue with its -S option:
-S
--sparse
Use sparse writes for outfile. (The blocks of zeros are not actually allocated on disc). May save a lot of disc space in some cases. Not all systems support this. Only regular files can be sparse.
You can issue something similar to:
ddrescue -S /dev/sda1 /path/to/outfile
Betty Von Schmartenhausen
Updated on September 18, 2022
Comments
-
Betty Von Schmartenhausen over 1 year
I've seen various posts discussing the use of dd for creating an image of a drive and only storing 'used data'. Before posing the problem/question, let's assume a few things.
Assumptions
- The drive to clone/image is /dev/sda
- /dev/sda is 10 TB
- Used space on /dev/sda is 1 TB
- The image is stored on a remote CIFS-mounted location
Question/Problem
Using something like cp with the --sparse=always option in conjunction with dd should produce a sparse file, so that the image takes up only about 1 TB:
cp --sparse=always <(dd if=/dev/sda bs=8M) /mnt/remote/location/disk.img
Alternatively, something like the command below should compress all zeroed space:
dd if=/dev/sda1 | gzip -c > /mnt/remote/location/disk.img.gz
So, what is the impact of a sparse image file upon restore? Will the transferred data be 1 TB, or 10 TB including the perceived empty/zeroed space? This is obviously a consideration for assessing potential network load and time-to-restore.
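The gzip variant can be tried on a small scale before committing to a 10 TB disk. The sketch below is an illustration only: it uses a 32 MiB scratch file as a stand-in for the block device, and the names scratch, disk.raw and restored.raw are made up. It shows that the compressed image is what crosses the pipe (or network) in both directions, and that the restore reproduces the original byte for byte.

```shell
#!/bin/sh
scratch=$(mktemp -d)

# Stand-in for the disk: 1 MiB of random "used" data followed by 31 MiB of zeros.
dd if=/dev/urandom of="$scratch/disk.raw" bs=1M count=1 2>/dev/null
dd if=/dev/zero bs=1M count=31 2>/dev/null >> "$scratch/disk.raw"

# Backup: stream the raw data through gzip; long runs of zeros compress to almost nothing.
dd if="$scratch/disk.raw" bs=8M 2>/dev/null | gzip -c > "$scratch/disk.img.gz"

# Restore: only the compressed file is read; zeros are regenerated locally.
gunzip -c "$scratch/disk.img.gz" | dd of="$scratch/restored.raw" bs=8M 2>/dev/null

raw_bytes=$(stat -c '%s' "$scratch/disk.raw")
gz_bytes=$(stat -c '%s' "$scratch/disk.img.gz")
echo "raw=$raw_bytes gz=$gz_bytes"

# Verify the round trip.
cmp "$scratch/disk.raw" "$scratch/restored.raw" && echo "restore matches"
```

Zeroed space compresses well only if the free blocks really contain zeros. Free space that still holds old file data will compress no better than any other data.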
P.S. I understand there are other options such as Clonezilla, and that something like ddrescue allows resuming, but the question is specifically about using dd in the context above.
Thanks.
-
Betty Von Schmartenhausen about 7 years
Ok, great. As this is a Linux CIFS share, I understand from your answer that sparse files will be fine. Is the full size always going to be read on restore (i.e. 10 TB rather than 1 TB)? Thanks.
-
Naveed Abbas about 7 years
With a gzip image there is no sparse file at all: you read all the bytes, you write all the bytes. With a sparse file, you write 1 TB but you read 10 TB over the network.
-
Betty Von Schmartenhausen about 7 years
Sure, but the objective is to minimise restore time from a network location. On restore, wouldn't this still transfer the perceived file size rather than actual data minus zeros?
-
shodanshok about 7 years
It depends on how the CIFS implementation treats the sparse file. You will have to run some tests, I think.