Backup and Restore Using dd and gzip



Solution 1

Writing to a Windows CIFS share SMB1

The word from Microsoft is: "In Windows NTFS file systems, files are not made sparse by default. The application or user needs to explicitly mark the file sparse via the FSCTL_SET_SPARSE control code." Unfortunately, Linux doesn't mark files as sparse over SMB1. Reportedly, if you first create the file as sparse on the Windows side (for example with Cygwin: dd if=/dev/zero of=BigFile bs=1M count=1 seek=150000), you can then continue writing to it as a sparse file from Linux. I believe reads will still not be optimized.
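To illustrate what the dd seek trick does, here is a scaled-down sketch (16 MiB instead of the size used above) that runs in a temporary directory on a local Linux filesystem. The file location and the smaller size are assumptions for the demo, and it relies on GNU dd and stat. Whether the allocated size really ends up smaller than the apparent size depends on the filesystem.

```shell
#!/bin/sh
# Sketch only: create a file with a hole by seeking past unwritten space.
set -e
workdir=$(mktemp -d)
bigfile="$workdir/BigFile"   # demo path, not a real share

# seek=15 skips the first 15 MiB without writing anything;
# only the final 1 MiB block is actually written.
dd if=/dev/zero of="$bigfile" bs=1M count=1 seek=15 2>/dev/null

apparent=$(stat -c %s "$bigfile")   # logical size in bytes
# %b = number of allocated blocks, %B = size in bytes of each such block
allocated=$(( $(stat -c %b "$bigfile") * $(stat -c %B "$bigfile") ))

echo "apparent=$apparent bytes, allocated=$allocated bytes"
rm -rf "$workdir"
```

On a filesystem that supports holes, the allocated figure should be close to 1 MiB while the apparent size is 16 MiB.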

Experiments

With RHEL6 (coreutils-8.4), cp --sparse=always local_file /mnt/cifs/file_on_cifs doesn't write a sparse file. When reading a file from CIFS, it reads the zeroed areas in full (no fiemap optimization). So on RHEL6 both backup and restore transfer the entire file over the network; it is better to gzip it.

Same situation with coreutils-8.25 on Ubuntu 14x.
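If you want to repeat this kind of experiment yourself, a minimal sketch follows. It copies a file containing a long run of zeros with cp --sparse=always and prints the apparent and allocated sizes of source and copy. Here both files live in a temporary local directory (an assumption for the demo); for a real test you would point dst at a path on the mounted share, and the result there may well differ.

```shell
#!/bin/sh
# Sketch only: compare apparent vs. allocated size before and after
# cp --sparse=always. Requires GNU coreutils (cp, stat, dd).
set -e
workdir=$(mktemp -d)
src="$workdir/local_file"
dst="$workdir/copy"   # for a real test: a path under the CIFS mount

# 8 MiB of zeros written as real data, followed by 1 MiB of random data.
dd if=/dev/zero of="$src" bs=1M count=8 2>/dev/null
dd if=/dev/urandom of="$src" bs=1M count=1 seek=8 conv=notrunc 2>/dev/null

cp --sparse=always "$src" "$dst"

# Print logical size and allocated bytes for one file.
sizes() {
    echo "$1: apparent=$(stat -c %s "$1") allocated=$(( $(stat -c %b "$1") * $(stat -c %B "$1") ))"
}
sizes "$src"
sizes "$dst"

# The copy must be byte-identical regardless of how it is stored.
same=no
if cmp -s "$src" "$dst"; then same=yes; fi
dst_apparent=$(stat -c %s "$dst")
echo "identical=$same"
rm -rf "$workdir"
```

If the allocated size of the copy is not clearly smaller than that of the source, the target did not store the zeros as a hole.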

Writing to a Windows CIFS share SMB2/SMB3

There is a 2014 patch, "Add sparse file support to SMB2/SMB3 mounts", so hopefully sparse files will be supported on mounted shares from Windows 8.1 and other platforms.

Writing to a Linux CIFS share

When you mount a Samba share from a Linux server on a Linux client, you can write sparse files even over SMB1. There is no read optimization.

Solution 2

You can use ddrescue with its -S option:

-S --sparse Use sparse writes for outfile. (The blocks of zeros are not actually allocated on disc). May save a lot of disc space in some cases. Not all systems support this. Only regular files can be sparse.

You can issue something similar to ddrescue -S /dev/sda1 /path/to/outfile


Betty Von Schmartenhausen

Updated on September 18, 2022

Comments

Betty Von Schmartenhausen over 1 year
I've seen various posts discussing the use of dd for creating an image of a drive and only storing 'used data'. Before posing the problem/question, let's assume a few things.

Assumptions
1. The drive to clone/image is /dev/sda
2. /dev/sda is 10TB
3. Used space on /dev/sda is 1TB
4. Storage of the image is to some remote CIFS mounted location
Question/Problem

Using something like cp with the --sparse=always option in conjunction with dd should produce a sparse file, so that the file only occupies about 1TB:
```
cp --sparse=always <(dd if=/dev/sda bs=8M) /mnt/remote/location/disk.img
```
Alternatively, something like the command below should compress all the zeroed space:
```
dd if=/dev/sda bs=8M | gzip -c > /mnt/remote/location/disk.img.gz
```
So, what is the impact of a sparse image file upon restore? Will the transferred data be 1TB, or 10TB including the perceived empty/zeroed space? This is obviously a consideration for assessing potential network load and time-to-restore.

P.S. I understand there are other options such as Clonezilla, and that something like ddrescue allows resuming, but the question is specifically about using dd in the context above.

Thanks.
Betty Von Schmartenhausen about 7 years

OK, great. This is a Linux CIFS share, so I understand from your answer that sparse files will be fine. Is the full size always going to be read on restore (i.e. 10TB rather than 1TB)? Thanks.
Naveed Abbas about 7 years

With a gzip image, there is no sparse file at all: you read all the bytes and you write all the bytes, but only the compressed stream crosses the network. With a sparse file, you write 1TB but you read 10TB over the network.
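A small sketch of the gzip round trip, using ordinary files in a temporary directory as stand-ins for /dev/sda and the remote image (both stand-ins are assumptions for the demo). It shows that a mostly-zero "disk" compresses to roughly the size of its used data, and that the restore reproduces the original byte for byte.

```shell
#!/bin/sh
# Sketch only: back up and restore a mostly-empty "disk" through gzip.
set -e
workdir=$(mktemp -d)
disk="$workdir/disk.raw"           # stand-in for /dev/sda
image="$workdir/disk.img.gz"       # stand-in for the image on the share
restored="$workdir/restored.raw"   # stand-in for the restore target

# 1 MiB of random "used" data followed by 15 MiB of zeros.
dd if=/dev/urandom of="$disk" bs=1M count=1 2>/dev/null
dd if=/dev/zero of="$disk" bs=1M count=15 seek=1 conv=notrunc 2>/dev/null

# Backup: every byte is read locally, only the compressed stream is stored.
dd if="$disk" bs=1M 2>/dev/null | gzip -c > "$image"

# Restore: the compressed stream is read and every byte is written back.
gunzip -c "$image" | dd of="$restored" bs=1M 2>/dev/null

raw_size=$(stat -c %s "$disk")
image_size=$(stat -c %s "$image")
echo "raw=$raw_size bytes, gzip image=$image_size bytes"

roundtrip=no
if cmp -s "$disk" "$restored"; then roundtrip=yes; fi
echo "restore identical: $roundtrip"
rm -rf "$workdir"
```

Note that the zeros are still written out in full on the target during restore; the saving is in what is stored and transferred.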
Betty Von Schmartenhausen about 7 years

Sure, but the objective is to minimise restore time from a network location. On restore, wouldn't this still transfer the perceived file size rather than actual data minus zeros?
shodanshok about 7 years

It depends on how the CIFS implementation treats sparse files. You will have to run some tests, I think.