Backup and Restore Using dd and gzip


Solution 1

Writing to a Windows CIFS share SMB1

The word from Microsoft is: "In Windows NTFS file systems, files are not made sparse by default. The application or user needs to explicitly mark the file sparse via the FSCTL_SET_SPARSE control code." Unfortunately, the Linux CIFS client doesn't mark files as sparse over SMB1. Reportedly, if you first create the file as sparse on the Windows side (for example with Cygwin: dd if=/dev/zero of=BigFile bs=1M count=1 seek=150000), you can then continue to write it as sparse from Linux. I believe reading will still be unoptimized, i.e. the zeroed regions are transferred in full.

Experiments

With RHEL6 (coreutils-8.4), cp --sparse=always local_file /mnt/cifs/file_on_cifs doesn't write a sparse file. When reading a file from CIFS, cp also reads the zeroed areas (there is no fiemap optimization). So on RHEL6 both backup and restore transfer the entire file over the network; you are better off gzipping it.
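To repeat this kind of experiment yourself, something along these lines may help. It is only a sketch, assuming GNU coreutils (truncate, stat -c, cp --sparse): TARGET_DIR is a hypothetical variable of mine that you would point at the CIFS mount, and when it is unset the copy simply lands in a local temporary directory. The file sizes are arbitrary.

```shell
#!/bin/sh
# Sketch: check whether cp --sparse=always really produced a sparse copy.
# TARGET_DIR is a hypothetical variable; set it to the CIFS mount point
# (e.g. /mnt/cifs) to test the share instead of a local temp directory.
set -eu

work=$(mktemp -d)
target=${TARGET_DIR:-$work}

# Build a 64 MiB test file: 1 MiB of random data followed by a 63 MiB hole.
dd if=/dev/urandom of="$work/src.img" bs=1M count=1 2>/dev/null
truncate -s 64M "$work/src.img"

cp --sparse=always "$work/src.img" "$target/copy.img"

# %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block.
apparent=$(stat -c %s "$target/copy.img")
blocks=$(stat -c %b "$target/copy.img")
block_size=$(stat -c %B "$target/copy.img")
allocated=$((blocks * block_size))
echo "apparent=$apparent bytes, allocated=$allocated bytes"

# The copy must be byte-identical whether or not it ended up sparse.
if cmp -s "$work/src.img" "$target/copy.img"; then identical=yes; else identical=no; fi

rm -f "$target/copy.img"
rm -rf "$work"
```

If the allocated figure comes out close to the apparent size, the target did not store the file sparsely; if it is close to 1 MiB, it did. What a CIFS mount reports for allocated blocks depends on the server, so treat the number as a hint rather than proof.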

Same situation with coreutils-8.25 on Ubuntu 14x.
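As a rough illustration of why gzip is the safer bet here, the sketch below compresses a mostly-zero test image and restores it again. It assumes GNU coreutils and gzip; the file names and sizes are made up for the demo, and in real use the input would be the block device and the .gz file would live on the share.

```shell
#!/bin/sh
# Sketch: a mostly-empty image compresses to roughly the size of its data,
# so only the compressed bytes would have to cross the network.
set -eu

work=$(mktemp -d)

# Stand-in for the disk: 1 MiB of random data followed by 63 MiB of zeros.
dd if=/dev/urandom of="$work/disk.img" bs=1M count=1 2>/dev/null
truncate -s 64M "$work/disk.img"

# Backup: stream the image through gzip.
dd if="$work/disk.img" bs=1M 2>/dev/null | gzip -c > "$work/disk.img.gz"

raw_size=$(stat -c %s "$work/disk.img")
gz_size=$(stat -c %s "$work/disk.img.gz")
echo "raw=$raw_size bytes, gzipped=$gz_size bytes"

# Restore: decompress and write the image back out.
gzip -dc "$work/disk.img.gz" | dd of="$work/restored.img" bs=1M 2>/dev/null

if cmp -s "$work/disk.img" "$work/restored.img"; then restored_ok=yes; else restored_ok=no; fi

rm -rf "$work"
```

Note that this only pays off when the unused space actually reads back as zeros; free blocks that still hold old data compress no better than live data.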

Writing to a Windows CIFS share SMB2/SMB3

There is a 2014 patch, "Add sparse file support to SMB2/SMB3 mounts", so there is hope that sparse files will be supported on mounted shares from Windows 8.1 and other platforms.

Writing to a Linux CIFS share

When a Linux client mounts a Samba share from a Linux server, you can write sparse files even over SMB1. There is still no read optimization.

Solution 2

You can use ddrescue with its -S option:

-S --sparse Use sparse writes for outfile. (The blocks of zeros are not actually allocated on disc). May save a lot of disc space in some cases. Not all systems support this. Only regular files can be sparse.

You can issue something similar to ddrescue -S /dev/sda1 /path/to/outfile
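If you want to try the -S behaviour without touching a real disk, a small dry run on a scratch file might look like this. It is a sketch under a few assumptions: GNU ddrescue is installed (the script skips itself otherwise), -q is only there to silence the progress display, and the file names are placeholders.

```shell
#!/bin/sh
# Sketch: copy a file containing a large run of zeros with ddrescue -S
# and look at how much space the output actually occupies.
set -eu

work=$(mktemp -d)

# Test input: 1 MiB of data, a 62 MiB hole, then another 1 MiB of data.
dd if=/dev/urandom of="$work/src.img" bs=1M count=1 2>/dev/null
dd if=/dev/urandom of="$work/src.img" bs=1M count=1 seek=63 conv=notrunc 2>/dev/null

if command -v ddrescue >/dev/null 2>&1; then
    # -S / --sparse: use sparse writes for the output file.
    ddrescue -q -S "$work/src.img" "$work/out.img"
    echo "apparent=$(stat -c %s "$work/out.img") bytes, blocks=$(stat -c %b "$work/out.img")"
    if cmp -s "$work/src.img" "$work/out.img"; then status=ok; else status=mismatch; fi
else
    status=skipped
    echo "ddrescue not installed; skipping"
fi

rm -rf "$work"
```

Whether the output stays sparse on a CIFS target is subject to the same server and protocol limits described under Solution 1.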


Author by

Betty Von Schmartenhausen

Updated on September 18, 2022

Comments

  • Betty Von Schmartenhausen
    Betty Von Schmartenhausen over 1 year

    I've seen various posts discussing the use of dd for creating an image of a drive and only storing 'used data'. Before posing the problem/question, let's assume a few things.

    Assumptions

    1. The drive to clone/image is /dev/sda
    2. /dev/sda is 10TB
    3. Used space on /dev/sda is 1TB
    4. Storage of the image is to some remote CIFS mounted location

    Question/Problem

    Using something like cp with the --sparse=always option in conjunction with dd should produce a sparse file, so that the image occupies only about 1TB on disk:

    cp --sparse=always <(dd if=/dev/sda bs=8M) /mnt/remote/location/disk.img
    

    Alternatively, something like the following should compress all the zeroed space:

    dd if=/dev/sda1 | gzip -c > /mnt/remote/location/disk.img.gz
    

    So, what is the impact of a sparse image file upon restore? Will the transferred data be 1TB, or the full 10TB including the perceived empty/zeroed space? This is obviously a consideration for assessing potential network load and time-to-restore.

    P.S. I understand there are other options such as Clonezilla, and that something like ddrescue would add resume capability, but the question is specifically about using dd in the context above.

    Thanks.

  • Betty Von Schmartenhausen
    Betty Von Schmartenhausen about 7 years
    OK, great. This is a Linux CIFS share, so I understand from your answer that sparse files will be fine. Is the full size always going to be read on restore (i.e. 10TB rather than 1TB)? Thanks.
  • Naveed Abbas
    Naveed Abbas about 7 years
    With a gzip image there is no sparse file at all: you read all the bytes, you write all the bytes. With a sparse file, you write 1TB but you read 10TB over the network.
  • Betty Von Schmartenhausen
    Betty Von Schmartenhausen about 7 years
    Sure, but the objective is to minimise restore time from a network location. On restore, wouldn't this still transfer the perceived file size rather than actual data minus zeros?
  • shodanshok
    shodanshok about 7 years
    It depends on how the CIFS implementation treats the sparse file. You will have to run some tests, I think.