rsync an already compressed file


Solution 1

Compressing an already-compressed file in transit is usually not worth the CPU time. There are caveats, though: when rsync compares two versions of a file, enabling compression can still speed up the exchange of the block hashes used for the comparison.

If you only want to sync compressed versions of large files between systems, one place to look is certain builds of gzip. On an Ubuntu system, I get:

$ gzip -h
Usage: gzip [OPTION]... [FILE]...
Compress or uncompress FILEs (by default, compress FILES in-place).

Mandatory arguments to long options are mandatory for short options too.

  -c, --stdout      write on standard output, keep original files unchanged
  -d, --decompress  decompress
  -f, --force       force overwrite of output file and compress links
  -h, --help        give this help
  -l, --list        list compressed file contents
  -L, --license     display software license
  -n, --no-name     do not save or restore the original name and time stamp
  -N, --name        save or restore the original name and time stamp
  -q, --quiet       suppress all warnings
  -r, --recursive   operate recursively on directories
  -S, --suffix=SUF  use suffix SUF on compressed files
  -t, --test        test compressed file integrity
  -v, --verbose     verbose mode
  -V, --version     display version number
  -1, --fast        compress faster
  -9, --best        compress better
    --rsyncable   Make rsync-friendly archive

With no FILE, or when FILE is -, read standard input.

Report bugs to .

Notice the --rsyncable option? It periodically resets the compressor's state, so that when there's only a small change to the source file, only small, localized pieces of the compressed file change. The rest of the binary data is unchanged, so rsync won't need to retransmit the whole thing. The man page indicates that this option shouldn't increase the size of the compressed file by more than around 1% compared to leaving it off, and that gunzip can't tell the difference.
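As a quick local check (filenames and sizes below are made up for illustration, and this assumes your gzip build supports the flag), you can compare a normal gzip against an --rsyncable one and confirm that gunzip treats them identically:

```shell
# Build some compressible test data (base64 text compresses well).
head -c 2M /dev/urandom | base64 > sample.txt

# Compress it both ways.
gzip -9 -c sample.txt > normal.gz
gzip -9 -c --rsyncable sample.txt > rsyncable.gz

# gunzip can't tell the difference: the rsyncable archive
# decompresses back to the original bytes.
gunzip -c rsyncable.gz | cmp - sample.txt && echo "identical"

# The size penalty should be around 1% or less.
ls -l normal.gz rsyncable.gz
```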

I have a 468 MB SQL file that I compressed to 57 MB with the --rsyncable option. I transfer this file to my local system, then add a one-line comment to the original SQL file on the remote system and recompress it with --rsyncable.

$ rsync -avvz --progress -h fooboo:foo.sql.gz .
opening connection using ssh fooboo rsync --server --sender -vvlogDtprz . foo.sql.gz 
receiving file list ... 
1 file to consider
delta-transmission enabled
foo.sql.gz
      59.64M 100%   43.22MB/s    0:00:01 (xfer#1, to-check=0/1)
total: matches=7723  hash_hits=9468  false_alarms=0 data=22366

sent 54.12K bytes  received 22.58K bytes  17.05K bytes/sec
total size is 59.64M  speedup is 777.59

Not bad. Rsync only had to transfer a small amount of the newer compressed file.

Solution 2

rsync will not make an already compressed file significantly smaller during transit.

It is unlikely that your failed transfers will be fixed by adding the -z flag. I would suggest trying to rsync the file(s) uncompressed; rsync will then compress on the fly. You also gain the advantage that, should the source file change and you need to rsync again, only the changed bytes will be transferred. If you change a compressed file, rsync will most likely have to retransmit it in its entirety. See here for more details:

http://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/

Solution 3

Using rsync -z has no advantage over plain rsync when dealing with a file that has already been compressed with a good compression format. However, you might consider splitting your compressed file into smaller pieces, so that a failed transfer only forces you to resend the affected piece rather than the whole file.

Here is a guide for Linux: http://www.techiecorner.com/107/how-to-split-large-file-into-several-smaller-files-linux/ And for Windows: http://www.online-tech-tips.com/computer-tips/how-to-split-a-large-file-into-multiple-smaller-pieces/
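On Linux, the split-and-reassemble round trip looks roughly like this (the archive below is a small stand-in for the real one; transfer the pieces with rsync, retrying any that fail):

```shell
# Stand-in for the real archive.
head -c 3M /dev/urandom > foo.sql.gz

# Split into fixed-size pieces with numeric suffixes
# (use something like -b 1G for a 100 GB file).
split -b 1M -d foo.sql.gz foo.sql.gz.part.

# ...rsync the foo.sql.gz.part.* files; a broken pipe now only costs
# you one piece. On the receiving side, reassemble and verify:
cat foo.sql.gz.part.* > reassembled.gz
cmp foo.sql.gz reassembled.gz && echo "pieces reassemble cleanly"
```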

Author: ben

Updated on September 18, 2022

Comments

  • ben
    ben almost 2 years

will rsync -z have any compression advantage if the input file is already gzipped? I have a large 100 GB compressed file to send over the network between servers, and the transfer consistently fails (broken pipe) after varying amounts of time. Wondering if I should try the -z flag.

    • FauxFaux
      FauxFaux over 10 years
      I suspect you were looking for the --partial option, which allows resumption of the transfer, regardless of what went wrong.
  • Raza
    Raza almost 11 years
    It would be nice to compare what the transfer would use without the --rsyncable option.