Best compression for ZFS send/recv


Solution 1

It sounds like you've tried all of the best compression mechanisms and are still being limited by the line speed. Assuming a faster line is out of the question, have you considered running the backups less frequently so that they have more time to complete?

Short of that, is there some way to lower the amount of data being written? Without knowing your application stack it's hard to say how, but things like making sure apps overwrite existing files instead of creating new ones might help. So would making sure you aren't saving backups of temp/cache files that you won't need.

Solution 2

Things have changed in the years since this question was posted:

1: ZFS now supports compressed replication: add the -c flag to the zfs send command, and blocks that were compressed on disk will remain compressed as they pass through the pipe to the other end. There may still be more compression to be gained, because the default compression in ZFS is lz4.
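A minimal sketch of what that looks like (the hostname, pool, and snapshot names here are illustrative placeholders, not from the question):

```shell
# -c sends blocks as they are compressed on disk; -i makes it incremental.
# "offsite-backup" and the dataset/snapshot names are placeholders.
zfs send -c -i tank/pool@oldsnap tank/pool@newsnap | \
    ssh offsite-backup "zfs recv -F tank/pool"
```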

2: The best compressor to use in this case is zstd (Zstandard). It now has an 'adaptive' mode that changes the compression level (across the 19+ standard levels, plus the newer higher-speed zstd-fast levels) based on the speed of the link between zfs send and zfs recv: it compresses as much as it can while keeping the queue of data waiting to go out the pipe to a minimum. If your link is fast it won't waste time compressing the data more, and if your link is slow it will keep working to compress the data further and save you time in the end. It also supports threaded compression, so it can take advantage of multiple cores, which gzip and bzip2 do not outside of special versions like pigz.
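A sketch of that pipeline (host and dataset names are placeholders; the --adapt flag requires zstd 1.3.8 or newer):

```shell
# --adapt varies the compression level with pipe backpressure;
# -T0 uses all available cores. Names are illustrative.
zfs send -c -i tank/pool@oldsnap tank/pool@newsnap | \
    zstd --adapt -T0 | \
    ssh offsite-backup "zstd -d | zfs recv -F tank/pool"
```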

Solution 3

Here is what I've learned doing the exact same thing you are doing. I suggest using mbuffer. When testing in my environment it only helped on the receiving end; without it, the send would slow down while the receive caught up.

Some examples: http://everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/

Homepage with options and syntax http://www.maier-komor.de/mbuffer.html

The send command from my replication script:

zfs send -i tank/pool@oldsnap tank/pool@newsnap | ssh -c arcfour remotehostip "mbuffer -s 128k -m 1G | zfs receive -F tank/pool"

This runs mbuffer on the remote host as a receive buffer so the sending side runs as fast as possible. I run a 20 Mbit line and found that having mbuffer on the sending side as well didn't help; also, my main ZFS box uses all of its RAM as cache, so giving even 1 GB to mbuffer would require me to reduce some cache sizes.

Also, and this isn't really my area of expertise, I think it's best to just let ssh do the compression. In your example you are using bzip2 and then using ssh, which by default uses compression, so SSH is trying to compress an already-compressed stream. I ended up using arcfour as the cipher because it's the least CPU-intensive, and that was important for me. You may have better results with another cipher, but I'd definitely suggest letting SSH do the compression (or turning off ssh compression if you really want to use something it doesn't support).
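For the "turn off ssh compression" route, a sketch might look like the following. Note that arcfour has since been removed from modern OpenSSH; a hardware-accelerated AES cipher is the usual low-CPU choice today. Host and dataset names are placeholders.

```shell
# Disable SSH-level compression when the stream is already compressed,
# so SSH doesn't waste CPU recompressing it. The cipher shown is an
# assumption: a common low-overhead choice on CPUs with AES acceleration.
zfs send -i tank/pool@oldsnap tank/pool@newsnap | bzip2 -c | \
    ssh -o Compression=no -c aes128-gcm@openssh.com offsite-backup \
        "bzcat | zfs recv -F tank/pool"
```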

What's really interesting is that using mbuffer when sending and receiving on localhost speeds things up as well:

zfs send tank/pool@snapshot | mbuffer -s 128k -m 4G -o - | zfs receive -F tank2/pool

I found that 4 GB for localhost transfers seems to be the sweet spot for me. It just goes to show that zfs send/receive doesn't like latency or any other pauses in the stream.

Just my experience; hope this helps. It took me a while to figure all this out.

Solution 4

I use pbzip2 all the time (parallel bzip2) when sending over WAN. Since it is threaded you may specify the number of threads to use with the -p option. Install pbzip2 first on both sending and receiving hosts, installation instructions are at http://compression.ca/pbzip2/.

zfs send -i tank/vm@2009-10-10 tank/vm@2009-10-12 | pbzip2 -c | \
ssh offsite-backup "pbzip2 -dc | zfs recv -F tank/vm"

The main key is to create snapshots at frequent intervals (~10 minutes) to keep each snapshot small, then send each snapshot. ssh will not resume a broken snapshot stream, so if you have a huge snapshot to send, pipe the stream to pbzip2, then split it into manageable-sized chunks, rsync the split files to the receiving host, and finally pipe the concatenated pbzip2 files into zfs recv.

zfs send -i tank/vm@2009-10-10 tank/vm@2009-10-12 | pbzip2 -c | \
split -b 500M - /somedir/snap-inc-10-to-12.pbzip2--

this will produce files named in 500MB chunks:

/somedir/snap-inc-10-to-12.pbzip2--aa
/somedir/snap-inc-10-to-12.pbzip2--ab
/somedir/snap-inc-10-to-12.pbzip2--ac
...

rsync to receiving host multiple times (you may rsync even before zfs send completes or as soon as you see a complete 500MB chunk), press ctrl+c anytime to cancel:

while true; do rsync -avP /somedir/snap-inc-10-to-12.pbzip2--* offsite-backup:/somedir; sleep 1; done

zfs receive:

cat /somedir/snap-inc-10-to-12.pbzip2--* | pbzip2 -dc | zfs recv -Fv tank/vm

User freind mentioned: "For what it's worth, I would not do a direct send | compress | decompress | receive; this can lead to problems at the receiving end if the transfer line snaps, and your pools will be offline for a long time during the receive." - I have encountered issues before with older ZFS versions (<v28) on the receiving host when an ongoing send/recv was interrupted by network drops, but not to the extent that the pools were offlined. That's interesting. Re-send the snapshot only after the zfs recv has exited on the receiving end, killing zfs recv manually if needed. zfs send/recv is much improved now on FreeBSD and Linux.

Solution 5

This is an answer to your specific question:

You can try rzip, but it works in ways that are a bit different from compress/bzip/gzip:

rzip expects to be able to read over the whole file, so it can't be run in a pipeline. This will greatly increase your local storage requirements and you won't be able to run a backup and send the backup over the wire in one single pipe. That said, the resulting files, at least according to this test, are quite a bit smaller.

If your resource constraint is your pipe, you'll be running backups 24x7 anyhow so you'll need to just be copying snapshots constantly and hoping you keep up anyhow.

Your new command would be:

remotedir=/big/filesystem/on/remote/machine/
while 
  snaploc=/some/big/filesystem/
  now=$(date +%s)
  snap=snapshot.$now.zfssnap
  test -f $snaploc/$snap
do
  sleep 1
done

zfs send -i tank/vm@2009-10-10 tank/vm@2009-10-12 > $snaploc/$snap &&
rzip $snaploc/$snap &&
ssh offsite-backup "
        cat > $remotedir/$snap.rz &&
        rzip -d $remotedir/$snap.rz &&
        zfs recv -F tank/vm < $remotedir/$snap &&
        rm $remotedir/$snap " < $snaploc/$snap.rz &&
rm $snaploc/$snap.rz

Note that rzip compresses in place, replacing $snaploc/$snap with $snaploc/$snap.rz, so the later steps must reference the .rz file. You will want to put better error handling in, and you'll want to consider using something like rsync to transfer the compressed files so that if the transfer fails in the middle you can pick up where you left off.

Author: Darael

Updated on September 17, 2022

Comments

  • Darael
    Darael over 1 year

    I'm sending incremental ZFS snapshots over a point-to-point T1 line and we're to a point where a day's worth of snapshots can barely make it over the wire before the next backup starts. Our send/recv command is:

    zfs send -i tank/vm@2009-10-10 tank/vm@2009-10-12 | bzip2 -c | \
    ssh offsite-backup "bzcat | zfs recv -F tank/vm"
    

    I have plenty of CPU cycles to spare. Is there a better compression algorithm or alternative method I can use to push less data over the line?

    • kbyrd
      kbyrd over 14 years
      Have you verified it's actually the link that's the slowest part? Maybe it's the disk reading/writing.
    • Darael
      Darael over 14 years
      Yeah, I get 80-100 MBps connecting to the box via NFS. The network connection is 1.5 Mbps
    • Amok
      Amok over 14 years
      Have you tried using lzma --best?
    • Philip
      Philip almost 14 years
As Amok pointed out, LZMA is currently the best general data compression algorithm widely available.
    • poige
      poige over 7 years
For example, statistics showing that zfs receive can be the culprit: received 953MB stream in 36 seconds (26.5MB/sec)
  • Darael
    Darael over 14 years
From the unix man page: The --fast and --best aliases are primarily for GNU gzip compatibility. In particular, --fast doesn't make things significantly faster. And --best merely selects the default behaviour.
  • Istvan
    Istvan over 14 years
    so it has no effect in your case. What about the cipher?
  • Florian Heigl
    Florian Heigl over 9 years
Thanks a lot for this post. Looking at zfs send more closely, I very quickly got the feeling that it has bad behaviour (aka "design") when sending to a latency-bound target, after about a dozen search results insisting that zfs can't possibly ever be to blame for anything. I am very grateful you took the time to look into it and post your results.
  • poige
    poige over 7 years
Since he's using -i, which implies an incremental backup, there's not much hope that -D would give anything.
  • James Moore
    James Moore over 7 years
    @poige depends on what their data looks like. If they generate lots of data that has duplicate blocks, it's a big win. I don't see how -i would make it more or less likely for there to be duplicate blocks. If you normally create data that has lots of duplication, you're probably going to be creating lots of duplication inside every day, so -i doesn't help or hurt.
  • poige
    poige over 7 years
    Well, if you have plenty of duplicates any compression would take care of it anyways.
  • James Moore
    James Moore about 7 years
    @poige They have to measure against their actual data. You can definitely have datasets that compress badly and dedup really well. For example, multiple copies of the same compressed video file dedups really well, and compression at the file system level is probably worse than useless.
  • poige
    poige about 7 years
    Ah, this case — yep