What's the best way to use parallel bzip2 and gzip by default?

43,374

Solution 1

You can symlink bzip2, bunzip2 and bzcat to lbzip2, and gzip, gunzip, gzcat and zcat to pigz:

sudo apt-get install lbzip2 pigz
cd /usr/local/bin
ln -s /usr/bin/lbzip2 bzip2
ln -s /usr/bin/lbzip2 bunzip2
ln -s /usr/bin/lbzip2 bzcat
ln -s /usr/bin/pigz gzip
ln -s /usr/bin/pigz gunzip
ln -s /usr/bin/pigz gzcat
ln -s /usr/bin/pigz zcat
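
To confirm the links actually take effect, it can help to check what each name resolves to on your PATH. The following is a minimal sketch, assuming a `readlink` that supports `-f` (GNU coreutils); `resolves_to` is a hypothetical helper name, not part of any package.

```shell
# resolves_to NAME: print the real file that NAME runs from the current PATH.
# Hypothetical helper; assumes a readlink that supports -f (GNU coreutils).
resolves_to() {
    path=$(command -v "$1") || return 1   # fails if NAME is not on PATH
    readlink -f "$path"                   # follow symlinks to the final target
}

# Example (run in a new shell, or after `hash -r`, so cached paths are dropped):
#   for name in bzip2 bunzip2 bzcat gzip gunzip gzcat zcat; do
#       printf '%s -> %s\n' "$name" "$(resolves_to "$name")"
#   done
```

If a name still resolves to the original binary, /usr/local/bin probably comes later in your PATH than the directory holding the original.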

I chose lbzip2 instead of pbzip2 because the /usr/share/doc/lbzip2/README.gz looks "nicer" than /usr/share/doc/pbzip2/README.gz. Also, the tar manual talks about lbzip2.

Edit:

pigz-2.1.6, which is included in Precise Pangolin, refuses to decompress files with unknown suffixes (e.g. initramfs-*.img). This is fixed in pigz-2.2.4, which ships with Quantal. So you might want to wait until Quantal, install the Quantal package manually, or hold off on linking gunzip/gzcat/zcat for now.
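
If you would rather link gunzip/gzcat/zcat only when the installed pigz is new enough, a version check along these lines could gate it. This is a sketch: `version_at_least` is a hypothetical helper, and it assumes `sort -V` from GNU coreutils and that `pigz --version` prints a line of the form `pigz 2.2.4`.

```shell
# version_at_least HAVE WANT: succeed if version HAVE >= WANT.
# Hypothetical helper; relies on `sort -V` from GNU coreutils.
version_at_least() {
    # The smaller of the two versions sorts first; HAVE >= WANT exactly when
    # that smallest entry is WANT.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (assumes output like "pigz 2.2.4"; some releases may print it on
# stderr, hence the redirect):
#   have=$(pigz --version 2>&1 | awk '{print $2}')
#   version_at_least "$have" 2.2.4 && echo "pigz has the suffix fix"
```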

Solution 2

The symlink idea is fine.
Another working solution is to alias tar:

alias tar='tar --use-compress-program=pbzip2'

or, for gzip-compressed archives:

alias tar='tar --use-compress-program=pigz'

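This creates a different kind of default: the chosen compressor is used for every tar invocation in that shell, including ones that would otherwise create an uncompressed archive, while gzip and bzip2 themselves stay untouched.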
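
A variation on the alias is a separate wrapper command that leaves plain tar alone and falls back to gzip when pigz is not installed. The sketch below uses shell functions rather than aliases so that it also works in scripts; `first_available` is a hypothetical helper name, and `partar` is only a suggested name for the wrapper.

```shell
# first_available NAME...: print the first listed command found on PATH.
# Hypothetical helper name, not a standard utility.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# partar: like tar, but filters through pigz when installed, else gzip.
# The plain `tar` command keeps its original behavior.
partar() {
    tar --use-compress-program="$(first_available pigz gzip)" "$@"
}
```

With this in ~/.bashrc, `partar -cf file.tar.gz directory` would create a gzip-compatible archive using all cores when pigz is present.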

Solution 3

The symlink answer is really incorrect. It would replace the default gzip (or bzip2) with pigz (or pbzip2) for the entire system. While the parallel implementations are remarkably similar to the single-process versions, subtle differences in command-line options could break core system processes that depend on those differences.

The --use-compress-program option is a much better choice.

A second option (much like the alias) would be to set the TAR_OPTIONS environment variable supported by GNU tar:

export TAR_OPTIONS="--use-compress-program=pbzip2"
tar cf myfile.tar.bz2 mysubdir/
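
Exporting TAR_OPTIONS affects every tar run in that session. To limit it to a single command, the variable can be set just for that invocation. A minimal sketch, assuming GNU tar (other tar implementations may ignore TAR_OPTIONS); `tar_with` is a hypothetical helper name.

```shell
# tar_with COMPRESSOR TAR_ARGS...: run one tar command with TAR_OPTIONS set
# only for that invocation. Hypothetical helper; TAR_OPTIONS is GNU tar only.
tar_with() {
    compressor=$1
    shift
    TAR_OPTIONS="--use-compress-program=$compressor" tar "$@"
}

# Example: tar_with pbzip2 -cf myfile.tar.bz2 mysubdir/
# Avoid adding -z or -j here; GNU tar may reject them as conflicting with
# --use-compress-program.
```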

Solution 4

One interesting option is to recompile tar so that it uses multithreaded compressors by default. Copied from a Stack Overflow answer:

Recompiling with replacement

If you build tar from source, you can pass these options to configure:

--with-gzip=pigz
--with-bzip2=lbzip2
--with-lzip=plzip

After recompiling tar with these options you can check the output of tar's help:

$ tar --help | grep "lbzip2\|plzip\|pigz"
  -j, --bzip2                filter the archive through lbzip2
      --lzip                 filter the archive through plzip
  -z, --gzip, --gunzip, --ungzip   filter the archive through pigz
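
To check a rebuilt tar from a script rather than by eye, the help text can be parsed for the program behind each option. This is a sketch that assumes the help lines keep the "filter the archive through PROGRAM" wording shown above; `filter_program` is a hypothetical helper name.

```shell
# filter_program OPTION: read `tar --help` text on stdin and print the program
# that OPTION (e.g. --gzip) filters through. Hypothetical helper.
filter_program() {
    grep -F -e "$1" |
        sed -n 's/.*filter the archive through \([^ ]*\).*/\1/p' |
        head -n 1
}

# Example: tar --help | filter_program --bzip2
```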

Author: elmicha

Updated on September 18, 2022

Comments

  • elmicha
    elmicha over 1 year

    Bzip2 and gzip only use one core, although many computers have more than one core. But there are programs like lbzip2, pbzip2 and pigz, which use all available cores and promise to be compatible with bzip2 and gzip.

    So what's the best way to use these programs by default, so that tar cfa file.tar.bz2 directory uses lbzip2/pbzip2 instead of bzip2? Of course I don't want to break anything.

    • con-f-use
      con-f-use over 12 years
      Out of curiosity, to all: Is parallel gzip/bzip2 really faster than serial? I would imagine that HDD write speed and other constraints are more of a problem.
    • Michael Gundlach
      Michael Gundlach over 12 years
      @con-f-use Not unless you have SSDs. Theoretically, it could be faster as the total size of the archive increases.
    • cs_alumnus
      cs_alumnus over 8 years
      On a system with 16 CPUs, switching from gzip to pigz reduced the time to tar 1.2TB, transfer it over the network and test the result from 18 hours of backup and 14 hours of test to 4 hours of backup and 2 hours of test. There are a lot of potential bottlenecks (disk speed, network speed, processing power); however, in this case the job was definitely more CPU bound than IO bound. This is a high-end system, so your results may vary. Not that it matters, but this was on RHEL6.
  • Mark McKinstry
    Mark McKinstry almost 12 years
    This works well because /usr/local/bin/ comes before /bin/ in most people's $PATH. If something calls /bin/gunzip directly, or someone has /bin first in their $PATH, they won't use pigz. To make this work for them as well, you could use dpkg-divert and do something like this for all the binaries: sudo dpkg-divert --divert /bin/gunzip.orig --rename /bin/gunzip; sudo ln -s /usr/bin/pigz /bin/gunzip. There is a possibility that pigz isn't 100% compatible with all the gzip flags, so be careful.
  • Christian Hudon
    Christian Hudon over 8 years
    This will only work when calling the gzip (or gunzip) program directly on the shell's command-line. Other programs (like tar) won't be impacted by that.
  • jena
    jena about 7 years
    Added benefit: you can use an alias name like 'partar' if you want to preserve the original functionality (for some reason). Sadly, 'ptar' is taken by the Perl implementation.
  • Derek Perkins
    Derek Perkins over 5 years
    This didn't work for me.
  • SergioAraujo
    SergioAraujo about 4 years
    It would be worth testing whether pigz is installed on the system first, like this (zsh): (( $+commands[pigz] )) && alias tar='tar --use-compress-program=pigz'.
  • Geremia
    Geremia about 4 years
    The --with-* configure options are probably the best way of doing it.
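
On the question raised in the comments of whether parallel compression is actually faster, the simplest answer is to measure it on your own data. Below is a rough sketch with one-second resolution; `time_compress` is a hypothetical helper name, and because the output goes to /dev/null it measures reading and compressing only, not disk writes.

```shell
# time_compress COMPRESSOR FILE: print elapsed whole seconds for compressing
# FILE to /dev/null. Hypothetical helper; resolution is one second, so use a
# large input for a meaningful comparison.
time_compress() {
    start=$(date +%s)
    "$1" -c "$2" >/dev/null || return 1
    end=$(date +%s)
    printf '%s: %ss\n' "$1" "$((end - start))"
}

# Example:
#   time_compress gzip big.tar
#   time_compress pigz big.tar
```

Running each command twice and comparing the second runs reduces the influence of the page cache on the first read.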