Utilizing multiple cores for tar+gzip/bzip2 compression/decompression
Solution 1
You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:
tar cf - paths-to-archive | pigz > archive.tar.gz
By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.
tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz
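For decompression, pigz can be used the same way (with only a limited speedup, since the deflate format decompresses serially). A minimal round-trip sketch; it falls back to plain gzip when pigz is not installed, which works because the two produce compatible streams:

```shell
# Use pigz if available, otherwise plain gzip (output is compatible).
command -v pigz >/dev/null 2>&1 && GZ=pigz || GZ=gzip

mkdir -p demo_paths
echo "hello" > demo_paths/file.txt

# Compress: tar writes to stdout, the compressor reads from stdin.
tar cf - demo_paths | "$GZ" > archive.tar.gz

# Decompress: -dc decompresses to stdout, tar reads the stream.
"$GZ" -dc archive.tar.gz | tar xf - -C /tmp
```

The `demo_paths` directory and file names are placeholders for illustration only.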
Solution 2
You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use.
For example use:
tar -c --use-compress-program=pigz -f tar.file dir_to_zip
Solution 3
Common approach
tar has an option for this:
-I, --use-compress-program PROG
filter through PROG (must accept -d)
You can use a multithreaded version of an archiver or compressor utility.
The most popular multithreaded compressors are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:
$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive
The compressor must accept -d. If your replacement utility lacks this parameter and/or you need to specify additional parameters, then use pipes (add parameters if necessary):
$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz
Input and output of the singlethreaded and multithreaded versions are compatible. You can compress using the multithreaded version and decompress using the singlethreaded version, and vice versa.
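That compatibility is easy to verify with a quick round trip. This sketch compresses with pigz when available (falling back to gzip otherwise) and always reads the archive back with plain single-threaded gzip; the directory name is a placeholder:

```shell
# Compress with the multithreaded tool if present...
command -v pigz >/dev/null 2>&1 && COMP=pigz || COMP=gzip
mkdir -p paths_demo
echo "data" > paths_demo/a.txt
tar cf - paths_demo | "$COMP" > roundtrip.tar.gz

# ...and list it back with single-threaded gzip: the stream is standard.
gzip -dc roundtrip.tar.gz | tar tf -
```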
p7zip
For compression with p7zip you need a small wrapper shell script like the following:
#!/bin/sh
case $1 in
  -d) 7za -txz -si -so e;;
   *) 7za -txz -si -so a .;;
esac 2>/dev/null
Save it as 7zhelper.sh. Here is an example of usage:
$ tar -I 7zhelper.sh -cf OUTPUT_FILE.tar.7z paths_to_archive
$ tar -I 7zhelper.sh -xf OUTPUT_FILE.tar.7z
xz
Regarding multithreaded XZ support: if you are running version 5.2.0 or above of XZ Utils, you can utilize multiple cores for compression by setting -T or --threads to an appropriate value via the environment variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0").
This is a fragment of the man page for version 5.1.0alpha:
Multithreaded compression and decompression are not implemented yet, so this option has no effect for now.
However, this will not work for decompression of files that haven't been compressed with threading enabled. From the man page for version 5.2.2:
Threaded decompression hasn't been implemented yet. It will only work on files that contain multiple blocks with size information in block headers. All files compressed in multi-threaded mode meet this condition, but files compressed in single-threaded mode don't even if --block-size=size is used.
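Assuming XZ Utils 5.2.0 or later, the environment-variable approach looks like this in practice; -T 0 lets xz pick the thread count from the number of cores, and the directory name is a placeholder:

```shell
# Multithreaded xz compression via XZ_DEFAULTS (requires xz >= 5.2.0).
mkdir -p xz_demo
echo "payload" > xz_demo/f.txt

# -T 0 means "use as many threads as there are cores".
XZ_DEFAULTS="-T 0" tar cJf xz_demo.tar.xz xz_demo

# Large enough archives compressed this way are split into multiple
# blocks, which is what later xz versions need for threaded decompression.
tar tJf xz_demo.tar.xz
```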
Recompiling with replacement
If you build tar from source, you can recompile it with these configure parameters:
--with-gzip=pigz
--with-bzip2=lbzip2
--with-lzip=plzip
After recompiling tar with these options you can check the output of tar's help:
$ tar --help | grep "lbzip2\|plzip\|pigz"
-j, --bzip2 filter the archive through lbzip2
--lzip filter the archive through plzip
-z, --gzip, --gunzip, --ungzip filter the archive through pigz
Solution 4
You can use the shortcut -I for tar's --use-compress-program switch, and invoke pbzip2 for bzip2 compression on multiple cores:
tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 DIRECTORY_TO_COMPRESS/
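A runnable sketch of the same invocation; it falls back to single-threaded bzip2 when pbzip2 is not installed (the two produce compatible streams), and the directory name is a placeholder:

```shell
# Parallel bzip2 via tar's -I shortcut, with a bzip2 fallback.
command -v pbzip2 >/dev/null 2>&1 && BZ=pbzip2 || BZ=bzip2
mkdir -p DIRECTORY_TO_COMPRESS
echo "x" > DIRECTORY_TO_COMPRESS/f.txt

tar -I "$BZ" -cf OUTPUT_FILE.tar.bz2 DIRECTORY_TO_COMPRESS/

# Listing goes through the same filter, invoked with -d.
tar -I "$BZ" -tf OUTPUT_FILE.tar.bz2
```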
Solution 5
If you want to have more flexibility with filenames and compression options, you can use:
find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec \
tar -P --transform='s@/my/path/@@g' -cf - {} + | \
pigz -9 -p 4 > myarchive.tar.gz
Step 1: find
find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec
This command will look for the files you want to archive, in this case /my/path/*.sql and /my/path/*.log. The parentheses group the -name tests so that -type f and -exec apply to every pattern; add as many -o -name "pattern" clauses as you want. -exec will execute the next command using the results of find: tar.
Step 2: tar
tar -P --transform='s@/my/path/@@g' -cf - {} +
--transform is a simple string replacement parameter. It will strip the path of the files from the archive so the tarball's root becomes the current directory when extracting. Note that you can't use the -C option to change directory, as you'd lose the benefit of find: all files of the directory would be included.
-P tells tar to use absolute paths, so it doesn't trigger the warning "Removing leading `/' from member names". The leading '/' will be removed by --transform anyway.
-cf - tells tar to write the archive to stdout; the archive name is supplied later by shell redirection.
{} + uses every file that find found previously.
Step 3: pigz
pigz -9 -p 4
Use as many parameters as you want. In this case, -9 is the compression level and -p 4 is the number of cores dedicated to compression.
If you run this on a heavy loaded webserver, you probably don't want to use all available cores.
Step 4: archive name
> myarchive.tar.gz
Finally.
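Putting the steps together, here is a self-contained version of the pipeline with /my/path replaced by a throwaway demo directory (all names below are placeholders); pigz falls back to gzip when missing, and the -name tests are grouped in parentheses so -type f applies to both patterns:

```shell
command -v pigz >/dev/null 2>&1 && GZ="pigz -9 -p 4" || GZ="gzip -9"

# Stand-in for /my/path/ used in the text above.
mkdir -p /tmp/mypath_demo
echo "SELECT 1;" > /tmp/mypath_demo/dump.sql
echo "log line"  > /tmp/mypath_demo/app.log
echo "skip me"   > /tmp/mypath_demo/notes.txt   # not matched by find

find /tmp/mypath_demo/ -type f \( -name "*.sql" -o -name "*.log" \) -exec \
  tar -P --transform='s@/tmp/mypath_demo/@@g' -cf - {} + | \
  $GZ > myarchive.tar.gz

# --transform stripped the path, so the tarball lists bare filenames.
tar tzf myarchive.tar.gz
```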
user1118764
Updated on May 11, 2020

Comments
-
user1118764 about 4 years I normally compress using tar zcvf and decompress using tar zxvf (using gzip due to habit). I've recently gotten a quad-core CPU with hyperthreading, so I have 8 logical cores, and I notice that many of the cores are unused during compression/decompression. Is there any way I can utilize the unused cores to make it faster?
-
Warren Severin over 6 years The solution proposed by Xiong Chiamiov above works beautifully. I had just backed up my laptop with .tar.bz2 and it took 132 minutes using only one cpu thread. Then I compiled and installed tar from source: gnu.org/software/tar I included the options mentioned in the configure step: ./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip I ran the backup again and it took only 32 minutes. That's better than 4X improvement! I watched the system monitor and it kept all 4 cpus (8 threads) flatlined at 100% the whole time. THAT is the best solution.
-
user788171 about 11 years How do you use pigz to decompress in the same fashion? Or does it only work for compression?
-
Mark Adler about 11 years pigz does use multiple cores for decompression, but only with limited improvement over a single core. The deflate format does not lend itself to parallel decompression. The decompression portion must be done serially. The other cores for pigz decompression are used for reading, writing, and calculating the CRC. When compressing on the other hand, pigz gets close to a factor of n improvement with n cores.
-
Randall Hunt over 10 years This is an awesome little nugget of knowledge and deserves more upvotes. I had no idea this option even existed and I've read the man page a few times over the years.
-
Garrett about 10 years The hyphen here is stdout (see this page).
-
slhsen almost 10 years So as far as I understand, files generated by pigz are compatible with gzip, right? Can I decompress a file with gzip which had been created with pigz?
-
Mark Adler almost 10 years Yes. 100% compatible in both directions.
-
CharlesL about 9 years pigz can use multiple cores for compression, but the tar operation is still using only one core. Is there a parallel tar?
-
Mark Adler about 9 years There is effectively no CPU time spent tarring, so it wouldn't help much. The tar format is just a copy of the input file with header blocks in between files.
-
Admin about 9 years This is indeed the best answer. I'll definitely rebuild my tar!
-
oᴉɹǝɥɔ almost 9 years This is a great and elaborate answer. It may be good to mention that multithreaded compression (e.g. with pigz) is only enabled when it reads from the file. Processing STDIN may in fact be slower.
-
bovender over 8 years @ValerioSchiavoni: Not here, I get full load on all 4 cores (Ubuntu 15.04 'Vivid').
-
kasur over 8 years I have submitted an edit to the answer to indicate the default number of compression threads to be equal to the number of online processors as per official docs, but not to the 8 cores as was specified in the answer originally. Thanks.
-
Mark Adler over 8 years The edit seems to have been rejected by someone else, but I will make a similar edit.
-
Lester Cheung over 8 years Just drop the -f option of tar if you want stdout. ;-)
-
jmiserez about 8 years Beware that redirecting (>) will simply overwrite existing files unless you have set -o noclobber set.
-
selurvedu almost 8 years Plus 1 for the xz option. It's the simplest, yet effective approach.
-
Offenso over 7 years I prefer tar - dir_to_zip | pv | pigz > tar.file. pv helps me estimate; you can skip it. But still, it's easier to write and remember.
-
einpoklum over 7 years A nice TL;DR for @MaximSuslov's answer.
-
William T Froggard about 6 years Also worth noting that since pigz probably is going to be network-bound in most situations unless you make it work hard, increasing the block size can dramatically improve performance. By increasing its block size to 524288 (512MB), I'm seeing numbers as high as 80MB/s over 802.11ac wifi. I believe the transfer is still network-bound, so you may see better results over gigabit ethernet. I sometimes see insane 400MB/s spikes, but those are scary and odd, so I'm not sure what to make of them.
-
Mark Adler about 6 years @WilliamTFroggard The spikes may be due to the burstiness of the deflate algorithm. Uncompressed data is collected until a deflate block can be produced, at which time the block is rapidly generated and emitted.
-
scai over 5 years export XZ_DEFAULTS="-T 0" before calling tar with option -J for xz compression works like a charm.
-
Andre Figueiredo over 5 years Wouldn't be more performatic to use -l instead of STDIN/STDOUT?
-
Mark Adler over 5 years I wouldn't know, since "performatic" is not a word.
-
Marc.2377 over 4 years @NathanS.Watson-Haigh Yes, you can. Just enclose the program name and arguments in quotes. man tar says so, as does this.
jadelord over 4 years In 2020, zstd is the fastest tool to do this. Noticeable speedup while compressing and decompressing. Use tar -cf --use-compress-program=zstdmt to do so with multi-threading.
-
Arash about 4 years This returns tar: home/cc/ziptest: Cannot stat: No such file or directory tar: Exiting with failure status due to previous errors
-
Arik almost 4 years This is actually faster than tar -c --use-compress-program=pigz
-
ruario over 3 years This answer looks like it was largely lifted directly from my LQ post. A link back might have been nice.