Multithreaded xz, with gzip, pv, and pipes - is this the most efficient I can get?
With the -T0 multithread option you tell xz two things at once. Using multithreading also means: wait until input data has been read into memory, and then compress the blocks in parallel.
After including pigz in my tests, I analyze the performance step by step. I have a 100M file f100.
$ time xz -c f100 >/dev/null
real 0m2.658s
user 0m2.573s
sys 0m0.083s
99% of the time is spent compressing on one core. With all four cores activated by -T4 (or -T0):
$ time xz -c -T4 f100 >/dev/null
real 0m0.825s
user 0m2.714s
sys 0m0.284s
Overall result: 300% faster, almost linear per core. The "user" value must be divided by 4, given the way it is reported. "sys" now shows some overhead; real is roughly 1/4 of user plus sys.
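A minimal way to reproduce this comparison (a sketch, assuming GNU coreutils and a multithread-capable xz; the 16M size and the name f100 are placeholders for the 100M file above):

```shell
# Build a compressible test file (base64 of random data compresses somewhat).
head -c 16M /dev/urandom | base64 > f100

# Single-threaded baseline:
time xz -c f100 > /dev/null

# Let xz pick one thread per core:
time xz -c -T0 f100 > /dev/null
```

Note that xz only splits the input into parallel blocks when the file is large enough relative to its block size, so very small test files will still run on one core.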
$ time gzip -dc f100.gz >/dev/null
$ time pigz -p4 -dc f100.gz >/dev/null
This is 0.5 vs. 0.2 seconds. When I put it all together:
$ time pigz -dc -p4 f100.gz | xz -c -T4 >out.xz
real 0m0.902s
user 0m3.237s
sys 0m0.363s
...it reduces the 0.8 + 0.2 = 1.0 seconds of the separate steps to 0.9.
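The same pipeline works with plain gzip when pigz is not installed (file names here are illustrative):

```shell
# Decompress on one core, recompress on all cores;
# the pipe lets the two stages overlap.
gzip -dc f100.gz | xz -c -T0 > f100.xz
```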
With multiple files (but not too many), you can get the highest overall parallelism with four shell background processes. Here I use four 25M files instead:
for f in f25-?.gz; do time pigz -p4 -dc "$f" | xz -c -T0 >"$f".xz & done
This seems even slightly faster, at 0.7s. And even without multithreading, even for xz:
for f in f25-?.gz; do time gzip -dc "$f" | xz -c >"$f".xz & done
just by setting up four simple quarter-size pipelines with &, you get 0.8s, the same as for a 100M file with xz -T4.
In my scenario it is about as important to activate multithreading in xz as it is to parallelize the whole pipeline; if you can combine this with pigz and/or multiple files, you can even be a bit faster than a quarter of the sum of the single steps.
Comments
-
tu-Reinstate Monica-dor duh about 1 year
I'm excited to learn that xz now supports multithreading: xz --threads=0
But now I want to utilise this as much as possible. For example, to recompress gzips as xz:
gzip -d -k -c myfile.gz | pv | xz -z --threads=0 - > myfile.xz
This results in my processor being more highly used (~260% CPU to xz, yay!).
However:
- I realise that gzip is not (yet) multithreaded,
- I think that either pv or the pipes may be restricting the number of (IO?) threads.
Is this true and, if so, is there a way to make this more efficient (other than to remove pv)?
-
Admin about 4 years
Can you give more details about the scenario? In my answer I point out that the number of .gz files matters a lot.
-
tu-Reinstate Monica-dor duh about 4 years
In my case it's actually one large disk image being recompressed. I came across this because of the gzip 32-bit size reference limit and I wanted my compressed files to show the right uncompressed size. There was a significant improvement in recompression, though, by about 40% compared to without threads, mostly during the nulled part of the uncompressed image, where it reached 100MB/s (over USB3.0) according to pv. I put this down to lower IO wait times on the gzip end due to more "waiting" threads on the xz end, but I wondered whether the pipe and pv were a bottleneck.
-
Admin about 4 years
And did you time the difference at all between -T0 and without? According to my trials it does not make a difference; you lose as much as you win.
-
tu-Reinstate Monica-dor duh about 4 years
Yes, that's how I got the ~40%, but I didn't keep a copy, sorry.
-
Admin about 4 years
I got over 300%. So this shows that input (the pipe) is the big bottleneck, not xz. It is in between my 300% and the 0.1% you get when you slow down the dd pipe with bs=10 (only 10 bytes per read).
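The slow-pipe experiment mentioned here can be sketched like this (a demonstration only; the bs=10 re-blocking starves xz of input regardless of -T0, and the file name is illustrative):

```shell
# Throttle the pipe: dd re-blocks the stream into 10-byte reads/writes,
# so xz's worker threads mostly sit idle waiting for input.
gzip -dc f100.gz | dd bs=10 2>/dev/null | xz -c -T0 > /dev/null
```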
-
Admin about 4 years
I did my tests with a 100MB file on a ramdisk. Your Q is very interesting, so it needs precision, and some testing (timing).
-
Alessio about 4 years
have you tried pigz -d instead of gzip -d? that might improve performance a little. pigz -d can't decompress in parallel - however, it can run 4 threads at a time (one each for reading, writing, checksum calcs, and decompression). see man pigz for details. If it's not packaged for your distribution, you can find pigz at zlib.net/pigz - in debian etc, sudo apt-get install pigz
-
Admin about 4 years
@cas: "pigz...which can speed up decompression under some circumstances." This must be multiple files on a good system. "specially prepared deflate streams" seem to be the workaround.
-
Admin about 4 years
@cas I installed pigz and tested - this really is a faster decompression, even though the algorithm itself runs on one core, as you explain. Overall, decompression is only about 1/4 of the work, so pigz here is only secondary. Thanks to your hint I made some tests; see my answer.