How to obtain maximum compression with .tar.gz?
Solution 1
You can tell tar to use maximum compression this way:
export GZIP=-9
tar cvzf file.tar.gz /path/to/directory
Alternatively, to keep your environment variables clutter-free, you can set it for that one command only:
env GZIP=-9 tar cvzf file.tar.gz /path/to/directory
Solution 2
As you stated, "tar can also compress", which implies that tar does not always compress data by itself. It does so only when used with the z option, and even then not by itself but by passing the tarred data through gzip.
Instead, as noted in this answer, you can pipe the two commands, tar and gzip, so that you can explicitly specify the compression level for the gzip command and achieve the smallest output size.
tar cvf - /path/to/directory | gzip -9 - > file.tar.gz
Here -9 specifies the maximum possible compression level.
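To see what the explicit level buys you, here is a minimal sketch that runs the same tar stream through gzip at its default level and at -9. The sample data, the temporary directory and the variable names are all made up for illustration; the gain on real data depends entirely on what you are compressing.

```shell
#!/bin/sh
# Sketch: compare default gzip with gzip -9 on throwaway sample data.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/data"

# Generate repetitive text so the compressor has something to work with.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$workdir/data/sample.txt"

# Size of the uncompressed tar stream, for reference.
raw_size=$(tar cf - -C "$workdir" data | wc -c | tr -d ' ')

# Same tar stream, two compression levels.
tar cf - -C "$workdir" data | gzip    > "$workdir/default.tar.gz"
tar cf - -C "$workdir" data | gzip -9 > "$workdir/best.tar.gz"

# Check that both archives are intact before comparing sizes.
gzip -t "$workdir/default.tar.gz"
gzip -t "$workdir/best.tar.gz"

default_size=$(wc -c < "$workdir/default.tar.gz" | tr -d ' ')
best_size=$(wc -c < "$workdir/best.tar.gz" | tr -d ' ')
echo "uncompressed tar: $raw_size bytes"
echo "gzip (default):   $default_size bytes"
echo "gzip -9:          $best_size bytes"
```

The difference between the default level and -9 is often small, so it is worth measuring on your own data before paying the extra CPU time.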
Solution 3
Usually neither gzip nor tar can create "the absolute smallest tar.gz". There are many compression utilities that can compress to the gz format. I have written a bash script "gz99" to try gzip, 7z and advdef to get the smallest file. To use this to create the smallest possible file run:
tar c path/to/data | gz99 file.gz
The advdef utility from AdvanceCOMP usually gives the smallest file, but is also buggy (the gz99 utility checks that it hasn't corrupted the file before accepting the output of advdef). To use advdef directly, create file.tar.gz however you feel like. Then run:
advdef -z -4 file.tar.gz
This will create a standard gz file that can be read by gzip and tar as normal, just a tiny bit smaller. This is about the best you can do with the gz format.
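Since a recompressor can corrupt its input, it is worth checking the result before replacing the original. The following is only a sketch of that kind of check, not the actual gz99 logic: it uses gzip -9 as a stand-in for the recompression step so that it runs without advdef installed, and all file names are hypothetical.

```shell
#!/bin/sh
# Sketch: accept a recompressed .gz only if its payload is byte-identical.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/data"
printf 'example payload\n' > "$workdir/data/file.txt"

# Original archive, built any way you like.
tar cf - -C "$workdir" data | gzip > "$workdir/file.tar.gz"

# Stand-in for the recompression step. Assumption: a real run would call
# advdef or 7z here; gzip -9 is used only so the sketch runs anywhere.
gzip -dc "$workdir/file.tar.gz" | gzip -9 > "$workdir/candidate.tar.gz"

# Decompress both and compare the tar streams byte for byte.
gzip -dc "$workdir/file.tar.gz"      > "$workdir/original.tar"
gzip -dc "$workdir/candidate.tar.gz" > "$workdir/candidate.tar"
if cmp -s "$workdir/original.tar" "$workdir/candidate.tar"; then
    mv "$workdir/candidate.tar.gz" "$workdir/file.tar.gz"
    verdict=accepted
else
    rm -f "$workdir/candidate.tar.gz"
    verdict=rejected
fi
echo "recompressed archive $verdict"
```

A fuller version would also keep the candidate only when it is actually smaller than the original.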
Since you only recently learnt that tar can compress, and didn't say why you wanted the smallest ".tar.gz" file, you may be unaware that there are more efficient formats that can be used with tar files, such as xz. Generally, switching to a different format can give a vastly better improvement in compression than fiddling around with gzip options. The main disadvantage of xz is that it isn't as common as gzip, so the people you send the file to might have to install a new package. It also tends to be a bit slower, particularly when compressing. If this doesn't matter to you, and you really want the smallest tar file, try:
tar cv path/to/data | xz -9 > file.tar.xz
Modern versions of tar, for example on Ubuntu 13.10, automatically detect compressed files. So even if you use xz compression you can still decompress as usual:
tar xvf file.tar.xz
To give a quick idea how these compression utilities compare, consider the effect of compressing patch-3.1.1 from the linux kernel:
utility            cpu     format   size (bytes)
gzip -9            0.02s   gz       105,628
advdef -2          0.07s   gz       102,619
7z -mx=9 -tgzip    0.42s   gz       102,297
advdef -3          0.55s   gz       102,290
advdef -4          0.75s   gz       101,956
xz -9              0.03s   xz        91,064
xz -3e             0.15s   xz        90,996
In this trivial example, we see that to get the smallest gz we need advdef (though 7z -tgzip is almost as good and a lot less buggy). We also see that switching to xz gains us much more space than trying to squeeze the most out of the old gz format, without compression taking too long.
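If you want to repeat this kind of comparison on your own files, a rough sketch along the following lines may help. The sample data and variable names are invented for illustration, and the xz step is skipped when xz is not installed. The numbers it prints say nothing about your data; substitute a real directory to get a meaningful result.

```shell
#!/bin/sh
# Sketch: compare gzip -9 and xz -9 on the same tar stream.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/data"

# Throwaway sample data; replace with a real directory for a useful test.
i=0
while [ "$i" -lt 2000 ]; do
    echo "entry $i: some moderately repetitive log-like text"
    i=$((i + 1))
done > "$workdir/data/sample.log"

tar cf - -C "$workdir" data | gzip -9 > "$workdir/data.tar.gz"
gz_size=$(wc -c < "$workdir/data.tar.gz" | tr -d ' ')
echo "gzip -9: $gz_size bytes"

if command -v xz >/dev/null 2>&1; then
    tar cf - -C "$workdir" data | xz -9 > "$workdir/data.tar.xz"
    # Make sure the xz archive can be read back before trusting its size.
    xz -dc "$workdir/data.tar.xz" | tar tf - > /dev/null
    xz_size=$(wc -c < "$workdir/data.tar.xz" | tr -d ' ')
    echo "xz -9:   $xz_size bytes"
else
    xz_size=unavailable
    echo "xz is not installed; skipping the xz comparison"
fi
```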
Solution 4
tar c /path/to/data | gzip --best > file.tar.gz
The gzip option --best (equivalent to -9) asks for the highest compression level.
Mario Zigliotto
Updated on September 18, 2022
Comments
-
Mario Zigliotto over 1 year
The way I understand the use of tar + gzip is that tar is normally used to consolidate a grouping of files into a single file, then gzip is used to compress that file.
I recently learned that tar can also compress.
Because I do not fully understand how compression works at its core, I have (possibly ridiculous) concerns that sending a pre-compressed .tar to gzip might prevent gzip from compressing as well as its potential would allow, and things of that nature.
My question is essentially: What combination of args/compression methods should I use to create the absolute smallest tar.gz, and what does the command line statement look like for that?
-
Keltari over 11 years
Compressing already compressed files may reduce their size, or it may make the archive bigger. It all depends on the type of data and any compression being used.
-
Ravindra Bawane over 9 years
What @Keltari said. Compression rates and ratios are highly dependent on what it is you are compressing, which is also why there are different compression algorithms and methods.
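The dependence on input data is easy to check. As a minimal sketch, the following compresses 100000 bytes of random data (used here as a stand-in for already-compressed input) and 100000 bytes of repetitive text with gzip -9. The file names and sizes are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: gzip -9 on incompressible versus highly compressible input.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Random bytes behave like data that has already been compressed.
head -c 100000 /dev/urandom > "$workdir/random.bin"
# Repetitive text is close to the best case for gzip.
yes 'the same line over and over' | head -c 100000 > "$workdir/text.txt"

random_gz=$(gzip -9 -c "$workdir/random.bin" | wc -c | tr -d ' ')
text_gz=$(gzip -9 -c "$workdir/text.txt" | wc -c | tr -d ' ')

echo "random input: 100000 bytes -> $random_gz bytes"
echo "text input:   100000 bytes -> $text_gz bytes"
```

The random input comes out slightly larger than it went in, because gzip has to add its header and block overhead without being able to save anything, while the repetitive text shrinks to a small fraction of its size.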
-
user5928703 about 10 years
Alternatively, use the --best flag: -9 is confusing to the reader.
-
ChrisInEdmonton about 10 years
The OP asked for how to get the most compression for a .tar.gz file, but you suggested creating a .tar.xz file. You are answering a different question than asked.
-
ChrisInEdmonton about 10 years
Ah, I see what you are going for. advdef just crashes on my system (v1.15), so 'advdef -z -4 file.tar.gz' doesn't work, but it at least theoretically could. I can't find evidence that it would shrink the file further than 'gzip -9', but it might, and in any case is enough for me to remove my -1 vote. Thanks for clarifying!
-
gmatht about 10 years
Hmm, I'm using v1.17. Anyway the pedantic mathematician in me wants to point out that my answer arguably isn't technically correct. After all, if you enumerate all possible gz files from shortest to longest and pick the first one that decompresses to the right file, you could shave yet a few more bytes off. But that'd be way too slow in practice.
-
Xen2050 about 7 years
I don't think "buggy" and "archive" should ever be used together, what use is an archive that's corrupt? You need a much larger file to "compare" the compression utilities, and different types of input files too. Measuring differences in hundredths of a second isn't that reliable. I think xz -9 usually takes something like 5x the gzip -9 time, not just 1.5x as your table suggests.
-
nyxee over 6 years
How can we create split archives (while compressing) using the xz process, please?
-
Brian Thomas over 6 years
I had an issue where it's not recursive and complains that it will be an empty archive. Since the command is split, it's hard to find how to properly force recursion, since that is already tar's default. MY BAD, I had incorrectly specified it starting like this:
tar -cvf /path
-
BallpointBen about 4 years
tar -z will create a compressed .tar.gz archive, but this is not the only compression method supported by (all versions of) tar. For instance, tar -j will create a .tar.bz2, and there are some other compression methods supported as well.
-
Suuuehgi almost 4 years
gzip: warning: GZIP environment variable is deprecated; use an alias or script
tar: Exiting with failure status due to previous errors
-
stj almost 4 years
To work around this warning, use
tar cvf file.tar.gz /path/to/directory -I "gzip --best"
-I specifies the compression program and options.
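As a hedged sketch of this workaround: the -I option with arguments is a GNU tar feature, so the example below checks for GNU tar first and falls back to the plain pipe when it finds something else. The sample file and the temporary directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: pass the compression level via -I on GNU tar, else use a pipe.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/data"
printf 'example payload\n' > "$workdir/data/file.txt"

if tar --version 2>/dev/null | grep -q 'GNU tar'; then
    # GNU tar: -I (--use-compress-program) runs the given command,
    # so the GZIP environment variable is not needed.
    tar -I 'gzip --best' -cf "$workdir/file.tar.gz" -C "$workdir" data
    method='tar -I'
else
    # Other tar implementations: fall back to an explicit pipe.
    tar cf - -C "$workdir" data | gzip --best > "$workdir/file.tar.gz"
    method='pipe'
fi

# Verify the result is a valid gzip file containing the expected member.
gzip -t "$workdir/file.tar.gz"
listing=$(gzip -dc "$workdir/file.tar.gz" | tar tf -)
echo "created with: $method"
echo "$listing"
```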