Why should I use tar.xz instead of tar.gz?

xz is a lossless data compression program and file format.


Solution 1

gzip and xz use two different algorithms, so they perform differently, both in the level of compression they achieve and in the resources they consume while compressing or decompressing.

In general, xz achieves higher compression ratios, but needs a lot more memory and time.

I personally use xz for archiving data; big files that I need to put away for a long time. I use gzip otherwise, since it's usually quicker.

Do test them both and see how they perform on your average tar (or whatever) file.
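A quick way to run such a test is sketched below, assuming Python 3.9+ and its standard `tarfile` module. The helper `tar_size` and the sample data are made up for illustration; swap in files that resemble your real archives, since the results depend heavily on the input.

```python
import io
import tarfile


def tar_size(files: dict[str, bytes], mode: str) -> int:
    """Build a tar archive in memory with the given tarfile mode; return its size in bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # The archive is finalized once the with-block exits; buf itself stays open.
    return len(buf.getvalue())


# Hypothetical sample: a few repetitive, text-like files.
sample = {
    f"log{i}.txt": (f"line {i}: the quick brown fox\n" * 2000).encode()
    for i in range(5)
}

for mode in ("w:gz", "w:xz"):
    print(mode, tar_size(sample, mode), "bytes")
```

Run it on a representative sample of your own data rather than trusting numbers from synthetic input like this.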

Solution 2

xz has become a bit of a standard when it comes to package management in recent years.

The LZMA2 compression algorithm that xz uses is exceptionally efficient with text (and "text-like") data. I personally use it for pretty much anything where encoding speed is not an issue (xz decompresses much faster than it compresses; that is actually one of its design targets).

So it's great for things like archiving or backing up (generally speaking, files that get decompressed a lot, but that aren't compressed or changed all the time). It's usually not such a great choice for stream compression, because LZMA2 isn't exactly light on resources when compressing.
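To see the gap between compression and decompression time on your own machine, a rough sketch with Python's standard `lzma` module could look like this. The function name `time_xz` and the sample input are hypothetical, and a single wall-clock measurement like this is only indicative.

```python
import lzma
import time


def time_xz(data: bytes, preset: int = 6) -> tuple[float, float, int]:
    """Return (compress_seconds, decompress_seconds, compressed_size) for one xz round trip."""
    t0 = time.perf_counter()
    packed = lzma.compress(data, preset=preset)
    t1 = time.perf_counter()
    unpacked = lzma.decompress(packed)
    t2 = time.perf_counter()
    assert unpacked == data  # sanity check: the round trip must be lossless
    return t1 - t0, t2 - t1, len(packed)


# Hypothetical text-like input; replace with a real dump or log file.
text = b"".join(b"record %d: some text-like payload\n" % i for i in range(200_000))

c_secs, d_secs, size = time_xz(text)
print(f"compress {c_secs:.3f}s, decompress {d_secs:.3f}s, {len(text)} -> {size} bytes")
```

Higher presets (up to 9) trade more compression time and memory for smaller output, so it is worth trying a couple of values.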

But I'd like to comment on one word in your question: "should". What you should or should not use depends on what you need your data for. Even though xz and LZMA2 have been around for years, people still treat gz as a fallback for compatibility reasons. Things like Trac or cgit offer .tar.gz, .7z, and .tar.xz. The .tar.gz option is there because it has been "around forever", and it's a good fallback on any sort of device, no matter how minimal or old.

Just as a counterexample, lzop is a fast and resource-saving compressor, which is often used on spacecraft, but it is otherwise not very efficient in terms of compression ratio. I couldn't say it's not widely used, though. It's often used on systems where it's more important to compress something quickly with little power and computing capacity.

So what you should or should not use is whatever makes the most sense to you and the people interacting with your data.


One thing that popped into my mind recently: man pages are routinely compressed with gzip. If you look into /usr/share/man/man1/, for instance, you'll see that many of the files, if not all of them, are compressed and have names like mv.1.gz. Now, it would arguably make sense to switch to .xz files because of the better compression, but the space gains are minimal compared to the downsides of ditching .gz for .xz. The reason is the almost universal compatibility of gzip. Even opening .gz-compressed files directly in Vim works fine and acts as transparent compression.
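The same kind of transparent access is easy to get from a script. Here is a minimal sketch using Python's standard `gzip` module; the helper name `read_gz_text` is made up, and the man page path is the one mentioned above, which may not exist on every system.

```python
import gzip
import os


def read_gz_text(path: str, encoding: str = "utf-8") -> str:
    """Read a gzip-compressed text file and return its decompressed contents."""
    # errors="replace" keeps the read from failing on pages that are not valid UTF-8.
    with gzip.open(path, "rt", encoding=encoding, errors="replace") as fh:
        return fh.read()


# Path taken from the example above; skipped quietly if it is missing.
man_page = "/usr/share/man/man1/mv.1.gz"
if os.path.exists(man_page):
    print(read_gz_text(man_page).splitlines()[0])
```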

Solution 3

The .xz format provides a couple of advanced features. At the same time, it has been kept simple enough to be usable in many embedded systems. Here is a summary of the features:

  • Random-access reading: The data can be split into independently compressed blocks. Every .xz file contains an index of the blocks, which makes limited random-access reading possible when the block size is small enough.
  • Integrity checks: Integrity of all headers is always protected with CRC32. The integrity of the actual data may be verified with CRC32, CRC64, SHA-256 ...
  • Concatenation: Just like with .gz and .bz2 files, it is possible to concatenate .xz files as is. The decompressor can decompress a concatenated file as if it was a regular single-stream .xz file.
  • Padding: Binary zeros may be appended to .xz files to pad them to fill e.g. a block on a backup tape. The padding needs to be a multiple of four bytes, because the size of every valid .xz file is a multiple of four bytes.
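
Two of these features, concatenation and integrity checks, can be tried out with Python's standard `lzma` module. This is only a sketch: the sample payloads and the helper `is_intact` are made up, and it does not cover random access or padding.

```python
import lzma

# Two independent .xz streams, each with a different integrity check type.
part_a = lzma.compress(b"first stream\n", check=lzma.CHECK_CRC32)
part_b = lzma.compress(b"second stream\n", check=lzma.CHECK_CRC64)

# Concatenation: the two streams back to back decode as one.
combined = lzma.decompress(part_a + part_b)
print(combined)

# The decompressor reports which check the stream carries.
probe = lzma.LZMADecompressor()
probe.decompress(part_a)
print(probe.check == lzma.CHECK_CRC32)


def is_intact(blob: bytes) -> bool:
    """Return True if blob decompresses cleanly, False if liblzma rejects it."""
    try:
        lzma.decompress(blob)
        return True
    except lzma.LZMAError:
        return False


# Integrity: flip one byte in the middle of the file and try again.
damaged = bytearray(part_a)
damaged[len(damaged) // 2] ^= 0xFF
print(is_intact(part_a), is_intact(bytes(damaged)))
```

SHA-256 is also exposed as `lzma.CHECK_SHA256`, but whether it is available depends on the liblzma build, so it is worth guarding it with `lzma.is_check_supported`.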

Author: Sybil

Updated on September 18, 2022

Comments

  • Sybil
    Sybil over 1 year

    Today, for the first time in my life, I saw a tar.xz download. I searched the internet and found the Wikipedia articles (xz and XZ Utils).

    Here is an interesting quote about the users of xz:

    xz has gained notability for compressing packages in the GNU coreutils project,[7] Debian family of systems deb (file format), openSUSE,[8] Fedora,[9] Arch Linux,[10] Slackware,[11] FreeBSD,[12] Gentoo,[13] GNOME,[14] and TeX Live,[15] as well as being an option to compress a compiled Linux kernel.[16] In March 2013, kernel.org announced the use of xz as the default compressed file format for distributing kernel archive files.[17]

    I always use tar.gz. When and why should I use tar.xz? What's the use case?

    After the first comment, I found out that a similar question had already been posted. I often compress mongodump/mongoexport (BSON/JSON) and mysqldump (SQL text) output. Is there an advantage to using tar.xz for those backups?

    • Admin
      Admin over 7 years
      I think you have to set up a simple test and see which compression program is "best" for you. Nobody else can tell you.