How to XZ a directory with TAR using maximum compression?


Solution 1

Assuming xz honors the standard set of command-line flags, including the compression-level flags, you could try:

tar -cf - foo/ | xz -9 -c - > foo.tar.xz 
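To check what the pipeline above produced, something like the following could be used. This is only a sketch: the temporary directory and the `numbers.txt` sample file are made up for illustration, and it assumes tar and XZ Utils are installed.

```shell
# Hypothetical sample data, created only for this demonstration.
workdir=$(mktemp -d)
mkdir "$workdir/foo"
seq 1 1000 > "$workdir/foo/numbers.txt"

# Same pipeline as above; -C makes tar store the paths as foo/...
tar -C "$workdir" -cf - foo/ | xz -9 -c - > "$workdir/foo.tar.xz"

# Test the integrity of the compressed stream without extracting it.
xz -t "$workdir/foo.tar.xz" && echo "archive OK"

# Show compressed size, uncompressed size and ratio.
xz -l "$workdir/foo.tar.xz"

# List the archive members by decompressing to stdout and piping into tar.
xz -dc "$workdir/foo.tar.xz" | tar -tf -
```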

Solution 2

With a recent GNU tar on bash or another Bourne-family shell:

XZ_OPT=-9 tar cJf tarfile.tar.xz directory

tar's lowercase j switch uses bzip2, uppercase J switch uses xz.

The XZ_OPT environment variable lets you set xz options that cannot be passed via calling applications such as tar.

With -9, xz uses its highest compression preset.

See man xz for other options you can set (-e/--extreme might give you some additional compression benefit for some datasets).

XZ_OPT=-e9 tar cJf tarfile.tar.xz directory
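As the comments point out, the `VAR=value command` form sets the variable for that one invocation only and is not available in every shell. A minimal sketch of the variants, assuming GNU tar and xz 5.2 or later (for `-T0`); the sample directory is hypothetical:

```shell
# Hypothetical sample data, created only for this demonstration.
workdir=$(mktemp -d)
mkdir "$workdir/directory"
seq 1 1000 > "$workdir/directory/numbers.txt"
cd "$workdir"

# Bourne-family shells: the assignment applies to this one command only,
# so no export is needed and the variable does not linger afterwards.
XZ_OPT=-9 tar cJf tarfile.tar.xz directory
echo "XZ_OPT afterwards: ${XZ_OPT-<unset>}"

# Shells without that syntax (csh, tcsh, fish, ...): go through env instead.
env XZ_OPT=-9 tar cJf tarfile-env.tar.xz directory

# Several options must be quoted as one value. -T0 (suggested in the
# comments) lets xz use as many threads as there are CPU cores.
XZ_OPT='-9e -T0' tar cJf tarfile-mt.tar.xz directory
```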

Solution 3

XZ_OPT=-9e tar cJf tarfile.tar.xz directory

usually compresses a little better (at the cost of a much longer compression time) than

XZ_OPT=-9 tar cJf tarfile.tar.xz directory
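Since the gain from -e depends on the data, it may be worth measuring on your own files. A rough sketch, assuming GNU tar and a Bourne-family shell; the sample directory is a made-up stand-in for real data:

```shell
# Hypothetical sample data; replace "directory" with your own.
workdir=$(mktemp -d)
mkdir "$workdir/directory"
seq 1 200000 > "$workdir/directory/numbers.txt"
cd "$workdir"

# Compress the same directory once per preset and print each archive size.
for opt in -9 -9e; do
    XZ_OPT=$opt tar cJf "tarfile$opt.tar.xz" directory
    printf '%s\t%s bytes\n' "$opt" "$(wc -c < "tarfile$opt.tar.xz")"
done
```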

Solution 4

If you have 16 GiB of RAM (and nothing else running), you can try:

tar -cf - foo/ | xz --lzma2=dict=1536Mi,nice=273 -c - > foo.tar.xz 

This will need 1.5 GiB for decompression, and about 11x that for compression. Adjust accordingly for lesser amounts of memory.

This will only help if the data is actually that big, and in any case it won't help THAT much, but still...
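The memory arithmetic above can be sketched in shell. The factor of 11 is the rough rule of thumb quoted in this answer, not an exact figure, and the 4 GiB budget is a hypothetical example:

```shell
# Rule of thumb from the answer: compression needs about 11x the dictionary size.
factor=11

# The dictionary size used above, in MiB.
dict_mib=1536
compress_mib=$((dict_mib * factor))
echo "dict=${dict_mib} MiB: ~${compress_mib} MiB to compress, ~${dict_mib} MiB to decompress"

# Working backwards from a hypothetical 4 GiB memory budget.
budget_mib=4096
max_dict_mib=$((budget_mib / factor))
echo "budget=${budget_mib} MiB: dictionary of at most ~${max_dict_mib} MiB"
```

The resulting figure is only an upper bound for the dict= value; leave headroom for whatever else is running.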

If you're compressing binaries, add --x86 as the first xz option. If you're playing with "multimedia" files (uncompressed audio or bitmaps), you can try --delta=dist=2 (experiment with the value; good values to try are 1 to 4).
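Put together, the filter options might look like this. It is a sketch only: the sample file is a text stand-in (not a real executable or audio file), and `preset=9e` is used here in place of the large dictionary so that it runs on modest hardware. In both cases the extra filter comes first and LZMA2 stays last in the chain.

```shell
# Hypothetical sample data, created only for this demonstration.
workdir=$(mktemp -d)
mkdir "$workdir/foo"
seq 1 1000 > "$workdir/foo/numbers.txt"

# Executables: x86 BCJ filter first, then LZMA2.
tar -C "$workdir" -cf - foo/ | xz --x86 --lzma2=preset=9e -c - > "$workdir/foo-x86.tar.xz"

# Uncompressed audio or bitmaps: delta filter first, then LZMA2.
tar -C "$workdir" -cf - foo/ | xz --delta=dist=2 --lzma2=preset=9e -c - > "$workdir/foo-delta.tar.xz"

# The filters are recorded in the file, so decompression needs no extra options.
xz -dc "$workdir/foo-delta.tar.xz" | tar -tf -
```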

If you're feeling very adventurous, you can try playing with more LZMA options, like

--lzma2=dict=1536Mi,nice=273,lc=3,lp=0,pb=2

(These are the default settings. You can try values between 0 and 4 for lc, lp, and pb; lc+lp must not exceed 4.)

In order to see how the default presets map to these values, you can check the source file src/liblzma/lzma/lzma_encoder_presets.c. Nothing of much interest there though (-e sets the nice length to 273 and also adjusts the depth).

Solution 5

From tar --help: -I, --use-compress-program=PROG. tar filters the archive through PROG, which must accept -d for decompression.

tar -I 'xz -9' -cvf foo.tar.xz foo/  
tar -I 'gzip -9' -cvf foo.tar.gz foo/    

You can also compress with other external compressors:

tar -I 'lz4 -9' -cvf foo.tar.lz4 foo/
tar -I 'zstd -19' -cvf foo.tar.zst foo/

Decompress with external compressors:

tar -I lz4 -xvf foo.tar.lz4  
tar -I zstd -xvf foo.tar.zst  

List archive contents with external compressors:

tar -I lz4 -tvf foo.tar.lz4
tar -I zstd -tvf foo.tar.zst
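A full round trip with -I and xz could look like the following sketch. It assumes a GNU tar recent enough to accept arguments inside the -I string; the sample directory and the `out` target are hypothetical.

```shell
# Hypothetical sample data, created only for this demonstration.
workdir=$(mktemp -d)
mkdir "$workdir/foo" "$workdir/out"
seq 1 1000 > "$workdir/foo/numbers.txt"
cd "$workdir"

# Create: tar pipes the archive through "xz -9e".
tar -I 'xz -9e' -cf foo.tar.xz foo/

# List and extract: tar runs the program with -d added to decompress.
tar -I xz -tf foo.tar.xz
tar -I xz -xf foo.tar.xz -C out

# The extracted tree should match the original.
diff -r foo out/foo && echo "round trip OK"
```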

Author: LanceBaynes

Updated on September 18, 2022

Comments

  • LanceBaynes
    LanceBaynes over 1 year

    So I need to compress a directory with max compression.

    How can I do it with xz? I mean, I will need tar too, because I can't compress a directory with xz alone. Is there a one-liner to produce e.g. foo.tar.xz?

    • Admin
      Admin over 9 years
      FWIW, man 1 xz says it's not a good idea to blindly use -9 for everything like it often is with gzip(1) and bzip2(1). -7 ... -9 [...] These are useful only when compressing files bigger than 8 MiB, 16 MiB, and 32 MiB, respectively. RTFM for more info.
  • LanceBaynes
    LanceBaynes over 12 years
    and this uses maximum compression level with XZ?
  • bsd
    bsd over 12 years
    adding -9 to xz will make it max
  • bsd
    bsd over 12 years
    It does now, see edited answer and XZ_OPT env var ;)
  • Admin
    Admin about 11 years
    Just a note: you have to export XZ_OPT.
  • bsd
    bsd about 11 years
    No, you don't. That's the whole point. You can set the environment var for just that invocation. You can export it if you want to, but you don't have to.
  • anddam
    anddam about 11 years
    You're assuming bash-like shell for that.
  • Anthon
    Anthon over 10 years
    The J was already mentioned in bdowning's answer
  • psusi
    psusi over 9 years
    This really doesn't answer the question. This is just an observation that for your particular small data set, -4e already gets the best compression and so the higher levels don't get any more benefit ( and even an ever so slight penalty ).
  • terdon
    terdon over 9 years
    Are you the same user as Szymon Roziewski? If so, please don't post multiple answers. Instead, edit your original answer. If you can't access your first account, please see here for how to merge your accounts. In the meantime, I am deleting your previous answer and including it here.
  • Szymon Roziewski
    Szymon Roziewski over 9 years
    Ok, I have done a more comprehensive study on that. What I got is here. I chose some files from my hard drive and compressed them with options -4e and -9e. So, it's better to find your best solution by yourself. You were right, for some cases -9e is better whereas for others it's not:

    no difference = 660
    4e better than 9e = 74
    9e better than 4e = 17
    total files = 751

    tar 2, html 2, csv 2, xml 2, gz 2, ppt 2, eps 2, docx 2, gif 2, rpm 3, png 3, asv 3, xlsx 3, exe 3, rar 4, nc 4, txt 5, odt 6, xls 7, zip 7, doc 9, m 12, dat 17, other 109, pdf 133 135, jpg 270
  • Szymon Roziewski
    Szymon Roziewski over 9 years
    (comments may be edited only for 5 minutes) txt 109 txt/pdf 135
  • Stéphane Chazelas
    Stéphane Chazelas over 9 years
    @anddam, that's supported by all shells of the Bourne family (Bourne, ksh, mksh, pdksh, ash, dash, bash, yash, zsh) and rc and akanga. fish, csh, tcsh and es being the major shells that don't support it. There, you'd use the env command.
  • anddam
    anddam over 9 years
    Actually on fish I'd use the 'set' command. The point is that if you're using a syntax specific to one shell, you should warn the reader about that.
  • cychoi
    cychoi over 9 years
    +1. This does help the OP find a way to determine maximum compression for taring files using xz.
  • Amedee Van Gasse
    Amedee Van Gasse almost 9 years
    The question was about xz, not about 7z, even though they both use LZMA compression.
  • cxdf
    cxdf almost 9 years
    How is this better? What does the e flag do?
  • Evandro Jr
    Evandro Jr about 8 years
    option -e, --extreme Modify the compression preset (-0 ... -9) so that a little bit better compression ratio can be achieved without increasing memory usage of the compressor or decompressor (exception: compressor memory usage may increase a little with presets -0 ... -2). The downside is that the compression time will increase dramatically (it can easily double).
  • Krzysztof Krasoń
    Krzysztof Krasoń almost 8 years
    -9e is the best level, but it will take very long
  • Rahly
    Rahly about 7 years
    It's worth noting that the use of XZ_OPT or XZ_DEFAULTS depends on the version of xz, not tar. See man xz.
  • nyxee
    nyxee over 6 years
    So, if I'm compressing about 80 GB of software on my machine (and I want all the computer's resources to go to the compression process for speed), I should use -9, not -9e, yeah?
  • dhag
    dhag over 6 years
    This seems like a working answer, but, as it is, it would be greatly improved by having its formatting fixed and an explanation of option -I added.
  • twistylittlepassages
    twistylittlepassages over 6 years
    Just for the record: XZ_OPT is not a feature implemented in tar. It's a feature of xz. When tar calls xz, the env-variable is simply passed on.
  • EkriirkE
    EkriirkE over 5 years
    xz by default uses 1 core/thread, you can max that out (speed it all up) by adding -T0, eg XZ_OPT="-9e -T0" tar -cJf ...
  • Dzenly
    Dzenly almost 5 years
    Bad choice of variable name, because T0 is the option that enables multi-threaded compression.
  • Jimmy
    Jimmy almost 5 years
    @Dzenly You're right! Thank you! Changed it.
  • KolonUK
    KolonUK almost 5 years
    -9e will not always give you the best result - see point 8 here rootusers.com/13-simple-xz-examples
  • KolonUK
    KolonUK almost 5 years
    Also, you might see significant improvement if you add --threads=0 to xz
  • user3439968
    user3439968 almost 5 years
    XZ_OPT=-e9T0 tar cJf tarfile.tar.xz directory. T0 - Specify the number of worker threads to use. Setting threads to a special value 0 makes xz use as many threads as there are CPU cores on the system.
  • holzkohlengrill
    holzkohlengrill over 4 years
    Is the pipe variant (tar .... | xz ...) (significantly) slower than using -J/-j/...?
  • Vlastimil Burián
    Vlastimil Burián over 4 years
    Please be aware this thread has been read 104k times to date. Be sure to add something distinctive. So far, I don't see how this post contributes to the overall thread. How is it different from writing a one-liner: xz -k -8e -M 7000MB -T 8 -v whatever.img? Something similar has already been posted here, not exactly the same, but better, with the XZ_OPT syntax pointed out. Cheers.
  • Adam Wądołkowski
    Adam Wądołkowski over 4 years
    I am sharing my experience in this matter, including the technical aspects. The example is based on the syntax of xz (XZ Utils) 5.2.2 (with man xz), as I wrote above. I think the test gives a broader picture of the use of xz and an example for further tests optimizing the compression rate vs performance vs equipment load. Regards.
  • staticfloat
    staticfloat about 4 years
    @KolonUK reading that article, it shows that -e (extreme mode) always improves compression ratio; the comparison is between -0e and -6; while -e always improves compression ratio within the same compression level, a higher compression level may be more effective than "extreme mode". There is no evidence that -9e can yield a worse compression ratio than -9.
  • cronburg
    cronburg about 3 years
    Double reminder to readers to check man xz | grep XZ_OPT before using this method.
  • midnite
    midnite about 2 years
    @user3439968 - I wonder if using T0 without a space means compression level 0. I think using XZ_OPT="-e9 -T 0" tar cJf tarfile.tar.xz directory is what you mean.