Which is more efficient - tar or zip compression? What is the difference between tar and zip?

124,547

Solution 1

tar only bundles multiple files into a single file; it doesn't compress anything unless it is combined with a compression program such as gzip or bzip2 (which you can call from within tar by using the -z or -j options, respectively). zip combines both archiving and compression in one program.
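To make the distinction concrete, here is a minimal sketch using Python's standard tarfile and zipfile modules rather than the command-line tools. The file names and contents are made up for illustration; mode "w" corresponds to plain tar and "w:gz" roughly to tar with -z.

```python
import io
import tarfile
import zipfile

# Hypothetical sample files (name -> contents), used only for illustration.
FILES = {"a.txt": b"hello " * 100, "b.txt": b"world " * 100}


def make_tar(files, mode):
    """Build a tar archive in memory; "w" is plain tar, "w:gz" adds gzip."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def make_zip(files):
    """Build a zip archive in memory; each member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


plain_tar = make_tar(FILES, "w")    # archiving only, no compression
tar_gz = make_tar(FILES, "w:gz")    # archiving, then gzip over the whole stream
zipped = make_zip(FILES)            # archiving and compression in one step

print(len(plain_tar), len(tar_gz), len(zipped))
```

The plain tar comes out larger than the input because of headers and block padding; only the gzip and zip variants actually shrink the data.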

Solution 2

tar

  • Assumes you'll be reading from one end to the other - "Tape ARchive". (The age of the command shows...)
  • Does not do compression, but you can compress the entire resulting stream by piping it through e.g. gzip or bzip2 (done internally with -z or -j)
  • Stores unix file attributes: uid, gid, permissions (most notably executable). The default may depend on your distribution, and can be toggled with options.
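As a rough illustration of the attribute storage described above, the following sketch round-trips a mode, uid and gid through a tar archive with Python's tarfile module. The member name run.sh and the numeric ids are hypothetical values chosen for the example; nothing is written to disk.

```python
import io
import tarfile


def roundtrip_attrs(mode, uid, gid):
    """Write one member with the given attributes, then read them back."""
    data = b"#!/bin/sh\necho hi\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("run.sh")  # hypothetical member name
        info.size = len(data)
        info.mode = mode                  # permission bits, e.g. 0o755
        info.uid = uid
        info.gid = gid
        tar.addfile(info, io.BytesIO(data))

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        member = tar.getmember("run.sh")
        return member.mode, member.uid, member.gid


attrs = roundtrip_attrs(0o755, 1000, 1000)
print(oct(attrs[0]), attrs[1], attrs[2])
```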

zip

  • Stores MSDOS attributes. (Archive, Readonly, Hidden, System)
  • Compresses each file, then adds them to an archive
  • Includes a file table at the end of the file
  • As a result of the previous two points, allows reading only the parts of the archive that belong to the file you need.

The fact that zip compresses the files separately will impact compression ratios, particularly on many small similar files.
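The effect on many small similar files can be checked with a small experiment. This is a sketch under stated assumptions: 200 synthetic files that share the same 400-byte body (generated with a fixed seed) and differ only in a short suffix, archived in memory with Python's tarfile and zipfile modules instead of the command-line tools.

```python
import io
import random
import tarfile
import zipfile

# Synthetic test data (an assumption for this sketch): 200 small files that
# share one pseudo-random body and differ only in a trailing index.
rng = random.Random(0)
shared = "".join(rng.choices("0123456789abcdef", k=400)).encode()
files = {f"file{i:03d}.txt": shared + f" #{i}\n".encode() for i in range(200)}


def tar_gz_size(files):
    """Size of a tar archive gzip-compressed as one stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


def zip_size(files):
    """Size of a zip archive where every member is deflated separately."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


tgz = tar_gz_size(files)
zp = zip_size(files)
print("tar.gz:", tgz, "bytes; zip:", zp, "bytes")
```

Because gzip sees the whole tar stream, it can encode each repeated body as a back-reference to the previous file, while zip has to compress every body from scratch and add its own per-file headers.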

(At least this was exactly correct a decade ago.)

Solution 3

Tar preserves much more metadata than Zip, see my comparison (it's slightly outdated):

[Image: tar vs. zip metadata comparison]

Tar passes 65% of the tests, whereas Zip passes only 17%. I have made the test suite available on GitHub under the BSD license, so you can try it yourself if you have a Mac. For Linux, I'm not sure whether there is any comparable metadata, so these tests may not be relevant.

Solution 4

Efficiency can be measured in different ways:

  1. How long does the process take?
  2. How large are the resulting files?

There are other questions, too, like "How common are the tools to manipulate the resulting archives?"

So, for example, bzip2 creates smaller files than gzip, but it can take significantly longer. Also, in my experience gzip is universal on Unix-like systems, but bzip2 is still not (though it's very common and usually easy to get).
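Both measures are easy to take yourself. The sketch below uses Python's gzip and bz2 modules (not the command-line tools) on a made-up, highly repetitive sample; the numbers depend heavily on the input and the machine, so treat it as a way to measure your own data rather than as a benchmark.

```python
import bz2
import gzip
import time


def measure(data):
    """Return compressed size and wall-clock time for gzip and bzip2."""
    results = {}
    for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = {"bytes": len(packed), "seconds": elapsed}
    return results


# Hypothetical sample input; substitute the contents of a real file.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
results = measure(sample)

for name, stats in results.items():
    print(f"{name}: {stats['bytes']} bytes in {stats['seconds']:.4f} s")
```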

Solution 5

As Wim noted, tar itself doesn't compress. If you do compress the tar (e.g. to get a .tar.gz or .tar.bz2), you're compressing the whole tar file at once. In contrast, zip compresses each file individually.

The efficiency depends on the workload. Specifically, zip allows you to access individual files directly. With tar, you first have to read through the unwanted (compressed) files that precede the one you want. Compression performance depends on what you're compressing: tar with bzip2 is often better for a large number of similar files (e.g. a source directory), while zip could be better if each file has very different content.
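The access pattern can be sketched with Python's zipfile and tarfile modules. The file names and contents are hypothetical. The tar side uses the streaming mode "r|gz", which forbids seeking, so the helper also counts how many members it had to pass before reaching the requested one.

```python
import io
import tarfile
import zipfile

# Hypothetical archive contents, in a fixed order.
files = {f"doc{i}.txt": f"contents of document {i}\n".encode() * 50
         for i in range(5)}

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)
zip_blob = zip_buf.getvalue()

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w:gz") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
tar_blob = tar_buf.getvalue()


def read_from_zip(blob, name):
    """Look the member up in zip's file table and decompress only that one."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name)


def read_from_tar_gz(blob, name):
    """Stream through the archive in order until the member is reached."""
    skipped = 0
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r|gz") as tar:
        for member in tar:
            if member.name == name:
                return tar.extractfile(member).read(), skipped
            skipped += 1
    raise KeyError(name)


wanted = "doc4.txt"
from_zip = read_from_zip(zip_blob, wanted)
from_tar, skipped = read_from_tar_gz(tar_blob, wanted)
print(f"tar.gz had to pass {skipped} members before {wanted}")
```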


Author by

rekha_sri


Updated on September 17, 2022

Comments

  • rekha_sri
    rekha_sri over 1 year

    I'm working in Linux environment and want to know about tar and zip commands.

    Which is more efficient - tar or zip? I also need to know the differences between the tar and zip commands. Can anyone explain them to me?

  • akira
    akira almost 14 years
    ... on the other hand, you have to get the whole zip file before you can access the content, because the toc is placed at the end. in contrast, you can untar a tar as fast as the bytes arrive...
  • David Spillett
    David Spillett almost 14 years
    7zip (7-zip.org) is another good option for getting excellent compression at the expense of CPU time. Less common than bzip2 (not installed by default anywhere that I know of) but easy to install in most places (it is in the standard repositories for most Linux distributions and there is a simple installer package for Windows). Like tar+gzip, it carries the compression window across input files, so it gets even greater savings over zip when including many small files.
  • neoneye
    neoneye almost 14 years
    Efficiency can also be measured by how well it preserves the data, see my answer to this question. Tar is much better than zip at preserving the data.
  • Rich Homolka
    Rich Homolka almost 14 years
    One more measurement could be compatibility outside of UNIX. Windows is fine with zip (built in to Windows), can usually easily process tar.gz with shareware, but bzip2 is rare to find. Unfortunately the original question didn't mention these criteria, so I can't see if they're relevant.
  • Wim
    Wim over 13 years
    I once did a thorough review of compression ratio versus time required for some common compressors, and which would be the most efficient depending on how you value space versus time: blog.grandtrunk.net/2004/07/practical-compressor-test
  • CppLearner
    CppLearner over 11 years
    Interesting! +1 for this. But then again, that was a huge program. Did you write this for other purpose? Just curious.
  • neoneye
    neoneye about 11 years
    I wrote the tests for a file manager that I was working on some years ago. Never released it though.
  • Taylor Ramirez
    Taylor Ramirez over 7 years
    Linux has metadata as well, so should work for it.
  • Alaa
    Alaa almost 3 years
    ZIP stores MS-DOS attributes and Unix attributes: The Unix zip and unzip utilities always store and restore Unix file permissions, unzip restores the Unix file timestamps unless you provide the -DD option, and unzip even restores the UID and GID if you provide the -X option.
  • Alaa
    Alaa almost 3 years
    The Unix zip and unzip utilities always store and restore contents and file permissions, by default store and restore file timestamps, and can store and restore file ownership (UID and GID). This is often all you need.
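The zip format's ability to carry Unix permissions can be seen with Python's zipfile module, which is a different implementation from the Info-ZIP zip and unzip utilities mentioned in the comments. In this sketch the mode is placed in the upper 16 bits of the entry's external_attr field, which is where Unix-created zip entries keep it; the member name run.sh is hypothetical.

```python
import io
import stat
import zipfile


def zip_mode_roundtrip(mode):
    """Store a Unix mode in a zip entry and read it back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("run.sh")   # hypothetical member name
        info.create_system = 3             # 3 marks the entry as Unix-made
        # Upper 16 bits of external_attr hold the Unix file type and mode.
        info.external_attr = (stat.S_IFREG | mode) << 16
        zf.writestr(info, b"#!/bin/sh\necho hi\n")

    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        stored = zf.getinfo("run.sh").external_attr >> 16
        return stat.S_IMODE(stored)


restored = zip_mode_roundtrip(0o755)
print(oct(restored))
```

Note that this only shows the attribute is stored in the archive; whether it is applied on extraction depends on the tool doing the extracting.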