Which is more efficient - tar or zip compression? What is the difference between tar and zip?
Solution 1
tar only makes a single file out of multiple files; it doesn't do compression unless combined with a compression program such as gzip or bzip2 (which you can call from within tar by using the -z or -j options, respectively). zip combines both the archiving and compression in one program.
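A minimal sketch of that difference, using Python's standard tarfile and zipfile modules as stand-ins for the command-line tools (the sample file names and contents are invented for illustration). Opening a tar with mode "w" only archives; "w:gz" and "w:bz2" compress the whole archive stream, roughly what tar -z and tar -j do. zipfile archives and compresses in one step.

```python
import io
import tarfile
import zipfile

# Hypothetical sample input: a few in-memory "files" (name -> bytes).
FILES = {f"file{i}.txt": b"hello tar and zip\n" * 200 for i in range(5)}

def make_tar(files, mode="w"):
    """Build a tar archive in memory.
    mode "w" = archive only; "w:gz" / "w:bz2" = compress the whole stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def make_zip(files):
    """Build a zip archive in memory: archiving and compression in one step."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

if __name__ == "__main__":
    print("input bytes:", sum(len(d) for d in FILES.values()))
    print("plain tar:  ", len(make_tar(FILES)))
    print("tar + gzip: ", len(make_tar(FILES, "w:gz")))
    print("tar + bzip2:", len(make_tar(FILES, "w:bz2")))
    print("zip:        ", len(make_zip(FILES)))
```

On this sample the plain tar comes out larger than the input, because tar adds a 512-byte header per file and pads everything to fixed-size blocks; only the compressed variants shrink it.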
Solution 2
tar
- Assumes you'll be reading from one end to the other: "Tape ARchive". (The age of the command shows...)
- Does not do compression, but you can compress the entire resulting stream by piping it through e.g. gzip or bzip2 (done internally with -z or -j).
- Stores Unix file attributes: uid, gid, and permissions (most notably the executable bit). The defaults may depend on your distribution and can be toggled with options.
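A small Python sketch of the attributes tar records, using the standard tarfile module (the file name, mode, and uid/gid values are made up). Each member header carries the permission bits and numeric owner and group; whether they are re-applied on extraction is a separate matter (GNU tar has options such as -p/--preserve-permissions and --same-owner for that).

```python
import io
import tarfile

def tar_with_attrs(name, data, mode, uid, gid):
    """Archive one in-memory file with explicit Unix attributes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        info.mode = mode  # permission bits, e.g. 0o755 (executable)
        info.uid = uid    # numeric owner id
        info.gid = gid    # numeric group id
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def read_attrs(archive):
    """List (name, permissions, uid, gid) for every member."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return [(m.name, oct(m.mode), m.uid, m.gid) for m in tar.getmembers()]

if __name__ == "__main__":
    blob = tar_with_attrs("run.sh", b"#!/bin/sh\necho hi\n", 0o755, 1000, 1000)
    print(read_attrs(blob))  # [('run.sh', '0o755', 1000, 1000)]
```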
zip
- Stores MS-DOS attributes (Archive, Read-only, Hidden, System).
- Compresses each file, then adds them to an archive.
- Includes a file table (the central directory) at the end of the archive.
- As a result of the previous two points, it allows reading only the exact parts of the archive you need.
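To make the "file table at the end" point concrete, here is a small Python sketch with the standard zipfile module. It builds a zip in memory and locates the End of Central Directory record (signature PK\x05\x06), which for an archive without a comment or ZIP64 extensions occupies the last 22 bytes. Reading one member by name then only needs the central directory and that member's data. The helper names are hypothetical.

```python
import io
import zipfile

# Signature of the "End of Central Directory" record in the zip format.
EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_MIN_SIZE = 22  # fixed-size part of the record, no archive comment

def build_zip(files):
    """Write a dict of name -> bytes into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def eocd_offset(archive):
    """Offset of the last EOCD record in the archive (-1 if missing)."""
    return archive.rfind(EOCD_SIGNATURE)

def read_one(archive, name):
    """Read a single member: zipfile looks it up in the central directory
    and seeks straight to its data instead of scanning earlier members."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)

if __name__ == "__main__":
    blob = build_zip({"a.txt": b"first", "b.txt": b"second"})
    print(len(blob) - eocd_offset(blob))  # 22: the EOCD record is at the very end
    print(read_one(blob, "b.txt"))        # b'second'
```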
The fact that zip compresses the files separately will impact compression ratios, particularly on many small similar files.
(At least this was exactly correct a decade ago.)
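A rough Python experiment for the compression-ratio point, assuming a contrived workload: 200 small files that all share the same 400-byte block (random, so it barely compresses on its own) plus one unique line each. tar.gz compresses the whole stream, so gzip can refer back to the copy in the previous file; zip deflates each file on its own and also pays per-file header overhead. The file names and sizes are invented; real-world ratios will differ.

```python
import io
import random
import tarfile
import zipfile

def make_files(count=200, shared_size=400, seed=0):
    """Hypothetical workload: many small, very similar files."""
    rng = random.Random(seed)
    shared = bytes(rng.getrandbits(8) for _ in range(shared_size))
    return {f"f{i:03}.dat": shared + f"id={i}\n".encode() for i in range(count)}

def tar_gz_size(files):
    """Size of the files as one gzip-compressed tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())

def zip_size(files):
    """Size of the files as a zip, each member deflated separately."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())

if __name__ == "__main__":
    files = make_files()
    print("tar.gz:", tar_gz_size(files), "bytes")
    print("zip:   ", zip_size(files), "bytes")
```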
Solution 3
Tar preserves much more metadata than Zip; see my comparison (it's slightly outdated).
Tar passes 65% of the tests, whereas Zip only passes 17%. I have made the test suite available on GitHub under a BSD license, so you can try it for yourself if you have a Mac. For Linux I'm not sure whether the same kinds of metadata exist, so these tests may not be relevant.
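One concrete, easily reproduced example of metadata loss, sketched with Python's standard modules (this is only an illustration, not the test suite referred to above): zip's classic MS-DOS timestamp has 2-second resolution, so an odd seconds value does not survive Python's zipfile, while tar's mtime field keeps whole seconds. Tools such as Info-ZIP's zip can add extra fields carrying Unix timestamps, which this sketch does not use.

```python
import io
import tarfile
import zipfile

def zip_roundtrip_time(date_time):
    """Store one member with a (Y, M, D, h, m, s) timestamp and read it back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo("f.txt", date_time=date_time), b"data")
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        return zf.getinfo("f.txt").date_time

def tar_roundtrip_mtime(mtime):
    """Store one member with a Unix mtime (seconds) and read it back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("f.txt")
        info.size = 4
        info.mtime = mtime
        tar.addfile(info, io.BytesIO(b"data"))
    with tarfile.open(fileobj=io.BytesIO(buf.getvalue())) as tar:
        return tar.getmember("f.txt").mtime

if __name__ == "__main__":
    print(zip_roundtrip_time((2020, 1, 1, 12, 0, 1)))  # (2020, 1, 1, 12, 0, 0): odd second lost
    print(tar_roundtrip_mtime(1577880001))              # 1577880001: kept exactly
```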
Solution 4
Efficiency can be measured in different ways:
- How long does the process take?
- How large are the resulting files?
There are other questions, too, like "How common are the tools to manipulate the resulting archives?"
So, for example, bzip2 typically creates smaller files than gzip, but it can take significantly longer. Also, in my experience gzip is universal on Unix-like systems, but bzip2 is still not (though it's very common and usually easy to get).
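A quick way to check both measures (time and size) on your own data, sketched with Python's standard gzip and bz2 modules at their default settings. The sample input is hypothetical; results depend heavily on what you feed it, so substitute your real files.

```python
import bz2
import gzip
import time

# The two compressors discussed above, via Python's standard library.
COMPRESSORS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}

def measure(data):
    """Return {name: (compressed_size_in_bytes, seconds_to_compress)}."""
    results = {}
    for name, (compress, decompress) in COMPRESSORS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        if decompress(packed) != data:  # sanity check: must be lossless
            raise RuntimeError(f"{name} round trip failed")
        results[name] = (len(packed), elapsed)
    return results

if __name__ == "__main__":
    # Hypothetical sample input; substitute the files you actually care about.
    sample = b"".join(f"line {i}: some fairly repetitive log text\n".encode()
                      for i in range(50_000))
    for name, (size, secs) in measure(sample).items():
        print(f"{name:6} {size:>9} bytes  {secs:.3f} s  (input {len(sample)} bytes)")
```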
Solution 5
As Wim noted, tar itself doesn't compress. If you do compress the tar (e.g. to get a .tar.gz or .tar.bz2), you're compressing the whole tar file at once. In contrast, zip compresses each file individually.
The efficiency depends on the workload. Specifically, zip allows you to access individual files directly. With a compressed tar, you first have to read through (and decompress) all the unwanted files that come before the one you want. The compression performance depends on what you're compressing. tar with bzip2 is often better for a large number of similar files (e.g. a source directory). zip could be better if each file has very different content.
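The access-pattern difference can be sketched in Python with the standard tarfile and zipfile modules (file names and contents are invented). Finding one member of a .tar.gz means walking the stream member by member, decompressing everything in front of it; the zip lookup goes through the central directory instead.

```python
import io
import tarfile
import zipfile

def build_both(files):
    """Hypothetical helper: pack the same files as .tar.gz and as .zip."""
    tbuf, zbuf = io.BytesIO(), io.BytesIO()
    with tarfile.open(fileobj=tbuf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    with zipfile.ZipFile(zbuf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return tbuf.getvalue(), zbuf.getvalue()

def tar_find(archive, target):
    """Walk a .tar.gz as a forward-only stream until `target` turns up.
    Returns the member's bytes and how many members were passed over."""
    passed = 0
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|gz") as tar:
        for member in tar:
            if member.name == target:
                return tar.extractfile(member).read(), passed
            passed += 1  # this member's data still gets decompressed and skipped
    raise KeyError(target)

def zip_find(archive, target):
    """Look `target` up in the zip's central directory and read only it."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(target)

if __name__ == "__main__":
    files = {f"f{i:02}.txt": f"payload {i}\n".encode() for i in range(50)}
    tgz, zp = build_both(files)
    print(tar_find(tgz, "f49.txt"))  # (b'payload 49\n', 49)
    print(zip_find(zp, "f49.txt"))   # b'payload 49\n'
```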
Comments
-
rekha_sri over 1 year
I'm working in Linux environment and want to know about tar and zip commands.
Which is more efficient - tar or zip? I also need to know the differences between the tar and zip commands. Can anyone explain them to me?
-
akira almost 14 years
... on the other hand, you have to get the whole zip file before you can access the content, because the TOC is placed at the end. In contrast, you can untar a tar as fast as the bytes arrive...
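The "untar as fast as the bytes arrive" behaviour can be sketched with Python's tarfile stream mode "r|gz", which only reads forward and never seeks back. TrickleReader below is a hypothetical stand-in for a slow, non-seekable source such as a network socket; the archive contents are random filler.

```python
import io
import random
import tarfile

class TrickleReader:
    """Hypothetical non-seekable source that hands out at most `chunk`
    bytes per read() and offers no seek()."""

    def __init__(self, data, chunk=256):
        self._data = data
        self._chunk = chunk
        self.consumed = 0  # bytes handed out so far

    def read(self, size=-1):
        if size is None or size < 0:
            size = len(self._data) - self.consumed
        n = min(size, self._chunk)
        out = self._data[self.consumed:self.consumed + n]
        self.consumed += len(out)
        return out

def build_tar_gz(count=20, size=2000, seed=0):
    """Make a .tar.gz of `count` members filled with random bytes."""
    rng = random.Random(seed)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for i in range(count):
            data = bytes(rng.getrandbits(8) for _ in range(size))
            info = tarfile.TarInfo(f"part{i:02}.bin")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def stream_untar(source):
    """Yield (member name, bytes consumed from `source` so far) as each
    member header arrives."""
    with tarfile.open(fileobj=source, mode="r|gz") as tar:
        for member in tar:
            yield member.name, source.consumed

if __name__ == "__main__":
    blob = build_tar_gz()
    for name, consumed in stream_untar(TrickleReader(blob)):
        print(f"{name}: seen after {consumed} of {len(blob)} bytes")
```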
-
David Spillett almost 14 years
7zip (7-zip.org) is another good option for getting excellent compression at the expense of CPU time. It is less common than bzip2 (not installed by default anywhere that I know of) but easy to install in most places (it is in the standard repositories for most Linux distributions, and there is a simple installer package for Windows). Like tar+gzip, it carries the compression window across input files, so it gets even greater savings over zip when including many small files.
-
neoneye almost 14 years
Efficiency can also be measured by how well it preserves the data; see my answer to this question. Tar is much better than zip at preserving the data.
-
Rich Homolka almost 14 years
One more measurement could be compatibility outside of UNIX. Windows is fine with zip (it's built into Windows) and can usually process tar.gz easily with shareware, but bzip2 support is rare to find. Unfortunately the original question didn't mention these criteria, so I can't tell whether they're relevant.
-
Wim over 13 years
I once did a thorough review of compression ratio versus time required for some common compressors, and which would be the most efficient depending on how you value space versus time: blog.grandtrunk.net/2004/07/practical-compressor-test
-
CppLearner over 11 years
Interesting! +1 for this. But then again, that was a huge program. Did you write this for another purpose? Just curious.
-
neoneye about 11 years
I wrote the tests for a file manager that I was working on some years ago. Never released it, though.
-
Taylor Ramirez over 7 years
Linux has metadata as well, so it should work for it.
-
Alaa almost 3 years
ZIP stores MS-DOS attributes and Unix attributes: the Unix zip and unzip utilities always store and restore Unix file permissions, unzip restores the Unix file timestamps unless you provide the -DD option, and unzip even restores the UID and GID if you provide the -X option.
-
Alaa almost 3 years
The Unix zip and unzip utilities always store and restore contents and file permissions, by default store and restore file timestamps, and can store and restore file ownership (UID and GID). This is often all you need.
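Where those Unix permissions live inside a zip can be sketched with Python's standard zipfile module: the st_mode bits go in the high 16 bits of each entry's external attributes, with the "made by" host set to Unix (3). The file name and mode are made up. This sketch does not cover the extra fields Info-ZIP uses for UID/GID, and as far as I know Python's own extractall does not re-apply these permission bits.

```python
import io
import stat
import zipfile

def zip_with_mode(name, data, mode):
    """Store `data` under `name` with Unix permission bits `mode` (e.g. 0o755)."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # host 3 = Unix, so readers treat the high bits as st_mode
    info.external_attr = (stat.S_IFREG | mode) << 16  # Unix st_mode in the high 16 bits
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, data)
    return buf.getvalue()

def read_mode(archive, name):
    """Recover the permission bits stored for `name`."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return stat.S_IMODE(zf.getinfo(name).external_attr >> 16)

if __name__ == "__main__":
    blob = zip_with_mode("run.sh", b"#!/bin/sh\necho hi\n", 0o755)
    print(oct(read_mode(blob, "run.sh")))  # 0o755
```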