Zip / 7zip Compression Differences

  • Both .zip and .7z are lossless compression formats. .7z is newer and is likely to give you a better compression ratio, but it's not as widely supported as .zip, and I think it's somewhat more computationally expensive to compress/decompress.

  • How much better depends on the types of files you are compressing, but according to the Wikipedia article on 7-Zip:

In 2011, TopTenReviews found that the 7z compression was at least 17% better than ZIP,[15] and 7-Zip's own site has since 2002 reported that while compression ratio results are very dependent upon the data used for the tests, "Usually, 7-Zip compresses to 7z format 30–70% better than to zip format, and 7-Zip compresses to zip format 2–10% better than most other zip-compatible programs."[16]

Author: Colen

Win32 software developer.

Updated on July 26, 2022

Comments

  • Colen, almost 2 years

    I have around 130 zip files that I need to distribute to users. Each zip file contains a number of similar text, HTML, XML, and JPG files. Together the zip files total 146 MB; unzipped, their contents total 551 MB.

    I want to distribute all these files together to users in as small a format as possible. I looked into two different ways of doing it, and tried each with two compression formats, zip and 7zip (which I understand uses LZMA or a variant thereof):

    1. Compress all the zip files into a compressed file and send that file (single.zip/7z)
    2. Compress the unzipped contents of the zip files into a compressed file and send that file (combined.zip/7z)

    For example, say that I have 3 zip files, A.zip, B.zip and C.zip, each of which contains one text file, one html file, and one XML file. With method 1, a single compressed file would be created containing A.zip, B.zip and C.zip. With method 2, a single compressed file would be created containing A.txt, A.html, A.xml, B.txt, B.html, B.xml, C.txt, C.html, and C.xml.
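
As a rough sketch of the two layouts just described: the file contents below are made-up placeholders, Python's zipfile module stands in for the 7zip tool, and the helper names `make_zip` and `list_names` are hypothetical.

```python
import io
import zipfile

# Hypothetical stand-ins for the real files; the contents are made up.
SOURCES = {
    name: {
        f"{name}.txt": b"plain text body\n" * 50,
        f"{name}.html": b"<html><body>page</body></html>\n" * 50,
        f"{name}.xml": b"<root><item/></root>\n" * 50,
    }
    for name in ("A", "B", "C")
}


def make_zip(members):
    """Return the bytes of a DEFLATE-compressed zip holding `members`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member_name, data in members.items():
            zf.writestr(member_name, data)
    return buf.getvalue()


def list_names(zip_bytes):
    """Return the member names stored in a zip given as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.namelist()


# Method 1: zip each group first, then zip the resulting zip files.
inner_zips = {f"{name}.zip": make_zip(files) for name, files in SOURCES.items()}
single = make_zip(inner_zips)

# Method 2: zip all of the unzipped contents directly.
flat = {fname: data for files in SOURCES.values() for fname, data in files.items()}
combined = make_zip(flat)

print(list_names(single))    # ['A.zip', 'B.zip', 'C.zip']
print(list_names(combined))  # the nine individual text, html, and xml files
```

The sizes this toy data produces say nothing about the real files; the sketch only pins down what each method puts inside the outer archive.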

    My assumption was that under either compression scheme, the file generated by method 2 would be smaller or at least the same size as the file generated by method 1, as you might be able to exploit efficiencies by considering all the files together. At the very least, method 2 would avoid the overhead of multiple zip files.

    The surprising results (the sizes of files generated by the 7zip tool) were as follows:

    1. single.zip - 142 MB
    2. single.7z - 124 MB
    3. combined.zip - 149 MB
    4. combined.7z - 38 MB
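
To put these sizes in proportion, the short calculation below expresses each result as a percentage of the 551 MB of unzipped content. It uses only the figures quoted in this post; the helper name `percent_of_original` is made up for the illustration.

```python
UNCOMPRESSED_MB = 551  # total size of the unzipped contents, from this post

# Output sizes reported above, in megabytes.
results_mb = {
    "single.zip": 142,
    "single.7z": 124,
    "combined.zip": 149,
    "combined.7z": 38,
}


def percent_of_original(size_mb, original_mb=UNCOMPRESSED_MB):
    """Size of an archive as a percentage of the uncompressed data."""
    return 100.0 * size_mb / original_mb


for name, size in results_mb.items():
    print(f"{name:13s} {size:4d} MB  {percent_of_original(size):5.1f}% of unzipped size")
```

On these numbers, three of the four results land between roughly 22% and 27% of the unzipped size, while combined.7z is under 7%.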

    I'm not surprised that the 7zip format produced smaller files than the zip format (results 2 and 4 vs. results 1 and 3), as it generally compresses better than zip. What was surprising was that, for the zip format, compressing all 130 zip files together resulted in a smaller output file than compressing all their uncompressed contents (result 1 vs. result 3).

    Why is it more efficient to zip several zip files together than to zip their unzipped contents together?

    The only thing I can think of is that during compression, the 7zip format builds a dictionary across all the file contents, so it can exploit similarities between files, while the zip format builds its dictionary per file. Is that true? Even so, that still doesn't explain why result 3 was 7 MB larger than result 1.
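
One way to probe the per-file versus cross-file idea is a small experiment. The sketch below does not build real zip or 7z archives. It assumes that compressing each file separately with DEFLATE (via Python's zlib) is a fair stand-in for how zip treats its members, and that compressing the concatenation of all files as one LZMA stream (via Python's lzma) is a fair stand-in for a 7z archive that shares its dictionary across files. The data is contrived: every "file" repeats the same 100 KB block, which is larger than DEFLATE's 32 KB window.

```python
import lzma
import random
import zlib

random.seed(0)  # make the made-up data reproducible

# Hypothetical corpus: 20 "files" that share one large incompressible block
# plus a small part that differs per file.
shared = bytes(random.getrandbits(8) for _ in range(100_000))
files = [shared + f"unique part of file {i}\n".encode() * 100 for i in range(20)]

# Zip-like model: every file is compressed independently with DEFLATE.
per_file_deflate = sum(len(zlib.compress(f, 9)) for f in files)

# Control: LZMA applied to each file independently, so nothing is shared.
per_file_lzma = sum(len(lzma.compress(f)) for f in files)

# Cross-file model: all files concatenated and compressed as one LZMA stream.
solid_lzma = len(lzma.compress(b"".join(files)))

print("per-file DEFLATE:", per_file_deflate)
print("per-file LZMA:   ", per_file_lzma)
print("solid LZMA:      ", solid_lzma)
```

With this data, the single-stream LZMA output should be far smaller than either per-file total, because only the single stream can refer back to the shared block from an earlier file. That models the hypothesis rather than confirming how the 7zip tool behaved on the real files, and it says nothing about the 7 MB gap between results 1 and 3.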

    Thanks for your help.