Which algorithm is used in standard ZIP?


Solution 1

Zip provides capabilities roughly equivalent to the combination of tar with gzip.

tar just collects a number of files together into a single file, preserving information about the original files (e.g., paths, dates). Contrary to the statement in the question, it does no compression by itself.

gzip just takes a single file and compresses it.

Zip does both of those -- i.e., it stores a number of constituent files in an archive (again, preserving things like paths, dates, etc.), and compresses them. Unlike tar + gzip, it compresses each file individually, and leaves the "directory" information about the constituent files uncompressed. This makes it easy to work with individual files in the archive (insert, delete, decompress, etc.), but it also means that zip usually won't achieve as good a compression ratio overall, because redundancy shared between files is not exploited.
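
To make the per-file versus whole-archive difference concrete, here is a minimal Python sketch using only the standard library. The payload is deliberately contrived (the same random bytes stored under ten hypothetical file names) so that all the redundancy lies between files rather than inside them; real archives will show a smaller gap.

```python
import io
import random
import tarfile
import zipfile


def build_archives(n_files: int = 10, size: int = 4000):
    """Return (zip_bytes, tar_gz_bytes) holding the same n_files members."""
    rng = random.Random(0)
    # Contrived payload: incompressible on its own, identical in every file.
    payload = bytes(rng.getrandbits(8) for _ in range(size))
    names = [f"file{i}.bin" for i in range(n_files)]  # hypothetical names

    # zip: every member is deflated on its own, so the copies do not help.
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.writestr(name, payload)

    # tar + gzip: one stream, so later copies can refer back to earlier ones.
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w:gz") as tf:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))

    return zip_buf.getvalue(), tar_buf.getvalue()


zip_bytes, tar_gz_bytes = build_archives()
print("zip:", len(zip_bytes), "bytes; tar.gz:", len(tar_gz_bytes), "bytes")

# The flip side: zip can pull out one member without reading the others.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    one = zf.read("file7.bin")
```

With this input the tar.gz comes out far smaller than the zip, while the zip lets you read `file7.bin` directly from the central directory.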

Rather than re-implementing zip's compression algorithm, you're almost certainly better off downloading the code (extremely portable, very liberal license) from the zlib web site. The zlib web site also has a fairly reasonable explanation of the algorithms. If you really insist on doing this yourself, you probably also want to look at RFC 1950 (the zlib format), RFC 1951 (deflate), and RFC 1952 (the gzip format).

Solution 2

"zip" in this context is a file format that permits several different compression methods. They include deflate, deflate64, bzip2, lzma, wavpack, and ppmd. In practice, however, you will almost always see deflate used exclusively in zip files, for compatibility.

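As a rough illustration that the method is recorded per entry, the sketch below writes one zip whose members each use a different method and then reads the method ids back. It assumes Python's standard `zipfile` module, which supports only a subset of the methods listed above (stored, deflate, bzip2, lzma); the member names are made up for the example.

```python
import io
import zipfile

# Methods that Python's zipfile module can write (a subset of the zip format).
METHODS = {
    "stored": zipfile.ZIP_STORED,      # method id 0, no compression
    "deflate": zipfile.ZIP_DEFLATED,   # method id 8
    "bzip2": zipfile.ZIP_BZIP2,        # method id 12
    "lzma": zipfile.ZIP_LZMA,          # method id 14
}


def write_mixed_zip(data: bytes) -> bytes:
    """Store the same data once per method, each under a hypothetical name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for label, method in METHODS.items():
            zf.writestr(f"{label}.txt", data, compress_type=method)
    return buf.getvalue()


def list_methods(zip_bytes: bytes) -> dict:
    """Map each member name to the method id stored in its header."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {info.filename: info.compress_type for info in zf.infolist()}


mixed = write_mixed_zip(b"zip is a container format\n" * 100)
for name, method_id in list_methods(mixed).items():
    print(name, method_id)
```

Older or simpler unzip tools may refuse the bzip2 and lzma entries, which is the compatibility reason for sticking to deflate.
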
deflate is also the compression method used in gzip and by zlib, as well as by the png image format.
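
A small sketch of what "the same method in different wrappers" looks like, using Python's binding to zlib. The `wbits` argument selects the container: negative for a raw deflate stream (as stored inside zip entries), 15 for the zlib format, and 31 for the gzip format. The sample text is arbitrary.

```python
import gzip
import zlib


def deflate_with(wbits: int, data: bytes) -> bytes:
    """Compress data with deflate; wbits picks the wrapper around the stream."""
    comp = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


data = b"the quick brown fox jumps over the lazy dog. " * 50

raw = deflate_with(-15, data)          # raw deflate, no header or checksum
zlib_stream = deflate_with(15, data)   # zlib wrapper (RFC 1950)
gzip_stream = deflate_with(31, data)   # gzip wrapper (RFC 1952)

print(len(raw), len(zlib_stream), len(gzip_stream))

# Each variant decodes back to the same bytes.
assert zlib.decompress(raw, -15) == data
assert zlib.decompress(zlib_stream) == data
assert gzip.decompress(gzip_stream) == data
```

The raw stream is the shortest because the zlib and gzip formats only add a header and a checksum trailer around it.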

deflate is an LZ77 compressor (combined with Huffman coding), not LZ78. In particular, it is not LZW, which is an LZ78 derivative.
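
For intuition only, here is a toy LZ77 tokenizer and expander. It is not deflate: it does a brute-force quadratic search and skips the Huffman coding stage that deflate applies to the tokens, and the function names are made up for this sketch. The window size and match-length limits are borrowed from deflate (32 KiB, 3 to 258 bytes).

```python
def lz77_tokens(data: bytes, window: int = 32 * 1024,
                min_len: int = 3, max_len: int = 258) -> list:
    """Greedy LZ77: emit int literals or (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the window; real encoders use hash chains.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_expand(tokens: list) -> bytes:
    """Rebuild the original bytes from literals and back-references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-wise copy allows overlap
        else:
            out.append(tok)
    return bytes(out)


print(lz77_tokens(b"abcabcabcabcabc"))  # [97, 98, 99, (3, 12)]
```

Note that the back-reference `(3, 12)` is longer than its distance: the copy overlaps its own output, which is how LZ77 encodes runs and periodic data compactly.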

tar is an archiver, not a compressor. It produces the .tar file format. The .tar file is usually compressed (conveniently by the tar program itself calling external programs), which adds a suffix, e.g. .tar.gz for gzip compression. tar options include -z for gzip (.gz), -j for bzip2 (.bz2), and -J for xz (.xz, which uses LZMA2).
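
The same separation of archiving and compression can be seen from Python's `tarfile` module, used here as a stand-in for the tar command line. The modes `w`, `w:gz`, `w:bz2`, and `w:xz` produce a plain tar and its gzip, bzip2, and xz compressed forms; the member name and payload are invented for the sketch.

```python
import io
import tarfile


def make_tar(mode: str, name: str, payload: bytes) -> bytes:
    """Build a one-member tar archive in memory using the given tarfile mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


payload = b"hello, archive\n" * 200

plain = make_tar("w", "hello.txt", payload)       # .tar, no compression
gz = make_tar("w:gz", "hello.txt", payload)       # .tar.gz
bz2 = make_tar("w:bz2", "hello.txt", payload)     # .tar.bz2
xz = make_tar("w:xz", "hello.txt", payload)       # .tar.xz

print("payload:", len(payload))
print("tar:", len(plain), "gz:", len(gz), "bz2:", len(bz2), "xz:", len(xz))
```

The plain tar is larger than the payload it holds, since tar only adds headers and padding; any size reduction comes from the compressor applied afterwards.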

You do not need to implement the algorithm for deflate. It has been done for you. You can use zlib in your code, which has a very liberal license.

Author by

Admin

Updated on June 14, 2022

Comments

  • Admin
    Admin almost 2 years

    I have googled, wikied and read the RFC of ZIP, but can't find any info about the exact algorithm which is used in ZIP.

    I have found info about ZIP == TAR + GZIP

    But, I'm confused by this info.

    Since GZIP uses LZW algorithm as I remember, and TAR uses LZMA, I can't imagine how it could be that ZIP == TAR + GZIP (LZMA + LZW - ???)

    Could you help me with finding the algorithm of ZIP? I want to implement it.

  • fb55
    fb55 about 12 years
    That's also what Wikipedia says.
  • Hot Licks
    Hot Licks about 12 years
    Note that zlib only implements the compression/decompression, not the archiving mechanism.
  • Jerry Coffin
    Jerry Coffin about 12 years
    @HotLicks: Right -- if you want code for the archiving part, that's at the Info-zip web site.