How to quickly check if a zip file is corrupted?

Solution 1

Section 4.3.7 of this page says that the compressed size is a 4-byte field starting at byte 18. You could try reading that and comparing it to the size of the file.
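
As a rough illustration, the field can be read with Python's standard `struct` module. This is only a sketch: the helper name `first_entry_compressed_size` is made up for this example, and it assumes the archive begins with a local file header at offset 0 and that the field is stored little-endian, as the ZIP format specifies.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"

def first_entry_compressed_size(data: bytes) -> int:
    """Return the compressed-size field of the first local file header."""
    if data[:4] != LOCAL_HEADER_SIGNATURE:
        raise ValueError("no local file header at offset 0")
    # 4-byte little-endian unsigned integer starting at byte 18
    (size,) = struct.unpack_from("<I", data, 18)
    return size

# Build a small uncompressed (stored) archive in memory to try it on.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", b"hello world")
archive = buf.getvalue()

print(first_entry_compressed_size(archive))  # size of the one entry's data
print(len(archive))                          # size of the whole archive
```

The two printed numbers differ, because the field describes a single entry while the archive also contains headers and the central directory.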

However, I think it's pretty much useless for checking if the zip file is corrupted for two reasons:

  1. Some zip files contain more bytes than just the zip part. For example, self-extracting archives have an executable part, yet they're still valid zip files.
  2. The file can be corrupted without changing its size.

So, I suggest checking the CRC as a much more reliable method of detecting corruption.
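
A CRC check along these lines can be sketched with Python's standard `zipfile` module, whose `ZipFile.testzip()` reads every member and compares it against the CRC stored in the archive. The helper name `archive_is_intact` and the sample data are made up for this example; the damaged copy has one byte flipped, so its size is unchanged.

```python
import io
import zipfile
import zlib

def archive_is_intact(source) -> bool:
    """Return True if every member reads back with a matching CRC-32."""
    try:
        with zipfile.ZipFile(source) as zf:
            # testzip() returns the first bad member name, or None if all pass
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, OSError):
        # The archive structure or the compressed stream itself is unreadable
        return False

# Build a small stored archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", b"hello world")
good = buf.getvalue()

# Flip one byte of the file data; the archive size stays the same.
damaged = bytearray(good)
damaged[good.index(b"hello world")] ^= 0xFF
damaged = bytes(damaged)

print(len(good) == len(damaged))               # True
print(archive_is_intact(io.BytesIO(good)))     # True
print(archive_is_intact(io.BytesIO(damaged)))  # False
```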

Solution 2

Use zip -T to test whether the file is corrupted. For a corrupted file, the output looks like this:

 zip -T filename.zip
        zip warning: missing end signature--probably not a zip file (did you
        zip warning: remember to use binary mode when you transferred it?)
        zip warning: (if you are trying to read a damaged archive try -F)

zip error: Zip file structure invalid (filename.zip)
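
Where the zip command is not available, a similar structural check can be sketched in Python. This is not what zip -T does internally; it assumes that the "missing end signature" warning refers to the end-of-central-directory record, which is what the standard library's `zipfile.is_zipfile()` looks for. The helper name `has_end_signature` is made up for this example.

```python
import io
import zipfile

def has_end_signature(data: bytes) -> bool:
    """Return True if an end-of-central-directory record can be located."""
    return zipfile.is_zipfile(io.BytesIO(data))

# Build a complete archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", b"hello world")
complete = buf.getvalue()

# Simulate an interrupted transfer: the tail of the file, which holds
# the end-of-central-directory record, is missing.
truncated = complete[: len(complete) // 2]

print(has_end_signature(complete))   # True
print(has_end_signature(truncated))  # False
```

This check is fast because it only inspects the end of the file, but it says nothing about whether the compressed data of each entry is intact.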

Solution 3

DotNetZip, a free open source library for handling zip files in .NET languages, supports a CheckZip() method that does what you want. There are various levels of assurance available at your option. The basic level just checks consistency of metadata. The most complete level does a full extraction of the zip file into a bitbucket to verify that the actual compressed data is not corrupted.

Solution 4

To check the whole archive 'for sure' you need to extract all the data, since the CRC stored in the archive is calculated over the uncompressed data. Even after that you cannot be 100% sure that it is not corrupted: a matching CRC makes corruption very unlikely, but it does not guarantee that the data was not altered.
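
The point about uncompressed data can be seen in a small Python sketch using the standard `zipfile` and `zlib` modules. The file name and payload are made up for this example. The stored CRC only matches a CRC-32 computed over the decompressed bytes, which is why a full check has to decompress every entry.

```python
import io
import zipfile
import zlib

payload = b"abc" * 1000

# Build a deflate-compressed archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.bin", payload)

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("data.bin")
    # read() decompresses the entry; zipfile also verifies the CRC here
    extracted = zf.read("data.bin")

print(info.compress_size < info.file_size)  # compressed data is smaller
print(info.CRC == zlib.crc32(extracted))    # CRC covers the uncompressed bytes
```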

Author: thuantta

Updated on July 09, 2022

Comments

  • thuantta
    thuantta almost 2 years

    Does anyone have any ideas for how to pragmatically and quickly check if a zip file is corrupted based on file size? Ideally, the best way to check if a zip is corrupted is to do a CRC check, but this can take a long time, especially if there are a lot of large zip files. I would be happy just to be able to do a quick file size or header check.

    Thanks in advance.

  • SimonJ
    SimonJ over 13 years
    Also, many zip creation tools will write the header before they know the length of the file, so these bytes remain zero (to support streaming, presumably).
  • Cheeso
    Cheeso over 13 years
    What @SimonJ said is true, but also - the compressed size starting from byte 18 is the compressed size of a single entry in the zip file. It is not the compressed size of the zip file.
  • Cheeso
    Cheeso over 13 years
    Also, this may be obvious, but worth stating: "calculating the CRC" works to verify the file, only if the original CRC is known.
  • HackSlash
    HackSlash about 4 years
    CodePlex is dead and those pages are now "Archive".
  • HackSlash
    HackSlash about 4 years
    This might be the same code? github.com/DinoChiesa/DotNetZip
  • geotheory
    geotheory over 3 years
    Very handy. Can also be used to distinguish between e.g. doc and docx files where the file extension isn't reliable.