Checksums on zip files

Inside a zip archive, each file is stored with metadata such as the last modification time, filename, and size in bytes, and, most importantly, a CRC-32 checksum of the uncompressed contents.

So you can read the zip archive as binary data, locate each file's metadata header, and compare its checksum against the one you stored from the previous upload. Reading the metadata requires no decompression, so this is very fast.

http://en.wikipedia.org/wiki/Zip_(file_format)
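
To make the binary approach concrete, here is a minimal sketch in PHP. It assumes a plain archive (no ZIP64, single disk, no end-of-directory signature hidden inside an archive comment) and loads the whole file into memory for simplicity. The function name zip_crc_map_raw() is made up for illustration; the byte offsets follow the central directory layout described in the Wikipedia article linked above.

```php
<?php
// Sketch: read "filename => CRC-32" pairs straight from a zip's central
// directory without decompressing anything.
// Assumptions: no ZIP64, single-disk archive, 64-bit PHP (unsigned CRC values).
function zip_crc_map_raw($path)
{
    $data = file_get_contents($path);
    if ($data === false) {
        return false;
    }
    // The end-of-central-directory record starts with the signature PK\x05\x06.
    $eocd = strrpos($data, "PK\x05\x06");
    if ($eocd === false) {
        return false;
    }
    // Offset 10: total entry count, offset 12: directory size, offset 16: directory start.
    $end = unpack('vcount/Vsize/Voffset', substr($data, $eocd + 10, 10));
    $pos = $end['offset'];
    $map = array();
    for ($i = 0; $i < $end['count']; $i++) {
        // Every central directory header starts with PK\x01\x02.
        if (substr($data, $pos, 4) !== "PK\x01\x02") {
            return false;
        }
        // CRC-32 sits at offset 16; the three variable lengths start at offset 28.
        $crc = unpack('V', substr($data, $pos + 16, 4))[1];
        $len = unpack('vname/vextra/vcomment', substr($data, $pos + 28, 6));
        $name = substr($data, $pos + 46, $len['name']);
        $map[$name] = $crc;
        // Skip the fixed 46-byte header plus name, extra field and comment.
        $pos += 46 + $len['name'] + $len['extra'] + $len['comment'];
    }
    ksort($map); // make the result independent of entry order
    return $map;
}
```

Two archives whose maps are equal contain files with the same names and the same CRC-32 values, even if the archives themselves differ byte for byte.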

Edit: ZipArchive actually offers this functionality. See: http://www.php.net/manual/en/ziparchive.statindex.php
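
As a rough sketch of how that could look, the function below loops over the entries with ZipArchive::statIndex() and hashes their names, sizes and CRC-32 values. The helper name zip_content_fingerprint() and the choice to combine the values into one md5 string are my own assumptions, not something ZipArchive provides.

```php
<?php
// Sketch: fingerprint a zip by the names, sizes and CRC-32 values of its
// entries. zip_content_fingerprint() is a hypothetical helper name.
function zip_content_fingerprint($path)
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return false; // not a readable zip archive
    }
    $entries = array();
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $stat = $zip->statIndex($i); // metadata only, nothing is decompressed
        if ($stat === false) {
            $zip->close();
            return false;
        }
        $entries[$stat['name']] = $stat['size'] . ':' . $stat['crc'];
    }
    $zip->close();
    ksort($entries); // entry order inside the archive should not matter
    return md5(serialize($entries));
}
```

In the question's check_if_changed(), one option would be to call this instead of md5_file() whenever the filename ends in .zip, so that timestamps and entry order inside the archive no longer count as a change.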

Author by

Kit Barnes

Updated on June 04, 2022

Comments

  • Kit Barnes
    Kit Barnes almost 2 years

    I am currently working on a tool that uploads a group of files, then uses md5 checksums to compare them against the previous batch and tells you which files have changed.

    This works fine for regular files, but some of the uploaded files are zip archives, which almost always show up as changed even when the files inside them are the same.

    Is there a different kind of checksum I can use to check whether these archives have changed, without having to unzip each one and compare its contents file by file?

    Here is my current function:

    function check_if_changed($date, $folder, $filename)
    {
      $base = './wp-content/uploads/Base/';
      // List the dated upload folders, skipping the '.' and '..' entries.
      $folders = array_values(array_diff(scandir($base), array('.', '..')));
      sort($folders);
      $position = array_search($date, $folders);
      // No earlier batch to compare against: treat the file as changed.
      if ($position === false || $position === 0)
        { return true; }
      $prev_folder = $folders[$position - 1];
      $newhash = md5_file($base.$date.'/'.$folder.'/'.$filename);
      $oldhash = md5_file($base.$prev_folder.'/'.$folder.'/'.$filename);
      // Any difference in the raw bytes counts as a change.
      return $oldhash !== $newhash;
    }