compute crc of file in python


Solution 1

A little more compact and optimized code

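import zlib

def crc(fileName):
    prev = 0
    # Feed each line into the running CRC so the result covers the whole file
    with open(fileName, "rb") as fh:
        for eachLine in fh:
            prev = zlib.crc32(eachLine, prev)
    # Mask to 32 bits so the value is formatted as unsigned
    return "%X" % (prev & 0xFFFFFFFF)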

PS2: The old PS is deprecated and has been deleted, following the suggestion in the comments. Thank you. I don't know how I missed this, but it was a really good suggestion.

Solution 2

A modified version of kobor42's answer, with performance improved by a factor of 2-3 by reading fixed-size chunks instead of "lines":

import zlib

def crc32(fileName):
    with open(fileName, 'rb') as fh:
        hash = 0
        while True:
            s = fh.read(65536)
            if not s:
                break
            hash = zlib.crc32(s, hash)
        return "%08X" % (hash & 0xFFFFFFFF)

Also includes leading zeroes in the returned string.
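To illustrate why the zero padding matters, here is a small sketch comparing the "%X" format from Solution 1 with the "%08X" format used here. The checksum value is made up for illustration; any value whose top hex digits are zero behaves the same way.

```python
# Hypothetical checksum value whose highest hex digits are zero.
checksum = 0x00AB12CD

unpadded = "%X" % (checksum & 0xFFFFFFFF)    # format used in Solution 1
padded = "%08X" % (checksum & 0xFFFFFFFF)    # format used in Solution 2

print(unpadded)  # AB12CD
print(padded)    # 00AB12CD
```

With the padded form the string is always 8 characters long, which makes it easier to compare against checksums printed by other tools.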

Solution 3

hashlib-compatible interface for CRC-32 support:

import zlib

class crc32(object):
    name = 'crc32'
    digest_size = 4
    block_size = 1

    def __init__(self, arg=b''):
        self.__digest = 0
        self.update(arg)

    def copy(self):
        copy = super(self.__class__, self).__new__(self.__class__)
        copy.__digest = self.__digest
        return copy

    def digest(self):
        return self.__digest

    def hexdigest(self):
        return '{:08x}'.format(self.__digest)

    def update(self, arg):
        self.__digest = zlib.crc32(arg, self.__digest) & 0xffffffff

# Now you can define hashlib.crc32 = crc32
import hashlib
hashlib.crc32 = crc32

# Python 2.7: hashlib.algorithms += ('crc32',)
# Python 3.2+: hashlib.algorithms_available.add('crc32')
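As a rough sketch of how such an object could be used, the block below defines a compact Python 3 variant of the class and feeds it a file in chunks. The names CRC32 and file_hexdigest are hypothetical, and returning the digest as 4 big-endian bytes is an assumption made here to mirror how hashlib objects return bytes; the class above returns an integer instead.

```python
import zlib


class CRC32:
    """Minimal hashlib-style wrapper around zlib.crc32 (Python 3 sketch)."""

    name = "crc32"
    digest_size = 4
    block_size = 1

    def __init__(self, data=b""):
        self._value = zlib.crc32(data)

    def update(self, data):
        # Continue the running checksum with the next piece of data.
        self._value = zlib.crc32(data, self._value)

    def digest(self):
        # hashlib objects return bytes; big-endian order is an assumption.
        return self._value.to_bytes(4, "big")

    def hexdigest(self):
        return "{:08x}".format(self._value)


def file_hexdigest(path, hasher, chunk_size=65536):
    """Feed a file to any object that has update() and hexdigest()."""
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Because file_hexdigest only relies on update() and hexdigest(), a call such as file_hexdigest("data.txt", CRC32()) can be swapped for file_hexdigest("data.txt", hashlib.md5()) without changing the loop.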

Solution 4

To show any integer's lowest 32 bits as 8 hexadecimal digits, without sign, you can "mask" the value by bit-and'ing it with a mask made of 32 bits all at value 1, then apply formatting. I.e.:

>>> x = -1767935985
>>> format(x & 0xFFFFFFFF, '08x')
'969f700f'

It's quite irrelevant whether the integer you are thus formatting comes from zlib.crc32 or any other computation whatsoever.
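A short sketch tying this back to the question: in Python 2, zlib.crc32 could return a negative (signed) value, while in Python 3 the result is always unsigned. Masking works in both cases, so it is safe to apply unconditionally. The input b"123456789" is just a sample chosen for illustration.

```python
import zlib

signed = -1767935985             # value from the example above
unsigned = signed & 0xFFFFFFFF   # same 32 bits, read as unsigned

print(unsigned)                  # 2527031311
print(format(unsigned, "08x"))   # 969f700f
print(format(unsigned, "08X"))   # 969F700F (uppercase, as in the question)

# Masking is harmless on a value that is already unsigned (Python 3).
value = zlib.crc32(b"123456789")
assert value & 0xFFFFFFFF == value
```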

Solution 5

Python 3.8+ (using the walrus operator):

import zlib

def crc32(filename, chunksize=65536):
    """Compute the CRC-32 checksum of the contents of the given filename"""
    with open(filename, "rb") as f:
        checksum = 0
        while chunk := f.read(chunksize):
            checksum = zlib.crc32(chunk, checksum)
        return checksum

chunksize is the number of bytes read from the file at a time. Whatever value you choose, you get the same checksum for the same file; a very small value can make the code slow, and a very large one can use too much memory.

The result is a 32-bit integer (in Python 3, zlib.crc32 always returns it as an unsigned value). The CRC-32 checksum of an empty file is 0.
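The two claims above (the chunk size does not affect the result, and an empty file gives 0) can be checked with a small sketch like the following. The helper write_temp and the sample data are assumptions for illustration, and the function is repeated under the hypothetical name crc32_of_file only so that the block runs on its own. The final line shows one way to get the 8-digit hex string asked for in the question.

```python
import os
import tempfile
import zlib


def crc32_of_file(filename, chunksize=65536):
    """Same loop as above, repeated so this sketch is self-contained."""
    checksum = 0
    with open(filename, "rb") as f:
        while chunk := f.read(chunksize):
            checksum = zlib.crc32(chunk, checksum)
    return checksum


def write_temp(data):
    """Hypothetical helper: write bytes to a temporary file, return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


data_path = write_temp(b"123456789")
empty_path = write_temp(b"")
try:
    small = crc32_of_file(data_path, chunksize=1)        # one byte at a time
    large = crc32_of_file(data_path, chunksize=1 << 20)  # whole file at once
    empty = crc32_of_file(empty_path)
    as_hex = format(small, "08X")
finally:
    os.remove(data_path)
    os.remove(empty_path)

print(small == large)   # True
print(empty)            # 0
print(as_hex)           # CBF43926
```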

Author by

user203547

Updated on August 17, 2021

Comments

  • user203547
    user203547 almost 3 years

    I want to calculate the CRC of a file and get output like: E45A12AC. Here's my code:

    #!/usr/bin/env python 
    import os, sys
    import zlib
    
    def crc(fileName):
        fd = open(fileName,"rb")
        content = fd.readlines()
        fd.close()
        for eachLine in content:
            zlib.crc32(eachLine)
    
    for eachFile in sys.argv[1:]:
        crc(eachFile)
    

    This calculates the CRC for each line, but its output (e.g. -1767935985) is not what I want.

    Hashlib works the way I want, but it computes the md5:

    import hashlib
    m = hashlib.md5()
    for line in open('data.txt', 'rb'):
        m.update(line)
    print m.hexdigest()
    

    Is it possible to get something similar using zlib.crc32?