Compute the CRC of a file in Python
Solution 1
A slightly more compact and optimized version:
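import zlib

def crc(fileName):
    prev = 0
    # Iterate the file line by line, chaining each line's CRC into the next call
    with open(fileName, "rb") as fh:
        for eachLine in fh:
            prev = zlib.crc32(eachLine, prev)
    return "%X" % (prev & 0xFFFFFFFF)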
PS2: The old PS was outdated and has been deleted, following a suggestion in the comments. Thank you. I don't know how I missed this, but it was a really good suggestion.
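The line-by-line loop works because zlib.crc32 accepts a running value as its second argument, so feeding the data in pieces gives the same result as a single call over all of it. A small sketch checking this property, using arbitrary sample data in place of a real file:

```python
import zlib

# Arbitrary sample data split into "lines", as iterating a binary file would yield them.
lines = [b"first line\n", b"second line\n", b"third line\n"]

# Incremental CRC: pass the previous value as the starting value of the next call.
running = 0
for line in lines:
    running = zlib.crc32(line, running)

# One-shot CRC over the concatenated data.
whole = zlib.crc32(b"".join(lines))

# Mask to 32 bits so the comparison also holds on Python 2, where crc32 may be negative.
print((running & 0xFFFFFFFF) == (whole & 0xFFFFFFFF))  # True
```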
Solution 2
A modified version of kobor42's answer, with performance improved by a factor of 2-3 by reading fixed-size chunks instead of lines:
import zlib

def crc32(fileName):
    with open(fileName, 'rb') as fh:
        hash = 0
        # Read 64 KiB at a time until read() returns an empty bytes object (EOF)
        while True:
            s = fh.read(65536)
            if not s:
                break
            hash = zlib.crc32(s, hash)
    return "%08X" % (hash & 0xFFFFFFFF)
It also keeps leading zeros in the returned string, since "%08X" pads the result to eight hex digits.
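The difference between "%X" (used in the first solution) and "%08X" only shows up when the checksum's top hex digits are zero. A minimal illustration with a hypothetical CRC value chosen to have leading zeros:

```python
# Hypothetical CRC value whose two highest hex digits are zero.
value = 0x0012ABCD

print("%X" % value)    # 12ABCD    (leading zeros dropped)
print("%08X" % value)  # 0012ABCD  (always eight digits)
```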
Solution 3
A hashlib-compatible interface for CRC-32 support:

import hashlib
import struct
import zlib


class crc32(object):
    name = 'crc32'
    digest_size = 4
    block_size = 1

    def __init__(self, arg=b''):
        self.__digest = 0
        self.update(arg)

    def copy(self):
        copy = object.__new__(self.__class__)
        copy.__digest = self.__digest
        return copy

    def digest(self):
        # hashlib objects return raw bytes; pack the 32-bit value big-endian
        return struct.pack('>I', self.__digest)

    def hexdigest(self):
        return '{:08x}'.format(self.__digest)

    def update(self, arg):
        # Chain the running CRC and keep it in the unsigned 32-bit range
        self.__digest = zlib.crc32(arg, self.__digest) & 0xffffffff


# Now you can define hashlib.crc32 = crc32
hashlib.crc32 = crc32

# Python 2.7:
if hasattr(hashlib, 'algorithms'):
    hashlib.algorithms += ('crc32',)
# Python 3.2+:
if hasattr(hashlib, 'algorithms_available'):
    hashlib.algorithms_available.add('crc32')
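The point of a hashlib-style interface is that code written for MD5 or SHA can take a CRC-32 object unchanged. The sketch below is self-contained and does not use the class above: Crc32Hasher is a hypothetical, stripped-down wrapper with only update() and hexdigest(), and hash_file is a hypothetical helper that accepts any object with those two methods.

```python
import hashlib
import os
import tempfile
import zlib


class Crc32Hasher(object):
    """Hypothetical minimal hashlib-style wrapper around zlib.crc32."""

    def __init__(self, data=b""):
        self._crc = 0
        self.update(data)

    def update(self, data):
        # Chain the running value and keep it in the unsigned 32-bit range.
        self._crc = zlib.crc32(data, self._crc) & 0xFFFFFFFF

    def hexdigest(self):
        return "{:08x}".format(self._crc)


def hash_file(path, hasher, chunk_size=65536):
    """Feed a file in chunks into any object with update()/hexdigest()."""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Usage: the same helper works for MD5 and for the CRC-32 wrapper.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"123456789")
try:
    md5_hex = hash_file(path, hashlib.md5())
    crc_hex = hash_file(path, Crc32Hasher())
    print(md5_hex)
    print(crc_hex)  # cbf43926
finally:
    os.remove(path)
```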
Solution 4
To display the lowest 32 bits of any integer as 8 unsigned hexadecimal digits, mask the value by AND-ing it with a mask of 32 one-bits (0xFFFFFFFF), then format it. For example:
>>> x = -1767935985
>>> format(x & 0xFFFFFFFF, '08x')
'969f700f'
It doesn't matter whether the integer you format this way comes from zlib.crc32 or from any other computation.
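The masking step, and its inverse for comparing against tools that print the signed form (as Python 2's zlib.crc32 did), can be written as two small helpers. The names to_unsigned32 and to_signed32 are hypothetical, chosen for illustration:

```python
def to_unsigned32(x):
    """Keep only the lowest 32 bits of x, as a non-negative int."""
    return x & 0xFFFFFFFF


def to_signed32(u):
    """Reinterpret a 32-bit value as a signed two's-complement int."""
    u &= 0xFFFFFFFF
    return u - (1 << 32) if u & 0x80000000 else u


x = -1767935985             # the signed value from the question
u = to_unsigned32(x)
print(u)                    # 2527031311
print(format(u, "08X"))     # 969F700F
print(to_signed32(u) == x)  # True
```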
Solution 5
Python 3.8+ (using the walrus operator):
import zlib

def crc32(filename, chunksize=65536):
    """Compute the CRC-32 checksum of the contents of the given filename"""
    with open(filename, "rb") as f:
        checksum = 0
        # The walrus operator reads a chunk and stops the loop on an empty read (EOF)
        while chunk := f.read(chunksize):
            checksum = zlib.crc32(chunk, checksum)
        return checksum
chunksize is the number of bytes read from the file at a time. Its value does not affect the result: you get the same checksum for the same file either way (setting it too low may make the code slow, while setting it too high may use too much memory).
The result is an unsigned 32-bit integer (in Python 3, zlib.crc32 always returns an unsigned value). The CRC-32 checksum of an empty file is 0.
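A quick way to sanity-check the function: the standard CRC-32 check value for the ASCII bytes 123456789 is 0xCBF43926, and the result should not change with chunksize. This sketch repeats the function so it runs on its own (Python 3.8+) and writes the test input to a temporary file:

```python
import os
import tempfile
import zlib


def crc32(filename, chunksize=65536):
    """Compute the CRC-32 checksum of the contents of the given filename."""
    with open(filename, "rb") as f:
        checksum = 0
        while chunk := f.read(chunksize):
            checksum = zlib.crc32(chunk, checksum)
        return checksum


# Write the standard CRC-32 check input to a temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"123456789")

try:
    # Very different chunk sizes should all give the same checksum.
    checksums = [crc32(path, size) for size in (1, 4, 65536)]
    for value in checksums:
        print(f"{value:08X}")  # CBF43926
finally:
    os.remove(path)
```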
user203547
Updated on August 17, 2021
I want to calculate the CRC of a file and get output like E45A12AC. Here's my code:

#!/usr/bin/env python
import os, sys
import zlib

def crc(fileName):
    fd = open(fileName,"rb")
    content = fd.readlines()
    fd.close()
    for eachLine in content:
        zlib.crc32(eachLine)

for eachFile in sys.argv[1:]:
    crc(eachFile)
This calculates the CRC of each line separately, but its output (e.g. -1767935985) is not what I want. hashlib works the way I want, but it computes the MD5:
import hashlib

m = hashlib.md5()
for line in open('data.txt', 'rb'):
    m.update(line)
print(m.hexdigest())
Is it possible to get something similar using zlib.crc32?