MD5 vs CRC32: Which one's better for common use?


Solution 1

MD5 is a one-way hash algorithm. One-way hash algorithms are often used in cryptography because, by design, it is hard to find an input that produces a specific hash value, and hard to find two different inputs that produce the same hash. They are often used to show that a piece of data has not been altered intentionally since the hash was produced. Because MD5 is a one-way hash algorithm, the emphasis is on security over speed. Unfortunately, MD5 is now considered insecure, because practical collision attacks against it exist.

CRC32 is designed to detect accidental changes to data and is commonly used in networks and storage devices. The purpose of this algorithm is not to protect against intentional changes, but rather to catch accidents like network errors and disk write errors, etc. The emphasis of this algorithm is more on speed than on security.
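As a rough illustration of the two tools side by side, here is a minimal Python sketch using the standard library's `zlib.crc32` and `hashlib.md5`. The helper names (`crc32_hex`, `md5_hex`, `flip_bit`) and the sample message are made up for this example; the flipped bit stands in for an accidental transmission error.

```python
import hashlib
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the CRC32 checksum of data as 8 hex digits."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hex digits."""
    return hashlib.md5(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate an accidental single-bit error (hypothetical helper)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


original = b"The quick brown fox jumps over the lazy dog"
damaged = flip_bit(original, 10)

# Both values change when a single bit is flipped; the CRC is 32 bits
# wide, the MD5 digest is 128 bits wide.
print("crc32:", crc32_hex(original), "->", crc32_hex(damaged))
print("md5:  ", md5_hex(original), "->", md5_hex(damaged))
```

Both functions catch the accidental change here; the difference between them only shows up once someone is changing the data on purpose.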

Solution 2

From Wikipedia's article on MD5 (emphasis mine):

MD5 is a widely used cryptographic hash function

Now CRC32:

CRC is an error-detecting code

So, as you can see, CRC32 is not a hashing algorithm. That means you should not use it for hashing, because it was not built for that.

And I think it doesn't make much sense to talk about common use, because similar-looking algorithms are used for different purposes, each with significantly different requirements. There is no single algorithm that's best for common use. Instead, you should choose the algorithm that's best suited to your specific use.

Solution 3

It depends on your goals. Here are some examples of what can be done with CRC32 versus MD5:

Detecting duplicate files

If you want to check whether two files are the same, a CRC32 checksum is the way to go because it's faster than MD5. But be careful: CRC only reliably tells you if the binaries are different; it doesn't tell you if they're identical. If you get different checksums for two files, they cannot be the same file, so you can reject them as duplicates very quickly. If the checksums match, you still need a full byte-by-byte comparison to be sure.
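A minimal Python sketch of that idea, assuming the files are small enough to read into memory for the final comparison: files are first bucketed by CRC32, and only files sharing a checksum are compared byte by byte. The function names (`file_crc32`, `same_bytes`, `find_duplicates`) are hypothetical, not from any library.

```python
import zlib
from collections import defaultdict
from pathlib import Path


def file_crc32(path: Path, chunk_size: int = 1 << 16) -> int:
    """Compute the CRC32 of a file incrementally, chunk by chunk."""
    crc = 0
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def same_bytes(a: Path, b: Path) -> bool:
    """Full comparison; reads whole files, so only suitable for small ones."""
    return a.stat().st_size == b.stat().st_size and a.read_bytes() == b.read_bytes()


def find_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[file_crc32(path)].append(path)

    groups = []
    for candidates in buckets.values():
        # Different CRC32 means definitely different, so only files in
        # the same bucket need the expensive comparison.
        confirmed = []
        for path in candidates:
            for group in confirmed:
                if same_bytes(group[0], path):
                    group.append(path)
                    break
            else:
                confirmed.append([path])
        groups.extend(g for g in confirmed if len(g) > 1)
    return groups
```

The checksum acts purely as a cheap filter here; the byte comparison is what actually confirms a duplicate.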

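No matter what your files are, the CRC32 checksum will be one of 2^32 different values. Assuming random sample files, the probability of a collision between the checksums of two given files is 1 / 2^32. The probability that one given file collides with any of N - 1 others is roughly (N - 1) / 2^32, and the probability of at least one collision anywhere among N files is roughly N(N - 1) / 2^33 (the birthday bound), which passes 50% at around 77,000 files.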
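To get a feel for these numbers, here is a small Python sketch that evaluates the standard birthday approximation for a 32-bit checksum. It assumes checksums are uniformly distributed, which is an idealization, and the function name `p_any_collision` is made up for this example.

```python
import math

SPACE = 2**32  # number of distinct CRC32 values


def p_any_collision(n: int) -> float:
    """Approximate probability that at least two of n random files share
    a checksum, using 1 - exp(-n(n-1) / (2 * SPACE))."""
    return -math.expm1(-n * (n - 1) / (2 * SPACE))


for n in (2, 1_000, 10_000, 77_163, 1_000_000):
    print(f"{n:>9} files: {p_any_collision(n):.6f}")
```

With two files the result is the plain 1 / 2^32 pairwise probability, while a collection in the tens of thousands already makes an accidental collision likely.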

Detecting malicious software

If security is an issue, such as downloading a file and comparing the source's published hash against your own to make sure the binary hasn't been tampered with, then CRC is a poor option. This is because attackers can easily make malware that has the same CRC checksum. In this case, an MD5 digest is more secure, since CRC was not made for security, although MD5 itself has known collision attacks and a newer digest such as SHA-256 is a safer choice. Two different binaries are far more likely to have the same CRC checksum than the same MD5 digest.
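Checking a download against a published digest could look roughly like the Python sketch below. The function names and the `algorithm` parameter are assumptions for this example; `hashlib.new` accepts names such as `"md5"` or `"sha256"`, so the same code works with a stronger digest.

```python
import hashlib
import hmac
from pathlib import Path


def file_digest_hex(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 16) -> str:
    """Hash a file incrementally so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def matches_published_digest(path: Path, published_hex: str, algorithm: str = "md5") -> bool:
    """Compare the local digest with the one published by the source."""
    actual = file_digest_hex(path, algorithm)
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

This only helps if the published digest comes from a channel the attacker cannot also modify.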

Securing passwords for user authentication

One-way hashing is usually easier, faster, and more secure than reversible (two-way) encryption for this purpose, so it's the common way to store passwords. Basically, the password is combined with other data (a salt), then the hash is computed over the combined data. Random salts make it very unlikely that two users with the same password end up with the same stored hash. By default, the same password always produces the same hash, so you must add your own randomness. Of course, the salt must be stored alongside the hash so it can be reused at login.

To log a user in, you just take the information they give you when they log in. You use their username to get their salt from a database. You then combine this salt with the password they entered to compute a new hash. If it matches the one in the database, then their login is successful. Since you're storing these passwords, they must be VERY secure, which means a CRC checksum is out of the question.
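The salt-then-hash flow described above can be sketched in Python as follows. Note the swap: rather than a single MD5 or SHA pass over salt plus password, this sketch uses `hashlib.pbkdf2_hmac`, a standard-library key-derivation function intended for passwords. The iteration count, salt length, and function names are assumptions for illustration, and the database is left out entirely.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated at signup."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


# Signup: store both values next to the username.
salt, stored = hash_password("correct horse battery staple")

# Login: look up the salt and digest by username, then verify.
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Because each user gets a different random salt, two users with the same password end up with different stored digests.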

Cryptographic digests are more expensive to compute than CRC checksums. Also, stronger hashes like SHA-256 are more secure, but are slower to compute and take up more database space (their digests are longer).

Solution 4

One big difference between CRC32 and MD5 is that it is usually easy to pick a CRC32 checksum and then come up with a message that hashes to that checksum, even if there are constraints imposed on the message, whereas MD5 is specifically designed to make this sort of thing difficult (although it is showing its age: practical attacks that produce two messages with the same MD5 digest now exist).
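One reason CRC32 is so easy to manipulate is that it is an affine function of the message bits. The Python sketch below checks one consequence of that: for three messages of equal length, the CRC32 of their XOR equals the XOR of their CRC32 values, so the checksum of a combined message can be predicted without ever running the CRC over it. The messages and the `xor_bytes` helper are made up for this example; this shows the underlying property, not a complete forgery tool.

```python
import zlib


def xor_bytes(*blocks: bytes) -> bytes:
    """XOR equal-length byte strings together (hypothetical helper)."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


a = b"pay alice $0010"
b = b"pay alice $9999"
c = b"pay mallory $01"

# Predict the checksum of a XOR b XOR c from the three checksums alone.
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
actual = zlib.crc32(xor_bytes(a, b, c))

print(f"predicted {predicted:08x}, actual {actual:08x}")
```

A cryptographic hash is designed so that no such algebraic relationship between inputs and outputs can be exploited.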

If you are in a situation where it is possible that an adversary might decide to sit down and create a load of messages with specified CRC32 hashes, to mimic other messages, or just to make a hash table perform very badly because everything hashes to the same value, then MD5 would be a better option. (Even better, IMHO, would be HMAC-MD5 with a keyed value that is unique to the module using it and unknown outside it).
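For the HMAC-MD5 suggestion, a minimal Python sketch using the standard library's `hmac` module might look like this. The key handling is an assumption for illustration: `MODULE_KEY` is generated once per process and never leaves the module, and the function names `tag` and `verify` are hypothetical.

```python
import hashlib
import hmac
import os

# Secret known only to this module (assumed to be generated at startup).
MODULE_KEY = os.urandom(32)


def tag(message: bytes, key: bytes = MODULE_KEY) -> bytes:
    """Return the 16-byte HMAC-MD5 tag of message under key."""
    return hmac.new(key, message, hashlib.md5).digest()


def verify(message: bytes, mac: bytes, key: bytes = MODULE_KEY) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(tag(message, key), mac)


mac = tag(b"some message")
print(verify(b"some message", mac))
print(verify(b"some other message", mac))
```

Without the key, an adversary cannot choose messages that produce a particular tag, which is what defeats deliberately colliding inputs aimed at a hash table.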

Solution 5

CRCs are used to guard against random errors, for example in data transmission.

Cryptographic hash functions are designed to guard against intelligent adversaries forging the message, though MD5 has been broken in that respect.

Author: bytefire

Updated on July 15, 2022

Comments

  • bytefire
    bytefire almost 2 years

    Recently I read somewhere that although both CRC32 and MD5 are sufficiently uniform and stable, CRC32 is more efficient than MD5. MD5 seems to be a very commonly used hashing algorithm but if CRC32 is faster/more memory efficient then why not use that?

  • svick
    svick about 11 years
    “If you are […] working out hash codes for small items, then CRCs are better” No, they are not. CRC is not a hashing algorithm.
  • cup
    cup about 11 years
    Any algorithm that maps a large dataset to a smaller dataset can be used as a hashing algorithm. It may not be a particularly good one in that you may get a lot of collisions but it can still be used.
  • svick
    svick about 11 years
    But you're not saying that you can use CRC as a hashing algorithm, you're saying that you should use it (under some circumstances).
  • Montre
    Montre about 11 years
    "if you are going to do one huge transmission and check at the end whether you have the correct result" - the thing is, this isn't hashing. While it's true that hash functions can be used to detect transmission errors (since they have very - even needlessly - strong guarantees on diffusion), it's wrong to conflate the concepts. I also strongly doubt there's a non-negligible probability of CRC giving you the same results after you lop off a random chunk off an ISO.
  • MikeW
    MikeW almost 8 years
    Well, CRC32 does not address "security" in the cryptographic sense, at all. The emphasis is on error-detection capability, with resistance to factors such as leading/trailing zeros/ones, burst errors, repeated bits, changes in message length, etc. - in conjunction with comparative simplicity to implement in terms of hardware logic (which will usually operate at the data bit-rate), and hence, speed of operation in a software implementation.
  • MikeW
    MikeW almost 8 years
    I would take issue with your use of the phrase "a message that hashes to that checksum" - I know what you mean, but would suggest something like "shares the same CRC value" - since it's not a "hash" function.
  • MikeW
    MikeW almost 8 years
    A CRC will detect all: * 1 or 2 bit errors * odd number of errors * error burst same width as the CRC value
  • mcdowella
    mcdowella almost 8 years
    CRC32 is not a cryptographic hash, but it can be used as a hash function. The original post specifically mentioned hash functions. CRC fits within the definition of a hash function as at en.wikipedia.org/wiki/Hash_function, and is specifically mentioned there (although with some very dubious suggestions about using only the top 16 bits).
  • Arnaud Bouchez
    Arnaud Bouchez over 7 years
    You are measuring here the relative speed of PHP implementations, not the speed of the algorithms. It seems PHP implemented crc32 poorly. CRC32 IS much faster than MD5 when the library is properly implemented. Thanks to the crc32c hardware acceleration available on recent Intel CPUs, you can compute a crc32c at the speed of the memory bandwidth (more than 4GB/s on my PC), whereas MD5 does not pipeline and even the most optimized versions are much slower.
  • Wowbagger and his liquid lunch
    Wowbagger and his liquid lunch over 7 years
    CRC32 is a hash function but it is not a cryptographic hash function.
  • svick
    svick over 7 years
    @mehaase It does technically satisfy the definition of a hash function. But it was not built to be one, and so it's not a good idea to use it as a hash function, cryptographic or otherwise.
  • Bonita Montero
    Bonita Montero over 4 years
    CRC has a good uniform distribution to be a hash-function. So you could use it for that purpose. But you usually don't need that quality of hashing for a hashtable.
  • L00_Cyph3r
    L00_Cyph3r almost 3 years
    Update from 2021: it seems that PHP 8.0 has improved the speed. Where crc32 was ~2x slower than md5 in PHP <= 7.4, in 8.0 it is ~10x faster.
  • Apurva Singh
    Apurva Singh about 2 years
    but CRC is used for hashing in Redis partitioning