When is CRC more appropriate to use than MD5/SHA1?


Solution 1

CRC works fine for detecting random errors in data that might occur, for example, from network interference, line noise, distortion, etc.

CRC is computationally much less complex than MD5 or SHA1. Using a hash function like MD5 is probably overkill for random error detection. However, using CRC for any kind of security check would be much less secure than a more complex hashing function such as MD5.

And yes, CRC is much easier to implement on embedded hardware; you can even get off-the-shelf packaged IC solutions for it.
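
As a concrete illustration (a minimal sketch in PHP, chosen because a later answer benchmarks hashes in PHP): flipping a single bit in a message always changes its CRC-32, and single-bit errors are exactly the class of accidental corruption CRC is designed to catch.

<?php
// A single flipped bit always changes the CRC-32: every standard CRC
// polynomial detects all single-bit errors by construction.
$msg = 'The quick brown fox jumped over the lazy dog.';
$corrupted = $msg;
$corrupted[10] = chr(ord($corrupted[10]) ^ 0x01); // flip the low bit of byte 10

printf("original:  %08x\n", crc32($msg));
printf("corrupted: %08x\n", crc32($corrupted));   // guaranteed to differ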

Update: Yes, this answer is old. Please don't use SHA1 or MD5 for security purposes ;)

Solution 2

CRC is designed to protect against unintentional changes in the data. That is, it's good for detecting unintentional errors, but it is useless as a way of making sure data was not maliciously tampered with.

Also see this.
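
To make "useless against malicious handling" concrete, here is a brute-force sketch (PHP again; the input scheme is arbitrary). Because CRC-32 has only 2^32 possible values, the birthday bound says a collision is expected after roughly 2^16 distinct inputs, so two messages with the same checksum can be found in well under a second without any cleverness:

<?php
// Birthday-bound collision search: with only 2^32 CRC-32 values,
// ~2^16 random inputs are expected to yield a collision.
$seen = [];
for ($i = 0; ; $i++) {
    $s = 'msg-' . $i;              // distinct inputs, no cleverness needed
    $h = crc32($s);
    if (isset($seen[$h])) {
        printf("collision after %d tries: '%s' and '%s' -> %08x\n",
               $i + 1, $seen[$h], $s, $h);
        break;
    }
    $seen[$h] = $s;
}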

Solution 3

I found a study that shows how inappropriate CRC hashes are for hash tables. It also explains the actual characteristics of the algorithm, includes an evaluation of other hash algorithms, and is a good reference to keep.

UPDATE

It seems the site is down. The Internet Archive has a copy, though.

UPDATE 2

Oh dear. It turns out the study may have been faulty in its conclusions about CRC's suitability as a hash. Thanks @minexew for the link.

Solution 4

I ran every line of this PHP code in a 1,000,000-iteration loop. The results are in the comments (#).

hash('crc32', 'The quick brown fox jumped over the lazy dog.');#  750ms   8 chars
hash('crc32b','The quick brown fox jumped over the lazy dog.');#  700ms   8 chars
hash('md5',   'The quick brown fox jumped over the lazy dog.');#  770ms  32 chars
hash('sha1',  'The quick brown fox jumped over the lazy dog.');#  880ms  40 chars
hash('sha256','The quick brown fox jumped over the lazy dog.');# 1490ms  64 chars
hash('sha384','The quick brown fox jumped over the lazy dog.');# 1830ms  96 chars
hash('sha512','The quick brown fox jumped over the lazy dog.');# 1870ms 128 chars
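
For reference, here is a sketch of the kind of timing loop behind these numbers (the helper name and output format are mine; the iteration count and test string are from above). Note that with such a short input much of the time is per-call overhead; on large inputs the speed gap between CRC and the cryptographic hashes widens considerably, as pointed out in the comments.

<?php
// Hash the same short string 1,000,000 times and report elapsed
// milliseconds plus digest length, as in the table above.
function time_hash(string $algo, string $msg, int $iterations = 1000000): float {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        hash($algo, $msg);
    }
    return (microtime(true) - $start) * 1000.0;   // milliseconds
}

$msg = 'The quick brown fox jumped over the lazy dog.';
foreach (['crc32', 'crc32b', 'md5', 'sha1', 'sha256', 'sha384', 'sha512'] as $algo) {
    printf("%-7s %6.0f ms  %3d chars\n",
           $algo, time_hash($algo, $msg), strlen(hash($algo, $msg)));
}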

My conclusion:

  • Use "crc32b" when you need a CRC (http://en.wikipedia.org/wiki/Cyclic_redundancy_check) and you do not care about security.
  • Use "sha256" (or higher) when you need an added security layer.

  • Do not use "md5" or "sha1", because they have:

    1. known security issues, which matter when you care about security
    2. a longer hash string, and they are slower than "crc32b", when all you need is a CRC

Solution 5

For information on CRC implementation, speed and reliability, see "A Painless Guide to CRC Error Detection Algorithms". It has everything on CRCs.

Unless somebody is going to try to modify your data maliciously and hide the change, CRC is sufficient. Just use a "good" (standard) polynomial, as in the sketch below.
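
For illustration, here is a minimal bit-at-a-time CRC-32 in PHP using the standard reflected polynomial 0xEDB88320 (the reflected form of 0x04C11DB7, the "good" polynomial used by zlib, PNG and PHP's built-in crc32()); a production implementation would use a lookup table or hardware assist instead:

<?php
// Bit-at-a-time CRC-32 with the standard reflected polynomial.
// Matches PHP's built-in crc32() and hash('crc32b', ...).
function crc32_bitwise(string $data): int {
    $crc = 0xFFFFFFFF;                       // standard initial value
    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        $crc ^= ord($data[$i]);
        for ($bit = 0; $bit < 8; $bit++) {
            // Shift right; XOR in the polynomial when the low bit is set.
            $crc = ($crc & 1) ? (($crc >> 1) ^ 0xEDB88320) : ($crc >> 1);
        }
    }
    return ($crc ^ 0xFFFFFFFF) & 0xFFFFFFFF; // final XOR
}

$msg = 'The quick brown fox jumped over the lazy dog.';
printf("%08x\n", crc32_bitwise($msg)); // same output...
printf("%08x\n", crc32($msg));         // ...as the built-in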


Comments

  • Gili
    Gili almost 2 years

    When is it appropriate to use CRC for error detection versus more modern hashing functions such as MD5 or SHA1? Is the former easier to implement on embedded hardware?

  • Gili
    Gili about 15 years
    Is there a way of getting a 32-bit MD5 or SHA1 hash? As you said, 128 bits for error detection sounds a bit overkill. Why don't low-level communication protocols use MD5 or SHA1? Is it because the hash size is too big relative to the packet size?
  • Gerhard
    Gerhard about 15 years
    CRC will catch almost all accidental data changes if you are using a proper polynomial; only 1 in 2^32 multi-bit changes is missed, when exactly the right combination of bits is changed.
  • erikkallen
    erikkallen about 15 years
    And with a proper polynomial it will also catch all errors of certain common classes, e.g. burst errors.
  • Coxy
    Coxy over 14 years
    @Dustin: You are completely correct in your answer, but perhaps consider changing "CRC is computationally much more efficient" to "CRC is computationally much easier"? MD5/SHA-1 algorithms are complex, but not really 'inefficient' IMO.
  • defines
    defines over 14 years
    @coxymla you are correct, the word I should have used is "complex" not "inefficient". Thanks!
  • orip
    orip about 14 years
    To reduce any long hash to 32 bits, just take the first 32 bits.
  • supercat
    supercat over 13 years
    One problem with that approach is that when it is run on a file which contains an embedded CRC32 within it, the resulting CRC may be independent of the data in the file (since if the data changes, the CRC32 will be changed so as to cancel out the difference). Munging the data in some simple way before computing the CRC32 would avoid that problem.
  • teratorn
    teratorn about 13 years
    @supercat - I really don't believe that this is actually an issue. If a file contains a crc32 header which is the crc32 of the rest of the file, then when the file is updated each bit in the crc32 header will have approximately a 50% chance of being different. The changes in the header should follow a fairly random distribution. I fail to see how this is going to result in the CRC32(header + data) always being the same, or in any way not dependent on the data portion of the file.
  • supercat
    supercat about 13 years
    @teratorn: I've seen a number of files which have a CRC32 at the end, computed in such a way that the CRC32 of the entire file, computed using some particular seed constant, will always be some other constant value. This is pretty common with things like binary code images. If the Acme 1000 DVD player uses fixed-size code images for firmware upgrades, and expects every code image to have a certain CRC32, then a routine which computes the CRC32's of various files would be unable to distinguish different code images for the Acme 1000.
  • Martin
    Martin almost 12 years
    Not really. echo hash('crc32', 'The quick brown fox jumped over the lazy dog.'); echoes "413a86af", which is an 8-character string. By the way, it is a 32-bit number stored in hex format. For example, "sha256" produces a 256-bit hash, again stored as hex, which gives a 64-character string.
  • ubiquibacon
    ubiquibacon almost 12 years
    These results are very deceiving. When these hashing algorithms are applied to a large data set (War and Peace instead of "The quick brown fox jumped over the lazy dog.") you would see how much faster CRC is than MD5.
  • ceving
    ceving over 10 years
    Link is broken. Maybe you can write the explanation yourself? If not, the answer is useless.
  • Andre Luus
    Andre Luus over 10 years
    Okay, I'll include the conclusion in my answer.
  • Dewi Morgan
    Dewi Morgan about 10 years
    There is an intermediate case (duplicate checking in libraries) where MD5/SHA1 are the correct solution: they don't need to handle the case where there's an adversary carefully crafting a vanishingly unlikely hash collision, but they do need to handle accidental collisions. So: detecting bit errors and corruption: CRC32; detecting collisions in libraries: MD5/SHA1; adversarial applications: SHA-256 and above. Of course, if you have a library with billions of entries, then you will probably need to increase your hash bits, too.
  • ostrokach
    ostrokach almost 9 years
    Weird, according to the benchmark here, CRC actually does pretty well in terms of speed and number of collisions.
  • Andre Luus
    Andre Luus almost 9 years
    Very interesting indeed. I had to look over the study I linked to again, but if I had to guess it must be because of the different testing implementations. If I had to make a decision, I'd go for the advice from the study, it appears to be more scientifically sound.
  • Marc.2377
    Marc.2377 about 8 years
    Most important part from the link in this answer: "(...) even a 2048-bit CRC would be cryptographically much less secure than a 128-bit MD5"
  • Piskvor left the building
    Piskvor left the building over 7 years
    While the answer is still correct, MD5 and SHA1 are at the same level of security nowadays. In other words, only good for detecting unintentional errors.
  • Raven
    Raven over 7 years
    If security is your goal then you should never ever use MD5; SHA-1 should also be avoided, and some variant of SHA-2 is recommended.
  • J. Dimeo
    J. Dimeo about 7 years
    In my experience hashing millions of URLs, CRC64 collided 8 times and MD5 collided 5. Obviously MD5 was better, but CRC64 was a great and much faster and simpler hash.
  • bryc
    bryc about 6 years
    CRC as a hash is merely a side effect of its decent error-detection properties. Unless you're expecting burst or single bit errors, any non-crypto hash can be much faster than CRC.
  • ilgitano
    ilgitano almost 6 years
    CRC can show artifacts for very short message sizes, which would be the case when using it for hash tables. That said, I use them for hash tables, but not blindly.
  • ilgitano
    ilgitano almost 6 years
    PHP? On an ARM platform, embedded code at 16 MHz, a CRC32 of 46 bytes takes maybe 12 microseconds. That has hardware assist. Even hardware-assisted AES would be several hundred times slower. An unassisted lookup-table CRC should still come in around 50 microseconds.
  • ilgitano
    ilgitano almost 6 years
    Would absolutely disagree with that. CRC error polynomials are carefully chosen so that they can provably detect 1-, 2-, 3- and 5-bit errors, and burst errors up to something like 11 bits in some cases. A cryptographic hash is purely statistical, so you have to use large digest values. 8-32 bits is unrealistic for a cryptographic hash digest, as well as pointlessly expensive in CPU cycles and gates. Definitely not an answer to take on board if you work on embedded systems. The only time NOT to use a CRC is if you have to deal with an intelligent adversary scenario.
  • ilgitano
    ilgitano almost 6 years
    The point of the CRC in that case is to quickly identify that the files are different. If the CRC comes back the same, you now have to do an expensive binary compare, so an embedded CRC doesn't break the algorithm. It could happen that some files end up being binary compared because the CRC first pass says they MIGHT be the same, but there are unlikely to be many of those, and you can avoid it by using a custom polynomial.
  • avgJoe
    avgJoe over 3 years
    @defines: The computational complexity of MD5 is O(n) according to this thread (stackoverflow.com/questions/43625569/time-complexity-of-md5). What is the computational complexity of CRC? Given that it must look at all the data, how could it be better than O(n)? I am just asking since you write that MD5 was more complex than CRC, and I wasn't sure if you meant computational complexity.
  • defines
    defines over 3 years
    Big-O notation is only part of the picture. There is no such thing as O(2N) because the constant multiples have no meaning in big-O notation. So even though it is a linear-complexity algorithm, there are more (constant-numbered) steps per iteration. Perhaps complexity was also not the right word :) en.wikipedia.org/wiki/…
  • minexew
    minexew about 3 years
    Bret's study has been debunked, see: github.com/Michaelangel007/crc32/blob/master/…
  • Andre Luus
    Andre Luus almost 3 years
    Well - how about that @minexew. I'll update my answer.
  • benrg
    benrg about 2 years
    CRCs are not just "fine" for detecting random errors. They are better at detecting random errors than random hashes of the same length. At short lengths, they are so much better that you'd be crazy to use a random hash. I just wrote this up in another answer.
  • benrg
    benrg about 2 years
    CRCs are not just "sufficient"; they're far superior to random hashes at short lengths for certain types of errors. I just wrote another answer about this.
  • benrg
    benrg about 2 years
    The benchmarks are useless because the strings are so short; the run time is dominated by unrelated overhead. You failed to explain the crucial fact that CRCs are better at detecting uncommon, random errors than random hashes are: see this answer.
  • benrg
    benrg about 2 years
    "This is true" - no, the exact opposite is true. See this answer.
  • defines
    defines about 2 years
    I'm not sure what you mean by random hashes, but CRC is of fixed length and is essentially a hashing function. If the error exists in the check value, the CRC will be invalid anyway. A higher-length function will definitely be "better" (more effective at error detection) but also probably overkill, requiring more instructions.