Probability of SHA1 collisions

50,101

Solution 1

alt text

Are the 160 bit hash values generated by SHA-1 large enough to ensure the fingerprint of every block is unique? Assuming random hash values with a uniform distribution, a collection of n different data blocks and a hash function that generates b bits, the probability p that there will be one or more collisions is bounded by the number of pairs of blocks multiplied by the probability that a given pair will collide.

(source : http://bitcache.org/faq/hash-collision-probabilities)

Solution 2

Well, the probability of a collision would be:

1 - ((2^160 - 1) / 2^160) * ((2^160 - 2) / 2^160) * ... * ((2^160 - 99) / 2^160)

Think of the probability of a collision of 2 items in a space of 10. The first item is unique with probability 100%. The second is unique with probability 9/10. So the probability of both being unique is 100% * 90%, and the probability of a collision is:

1 - (100% * 90%), or 1 - ((10 - 0) / 10) * ((10 - 1) / 10), or 1 - ((10 - 1) / 10)

It's pretty unlikely. You'd have to have many more strings for it to be a remote possibility.

Take a look at the table on this page on Wikipedia; just interpolate between the rows for 128 bits and 256 bits.

Solution 3

That's Birthday Problem - the article provides nice approximations that make it quite easy to estimate the probability. Actual probability will be very very very low - see this question for an example.

Share:
50,101
eastafri
Author by

eastafri

Updated on July 05, 2022

Comments

  • eastafri
    eastafri almost 2 years

    Given a set of 100 different strings of equal length, how can you quantify the probability that a SHA1 digest collision for the strings is unlikely... ?

  • Bahadır Yıldırım
    Bahadır Yıldırım over 14 years
    In conclusion, the likelihood of a collision is in the order of 10^-45. That is very, very unlikely.
  • Kamel
    Kamel about 9 years
    But SHA-1 is not uniform distribution, it could be bigger than this upper bound. I doubt that this equation is not correct. AS least the EQUAL.
  • Djarid
    Djarid about 8 years
    This answer doesn't take into account the chinese discovery in 2005 where they are able to produce collisions in 2^69 iterations rather than the 2^80 projected by brute force schneier.com/blog/archives/2005/02/sha1_broken.html
  • Pierre D
    Pierre D over 7 years
    @Djarid It's important not to confound accidental hash collision and adversarial collision hunting. The former is the probability that the hash of two items will collide, and follows the formula above (although, as noted by Kamel, the distribution is not perfectly uniform and thus the probability is likely higher). The latter is used to intentionally try to find collisions, and relies on discovering and exploiting weaknesses in the hash. Cryptographic hashes attempt to be robust against such attacks, but often they are overkill for simpler hashing applications (not transmitting secrets).
  • MrIo
    MrIo over 3 years
    that formula is accurate when 2^b >> n^2 (and when 2^b very big). I know it is the mayority (if not all) the cases for sha1... but for the record!