Probability of collision when using a 32 bit hash

45,903

Solution 1

Duplicate of Expected collisions for perfect 32bit crc

The answer referenced this article: http://arstechnica.com/civis/viewtopic.php?f=20&t=149670

Found the image below from: http://preshing.com/20110504/hash-collision-probabilities

enter image description here

Solution 2

In the case you cite, at least one collision is essentially guaranteed. The probability of at least one collision is about 1 - 3x10-51. The average number of collisions you would expect is about 116.

In general, the average number of collisions in k samples, each a random choice among n possible values is:

N(n,k)~=k(k-1)/(2n)

The probability of at least one collision is:

p(n,k)~=1-e^(-k(k-1)/(2n))

In your case, n = 232 and k = 106.

The probability of a three-way collision in your case is about 0.01. See the Birthday Problem.

Share:
45,903
nguyenngoc101
Author by

nguyenngoc101

Updated on July 18, 2022

Comments

  • nguyenngoc101
    nguyenngoc101 almost 2 years

    I have a 10 character string key field in a database. I've used CRC32 to hash this field but I'm worry about duplicates. Could somebody show me the probability of collision in this situation?

    p.s. my string field is unique in the database. If the number of string fields is 1 million, what is probability of collision ?

  • nguyenngoc101
    nguyenngoc101 over 11 years
    Thank you so much, I wonder that this probability still depend on CRC32 algorithm ?
  • Mark Adler
    Mark Adler over 11 years
    Any good 32-bit hash algorithm will give exactly the same result.