Is it wrong to use a hash for a unique ID?

12,935

Solution 1

If you have 2 keys you will have a theoretical best case scenario of 1 in 2 ^ X probability of a collision, where X is the number of bits in your hashing algorithm. 'Best case' because the input usually will be ASCII which doesn't utilize the full charset, plus the hashing functions do not distribute perfectly, so they will collide more often than the theoretical max in real life.

To answer your final question:

A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

Yeah that's true-sorta. But you would have another problem of generating unique keys of that size. The easiest way is usually a checksum, so just choose a large enough digest that the collision space will be small enough for your comfort.

As @wayne suggests, a popular approach is to concatenate microtime() to your random salt (and base64_encode to raise the entropy).

Solution 2

How horrible would it be if two ended up the same? Murphy's Law applies - if a million to one, or even a 100,000:1 chance is acceptable, then go right ahead! The real chance is much, much smaller - but if your system will explode if it happens then your design flaw must be addressed first. Then proceed with confidence.

Here is a question/answer of what the probabilities really are: Probability of SHA1 Collisions

Solution 3

Use sha1(time()) in stead, then you remove the random possibility of a repeating hash for as long as time can be represented shorter than the sha1 hash. (likely longer than you fill find a working php parser ;))

Solution 4

Computer random isn't actually random, you know? The only true random that you can obtain from a computer, supposing you are on a Unix environment is from /dev/random, but this is a blocking operation that depends on user interactions like moving a mouse or typing on keyboard. Reading from /dev/urandom is less safe, but it's probably better thang using just ASCII characters and gives you instantaneous response.

Share:
12,935
texelate
Author by

texelate

Updated on July 16, 2022

Comments

  • texelate
    texelate almost 2 years

    I want to use a unique ID generated by PHP in a database table that will likely never have more than 10,000 records. I don't want the time of creation to be visible or use a purely numeric value so I am using:

    sha1(uniqid(mt_rand(), true))
    

    Is it wrong to use a hash for a unique ID? Don't all hashes lead to collisions or are the chances so remote that they should not be considered in this case?

    A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

  • texelate
    texelate about 11 years
    Thanks, would that be: sha1(base64_encode(microtime(). uniqid(mt_rand(), true)))
  • Morten Jensen
    Morten Jensen about 11 years
    @texelate that would be a fine starting point. See this page point especially 1+2 for more info on salting: blog.ircmaxell.com/2012/12/seven-ways-to-screw-up-bcrypt.htm‌​l
  • texelate
    texelate about 11 years
    Thanks. I need this for uniqueness rather than security but a very useful read all the same.
  • Morten Jensen
    Morten Jensen about 11 years
    then you remove the random possibility of a repeating hash .. unless you instantiate more than 1 object pr. second - which is not a lot IMHO. I'd recommend against just using the timestamp as input for a unique key