Unique key generation

15,931

Solution 1

There are only 3 ways to generate unique values, rather they be passwords, user IDs, etc.:

  1. Use an effective GUID generator - these are long and cannot be shrunk. If you only use part you FAIL.
  2. At least part of the number is sequentially generated off of a single sequence. You can add fluff or encoding to make it look less sequential. Advantage is they start short - disadvantage is they require a single source. The work around for the single source limitation is to have numbered sources, so you include the [source #] + [seq #] and then each source can generate its own sequence.
  3. Generate them via some other means and then check them against the single history of previously generated values.

Any other method is not guaranteed. Keep in mind, fundamentally you are generating a binary number (it is a computer), but then you can encode it in Hexadecimal, Decimal, Base64, or a word list. Pick an encoding that fits your usage. Usually for user entered data you want some variation of Base32 (which you hinted at).

Note about GUIDS: They gain their strength of uniqueness from their length and the method used to generate them. Anything less than 128-bits is not secure. Beyond random number generation there are characteristics that go into a GUID to make it more unique. Keep in mind they are only practically unique, not completely unique. It is possible, although practically impossible to have a duplicate.

Updated Note about GUIDS: Since writing this I learned that many GUID generators use a cryptographically secure random number generator (difficult or impossible to predict the next number generated, and a not likely to repeat). There are actually 5 different UUID algorithms. Algorithm 4 is what Microsoft currently uses for the Windows GUID generation API. A GUID is Microsoft's implementation of the UUID standard.

Update: If you want 7 to 16 characters then you need to use either method 2 or 3.

Bottom line: Frankly there is no such thing as completely unique. Even if you went with a sequential generator you would eventually run out of storage using all the atoms in the universe, thus looping back on yourself and repeating. Your only hope would be the heat death of the universe before reaching that point.

Even the best random number generator has a possibility of repeating equal to the total size of the random number you are generating. Take a quarter for example. It is a completely random bit generator, and its odds of repeating are 1 in 2.

So it all comes down to your threshold of uniqueness. You can have 100% uniqueness in 8 digits for 1,099,511,627,776 numbers by using a sequence and then base32 encoding it. Any other method that does not involve checking against a list of past numbers only has odds equal to n/1,099,511,627,776 (where n=number of previous numbers generated) of not being unique.

Solution 2

Any algorithm will result in duplicates.

Therefore, might I suggest that you use your existing algorithm* and simply check for duplicates?

*Slight addition: If uniqid() can be non-unique based on time, also include a global counter that you increment after every invocation. That way something is different even in the same microsecond.

Share:
15,931
penetra
Author by

penetra

I am a website and web application developer in Calgary, Alberta. I have been doing backend web development in PHP and frontend in HTML/CSS/JavaScript for over 20 years. My specialties are Symfony, Vue, Event Sourcing & CQRS, Craft CMS, WordPress. I've built everything from basic basic brochure style websites to heavily trafficked eCommerce site and social platforms to internal applications.

Updated on June 06, 2022

Comments

  • penetra
    penetra almost 2 years

    I looking for a way, specifically in PHP that I will be guaranteed to always get a unique key.

    I have done the following:

    strtolower(substr(crypt(time()), 0, 7));
    

    But I have found that once in a while I end up with a duplicate key (rarely, but often enough).

    I have also thought of doing:

    strtolower(substr(crypt(uniqid(rand(), true)), 0, 7));
    

    But according to the PHP website, uniqid() could, if uniqid() is called twice in the same microsecond, it could generate the same key. I'm thinking that the addition of rand() that it rarely would, but still possible.

    After the lines mentioned above I am also remove characters such as L and O so it's less confusing for the user. This maybe part of the cause for the duplicates, but still necessary.

    One option I have a thought of is creating a website that will generate the key, storing it in a database, ensuring it's completely unique.

    Any other thoughts? Are there any websites out there that already do this that have some kind of API or just return the key. I found http://userident.com but I'm not sure if the keys will be completely unique.

    This needs to run in the background without any user input.

  • penetra
    penetra over 15 years
    because it may not always be a password, but instead their actual username/login id or say a transaction id. Both of these are basically a username so they have to be unique. I have the database setup to only accept unique passwords, but again, I can't assume I'm dealing with a database that I have access to. The other thing is I only want 7 to maybe 15 characters. GUIDs are typically much longer and therefore will end up not being unique quite easily.
  • Joel
    Joel over 15 years
    This is close, but depending on your random routine only has just over microsecond accuracy. As he mentioned, uniqid failed for the same possibility.
  • Joel
    Joel over 15 years
    If you take a substring of a cryptographically secure hash then you defeat the uniqueness of the hash. No portion of a hash is more secure or random than any other. It is just as likely to get equal passwords as using a regular call to a random number generator.
  • Joel
    Joel over 15 years
    GUID (or UUIDS) and sequences are the only two ways to avoid duplicates, but you are correct for every other algorithm mentioned here.
  • Jason Cohen
    Jason Cohen over 15 years
    If arbitrary identifier lengths are allowed, I agree. If he wants to keep to 8 chars as in his examples, it's just not enough.
  • Joel
    Joel over 15 years
    8 characters is enough for exactly 1,099,511,627,776 unique passwords if you use Base32, which is respectable for manually entered data (32^8) This makes no allowance for verification, nor does it exclude patterns like 00000000.
  • Joel
    Joel over 15 years
    Collisions are due to his algorithm or his encoding (maybe only using Decimal, which is only 100,000,000 possibilities. Centralized sequential generation is the only solution that will work, or checking for duplicates like you suggested.
  • Marco Demaio
    Marco Demaio about 14 years
    I don't undertstand why you need to md5 values. MD5 is a one way hashing that creates a signature. Be careful that different input values might result in creating the same signature, MD5 functions are triggered to attempt to return different results when input values are slightly different not completly different. So by adding MD5 I think you increase the probability to get duplicated values.
  • Jon
    Jon over 11 years
    Hello, do you have any further reading about number 2, I'd like to use this kind of code generation using different "sources", Thanks.
  • Joel
    Joel over 11 years
    @Jon: I don't have any reading on this, but essentially you concatenate the a source identifier with a sequence from that source. So you know it is unique. For example, if you had two sources, then you would have a sequence of A1, A2, A3, A4 .. . A999 and another sequence of B1, B2, B3, B4 . .. B999. Since a ID from source A will never start with B, then you know there will never be a collision.
  • Jon
    Jon over 11 years
    k thank you, could the source identifier be an id for a db table so that when a code is entered information could be gather from a record in a table?