Is it safe to assume a GUID will always be unique?

53,358

Solution 1

Yes, you can. Since GUIDs are 128 bits long, there is admittedly a minute possibility of a clash—but the word "minute" is nowhere near strong enough. There are so many GUIDs that if you generate several trillion of them randomly, you're still more likely to get hit by a meteorite than to have even one collision (from Wikipedia). And if you aren't generating them randomly, but are e.g. using the MAC-address-and-time-stamp algorithm, then they're also going to be unique, as MAC addresses are unique among computers and time stamps are unique on your computer.

Edit 1: To answer your bonus question, the optimal way to test a set of GUIDs for uniqueness is to just assume that they are all are unique. Why? Because, given the number of GUIDs you're generating, the odds of a GUID collision are smaller than the odds of a cosmic ray flipping a bit in your computer's memory and screwing up the answer given by any "accurate" algorithm you'd care to run. (See this StackOverflow answer for the math.)

There are an enormous number of GUIDs out there. To quote Douglas Adams's Hitchhiker's Guide to the Galaxy:

"Space," it says, "is big. Really big. You just won't believe how vastly hugely mindbogglingly big it is. I mean you may think it's a long way down the road to the chemist, but that's just peanuts to space, listen…"

And since there are about 7×1022 stars in the universe, and just under 2128 GUIDs, then there are approximately 4.86×1015—almost five quadrillion—GUIDs for every single star. If every one of those stars had a world with a thriving population like ours, then around each and every star, every human or alien who had ever lived would be entitled to over forty-five thousand GUIDs. For every person in history at every star in the universe. The GUID space is at the same level of hugeness as the size of the entire universe. You do not need to worry.

(Edit 2: Reflecting on this: wow. I hadn't realized myself what this meant. The GUID space is incomprehensibly massive. I'm sort of in awe of it.)

Solution 2

Short answer: for practical purposes, yes.

However, you have to consider the birthday paradox!

I have calculated a few representative collision probabilities. With 122-bit UUIDs as specified in the Wikipedia article, the probability of collision is 1/2 if you generate at least 2.71492e18 UUIDs. With 10^19 UUIDs, the probability is 0.999918. With 10^17 UUIDs, 0.000939953.

Some numbers for comparison can be found on Wikipedia. So you can safely assign a UUID for each human that has lived, each galaxy in the observable universe, each fish in the ocean, and each individual ant on Earth. However, collisions are almost certain if you generate a UUID for each transistor humanity produces in a year, each insect on Earth, each grain of sand on Earth, each star in the observable universe, or anything larger.

If you generate 1 billion UUIDs per second, it would take about 36 years to get a collision probability of 10%.

Eventually, there will probably be a collision among the set of UUIDs generated over the course of human history. Still, the probability that collided UUIDs will be used for the same purpose is vanishingly small, so there's no problem in practice.

Solution 3

An analysis of the possibility of collision is available on Wikipedia: http://en.wikipedia.org/wiki/Uuid#Random_UUID_probability_of_duplicates

As mentioned in the link, this will be affected by the properties of the random number generator.

There is also the possibility of a bug in GUID generator code; while the chances are low, they are probably higher than the chances of a collision based on the mathematics.

A Bloom filter might be appropriate; it can quickly tell you if a GUID is unique, but there's a chance for a false indication of a collision. An alternate method if you're testing a batch at a time is to sort the batch and compare each successive element.

Solution 4

In general, yes it is safe to assume.

If your GUID generator is truly random, the possibilities of a clash within a 1000 GUIDs is extraordinarily small.

Of course, that assumes a good GUID generator. So the question is really about how much you trust the tool you're using to generate GUID and does it have its own tests?

Solution 5

This topic reminds me of the Deck of cards scenario. That is to say that there are so many ways a deck of 52 cards can be arranged, that its pretty much certain that no 2 properly shuffled decks of cards that have ever existed, have been in the same order.

If you take a deck now and shuffle it, that sequence will be unique, and will probably never be seen again in all of humanity. Indeed the potential number of ways to arrange 52 of anything is so unimaginably vast that the chances of any 2 decks happening to be the same order are close to zero.

In this example of having 40 shuffled decks and wanting to know for sure they are all unique, it's not impossible 2 of them are the same but its something that most likely would not occur if you were able to shuffle all the decks once every 10th of a second and you started at the birth of the universe.

Share:
53,358
Bo Sandelin
Author by

Bo Sandelin

Lambdas and stuff.

Updated on March 26, 2021

Comments

  • Bo Sandelin
    Bo Sandelin over 3 years

    I know there is a minute possibility of a clash but if I generated a batch of 1000 GUIDs (for example), would it be safe to assume they're all unique to save testing each one?

    Bonus question

    An optimal way to test a GUID for uniqueness? Bloom filter maybe?

    • ChrisF
      ChrisF about 14 years
      possible duplicate of Is a GUID unique 100% of the time?
    • mipadi
      mipadi about 14 years
      Not if we all keep mashing the reload button on this site: wasteaguid.info
    • Michael
      Michael about 14 years
      I blame all of my bugs on GUID collisions. It has to happen some time right?
    • David Gladfelter
      David Gladfelter about 14 years
      It's much more likely that a shark with a lovely plaid-patterned coloring will fall from the sky and smash your computer to bits, so I would submit that taking precautions against that is a more appropriate allocation of resources as part of your overall risk-reduction plan.
    • FrustratedWithFormsDesigner
      FrustratedWithFormsDesigner about 14 years
      @mipadi: great link! I can just picture some developer somewhere whining "Guuuuys! Stop wasting the GUIDs! I need those!"
    • overslacked
      overslacked about 14 years
      @FWFD - I'll admit that I twitched.
    • Gary
      Gary about 14 years
      It is more likely that the code you wrote in 1985 would be rewritten before your coding Year as two digits would cause an issue in 2000 than a GUID colliding.
    • Andy
      Andy over 6 years
      waste a guid should have a "guids left" counter @mipadi
  • new123456
    new123456 almost 13 years
    Also, WolframAlpha reports that, for every cell in every person who has ever lived, there are 36 trillion UUIDs. You have about 10^14 cells in your body, and 106.5 billion people have ever lived. Or, 2.385 * 10^23 UUIDs for every cent in the US public debt.
  • NullUserException
    NullUserException over 11 years
    Although the numbers are still high, the chances of a GUID collision are over 50% at 2^64 GUIDs.
  • NullUserException
    NullUserException over 11 years
    At 2^64 GUIDs, this would cut down the numbers to less than one (0.00026) per star in the Universe and 2*10^(-15) for every human or alien who has ever lived. This would still allow for over 170 million GUIDs for every human who has ever lived, so I think we're still good.
  • James Thorpe
    James Thorpe about 10 years
    Worth noting that a GUID collision is also only a problem if it's in the same business-space. A GUID I use to identify a component in a piece of software could be the same as a GUID you use in a database row in your own application without causing any problems
  • krookedking
    krookedking over 9 years
    Still tiny compared to Graham's number
  • BlackTigerX
    BlackTigerX almost 7 years
    The fact that there are 2^128 GUIDS is irrelevant, and you are not "still good" at 50% chance of collision, you are not even good at 0.0000001%
  • pkr
    pkr over 6 years
    This is how the universe ends... Some programmer just assumes their GUIDs will always be unique for their mega Death Star...
  • mjaggard
    mjaggard about 6 years
    Because UUIDs are based on non-random data, 36 years is - you only have to worry about each millisecond individually.
  • Max Barraclough
    Max Barraclough about 5 years
    "MAC addresses are unique among computers" Some (extremely sloppy) vendors have been known to break the rule, at least according to superuser.com/a/968346
  • Hakanai
    Hakanai about 5 years
    @mjaggard UUIDs are based on random data. Any modern sort, anyway.