Does randomUUID give a unique id?


Solution 1

If you get a UUID collision, go play the lottery next.

From Wikipedia:

Randomly generated UUIDs have 122 random bits. Out of a total of 128 bits, four bits are used for the version ('Randomly generated UUID'), and two bits for the variant ('Leach-Salz').

With random UUIDs, the chance of two having the same value can be calculated using probability theory (Birthday paradox). Using the approximation

$$p(n) \approx 1 - e^{-\frac{n^2}{2x}}$$

these are the probabilities of an accidental clash after calculating n UUIDs, with x = 2^122:

| n | probability |
|---|---|
| 68,719,476,736 = 2^36 | 0.0000000000000004 (4 × 10^−16) |
| 2,199,023,255,552 = 2^41 | 0.0000000000004 (4 × 10^−13) |
| 70,368,744,177,664 = 2^46 | 0.0000000004 (4 × 10^−10) |

To put these numbers into perspective, the annual risk of someone being hit by a meteorite is estimated to be one chance in 17 billion, which means the probability is about 0.00000000006 (6 × 10^−11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years would the probability of creating just one duplicate be about 50%. The probability of one duplicate would also be about 50% if every person on earth owned 600 million UUIDs.
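The approximation above can be checked numerically. This is a small sketch, assuming x = 2^122 as in the quote; it uses the standard `Math.expm1` and `Math.log1p` so the tiny probabilities keep their precision. The class and method names are illustrative only.

```java
// Birthday-bound estimate for accidental collisions among random (version 4) UUIDs.
final class BirthdayBound {
    // x = 2^122: the number of values that 122 random bits can take.
    static final double X = Math.pow(2, 122);

    // p(n) is approx. 1 - e^(-n^2 / (2x)); -expm1(-y) equals 1 - e^(-y)
    // but keeps its precision when y is tiny, unlike 1 - Math.exp(-y).
    static double collisionProbability(double n) {
        return -Math.expm1(-(n * n) / (2 * X));
    }

    // Inverting the approximation: n = sqrt(2x * ln(1 / (1 - p))).
    static double uuidsForProbability(double p) {
        return Math.sqrt(2 * X * -Math.log1p(-p));
    }

    public static void main(String[] args) {
        // Reproduce the rows of the table: n = 2^36, 2^41, 2^46.
        for (int k : new int[] {36, 41, 46}) {
            System.out.printf("n = 2^%d  p = %.1e%n", k, collisionProbability(Math.pow(2, k)));
        }
        double n50 = uuidsForProbability(0.5);
        double seconds = n50 / 1e9;                    // at one billion UUIDs per second
        double years = seconds / (365.25 * 24 * 3600);
        System.out.printf("50%% after about %.1e UUIDs (~%.0f years at 1e9/s)%n", n50, years);
    }
}
```

Each table row comes out at the same order of magnitude as quoted, and the 50% point lands at roughly 2.7 × 10^18 UUIDs, which at a billion per second is on the order of a century.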

Solution 2

Since a UUID has a finite size, there is no way for it to be unique across all of space and time.

If you need a UUID that is guaranteed to be unique within any reasonable use case, you can use Log4j 2's UuidUtil.getTimeBasedUuid(). It is guaranteed to be unique for about 8,900 years as long as you generate fewer than 10,000 UUIDs per millisecond.

Solution 3

Oracle's UUID documentation: http://docs.oracle.com/javase/7/docs/api/java/util/UUID.html

It uses this algorithm (RFC 4122) from the Internet Engineering Task Force: http://www.ietf.org/rfc/rfc4122.txt

A quote from the abstract.

A UUID is 128 bits long, and can guarantee uniqueness across space and time.

While the abstract claims a guarantee, there are only about 3.4 × 10^38 combinations (credit: CodeChimp).
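The 3.4 × 10^38 figure is simply 2^128. As the Wikipedia quote in Solution 1 notes, a random (version 4) UUID fixes 6 of those bits, so only 2^122 values are actually possible. A small sketch with `BigInteger` (the class name is illustrative):

```java
import java.math.BigInteger;

final class UuidSpace {
    // Every 128-bit pattern: 2^128 values (about 3.4 x 10^38).
    static BigInteger allValues() {
        return BigInteger.ONE.shiftLeft(128);
    }

    // A version-4 UUID fixes 4 version bits and 2 variant bits,
    // so only 128 - 6 = 122 bits vary: 2^122 values (about 5.3 x 10^36).
    static BigInteger randomV4Values() {
        return BigInteger.ONE.shiftLeft(122);
    }

    public static void main(String[] args) {
        System.out.printf("all 128-bit values: %.2e%n", allValues().doubleValue());
        System.out.printf("random v4 values:   %.2e%n", randomV4Values().doubleValue());
        // The largest 128-bit value (2^128 - 1), as quoted in the comments.
        System.out.println(allValues().subtract(BigInteger.ONE));
    }
}
```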

Solution 4

From UUID.randomUUID() Javadoc:

Static factory to retrieve a type 4 (pseudo randomly generated) UUID. The UUID is generated using a cryptographically strong pseudo random number generator.
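To make "type 4" concrete, here is a sketch of what an RFC 4122 version-4 generator does: draw 16 bytes from `SecureRandom`, then overwrite the 4 version bits and the 2 variant bits. It follows the layout the RFC describes; it is an illustration, not the JDK's source, and `RandomUuidSketch` is a hypothetical name. In real code, just call `UUID.randomUUID()`.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.UUID;

// Illustrative version-4 generator following the RFC 4122 bit layout.
final class RandomUuidSketch {
    private static final SecureRandom RNG = new SecureRandom();

    static UUID randomV4() {
        byte[] b = new byte[16];
        RNG.nextBytes(b);   // 128 bits from a cryptographically strong RNG
        b[6] &= 0x0f;       // clear the version nibble
        b[6] |= 0x40;       // version 4 = randomly generated
        b[8] &= 0x3f;       // clear the top two bits of byte 8
        b[8] |= 0x80;       // variant bits 10 = RFC 4122 (Leach-Salz)
        ByteBuffer buf = ByteBuffer.wrap(b);   // big-endian by default
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID u = randomV4();
        System.out.println(u + " version=" + u.version() + " variant=" + u.variant());
    }
}
```

Only 6 of the 128 bits are fixed, which is where the 122 random bits in the Wikipedia quote come from.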

It's random, and therefore collisions can definitely occur, as others in the comments above/below confirmed after detecting collisions very early. Instead of version 4 (random-based), I would advise you to use version 1 (time-based).

Possible solutions:

1) UUID utility from Log4j

You can use the third-party implementation UuidUtil.getTimeBasedUuid() from Log4j, which is based on the current timestamp, measured in units of 100 nanoseconds since October 15, 1582, concatenated with the MAC address of the device where the UUID is created. See package org.apache.logging.log4j.core.util in the log4j-core artifact.

2) UUID utility from FasterXML

There is also a third-party implementation from FasterXML, Generators.timeBasedGenerator().generate(), which is likewise based on time and MAC address. See package com.fasterxml.uuid in the java-uuid-generator artifact.

3) Do it on your own

Or you can implement your own using the core Java constructor new UUID(long mostSigBits, long leastSigBits). See the very nice explanation in Baeldung's "Guide to UUID in Java", whose implementation uses October 15, 1582 (a rather famous day).
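A minimal sketch of the do-it-yourself route, assuming the RFC 4122 version-1 field layout: count 100-ns ticks since October 15, 1582, pack them with a clock sequence and a node id, and pass the two longs to `new UUID(mostSigBits, leastSigBits)`. The random node (instead of a real MAC address), the per-JVM clock sequence, and the class name `TimeBasedUuidSketch` are assumptions for illustration. Unlike the Log4j and FasterXML generators, it persists no state and does not coordinate between JVMs.

```java
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.UUID;

final class TimeBasedUuidSketch {
    // The version-1 clock starts at 1582-10-15T00:00Z (java.time uses the
    // proleptic Gregorian calendar, so this is the first Gregorian day).
    static final Instant GREGORIAN_EPOCH =
        LocalDate.of(1582, 10, 15).atStartOfDay(ZoneOffset.UTC).toInstant();

    // 100-ns ticks from the Gregorian epoch to t (10,000 ticks per millisecond).
    static long ticks(Instant t) {
        Duration d = Duration.between(GREGORIAN_EPOCH, t);
        return d.getSeconds() * 10_000_000L + d.getNano() / 100;
    }

    private static final SecureRandom RNG = new SecureRandom();
    // Hypothetical stand-in for the MAC address: a random 48-bit node with the
    // multicast bit set, as RFC 4122 section 4.5 suggests when no MAC is used.
    private static final long NODE = (RNG.nextLong() & 0xFFFFFFFFFFFFL) | (1L << 40);
    // Random 14-bit clock sequence chosen once per JVM.
    private static final int CLOCK_SEQ = RNG.nextInt(1 << 14);
    private static long lastTicks;

    // Packs the fields into the version-1 layout.
    static UUID build(long ticks, int clockSeq, long node) {
        long msb = ((ticks & 0xFFFFFFFFL) << 32)              // time_low
                 | (((ticks >>> 32) & 0xFFFFL) << 16)          // time_mid
                 | 0x1000L                                     // version 1
                 | ((ticks >>> 48) & 0x0FFFL);                 // time_hi
        long lsb = ((0x80L | ((clockSeq >>> 8) & 0x3FL)) << 56) // variant + clock_seq_hi
                 | ((clockSeq & 0xFFL) << 48)                   // clock_seq_low
                 | (node & 0xFFFFFFFFFFFFL);                    // node
        return new UUID(msb, lsb);
    }

    // Instant.now() resolution is platform-dependent; if two calls land on the
    // same tick (or the clock goes backwards), bump by one tick so values stay
    // distinct within this class loader.
    static synchronized UUID next() {
        lastTicks = Math.max(ticks(Instant.now()), lastTicks + 1);
        return build(lastTicks, CLOCK_SEQ, NODE);
    }

    public static void main(String[] args) {
        UUID u = next();
        System.out.println(u + " version=" + u.version() + " timestamp=" + u.timestamp());
    }
}
```

`UUID`'s own `version()`, `timestamp()`, `clockSequence()` and `node()` accessors decode these fields again, which makes a handy self-check. If you sustain more than 10,000 UUIDs per millisecond, `lastTicks` drifts ahead of the wall clock.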

Author: birdy

Updated on August 24, 2022

Comments

  • birdy (over 1 year)

    I am trying to create session tokens for my REST API. Each time a user logs in, I create a new token like this:

    UUID token = UUID.randomUUID();
    user.setSessionId(token.toString());
    Sessions.INSTANCE.sessions.put(user.getName(), user.getSessionId());
    

    However, I am not sure how to protect against duplicate sessionTokens.

    For example, can there be a scenario where user1 signs in and gets the token 87955dc9-d2ca-4f79-b7c8-b0223a32532a, and user2 signs in and also gets the token 87955dc9-d2ca-4f79-b7c8-b0223a32532a?

    Is there a better way of doing this?

  • Vishy (over 10 years)
    Guarantee is a strong word.
  • SethB (over 10 years)
    That is a quote from the abstract of the algorithm used. Not my words.
  • CodeChimp (over 10 years)
    That would be 2^128 possible values, the largest being 2^128 − 1 = 340,282,366,920,938,463,463,374,607,431,768,211,455. Still, it is "possible" for there to be a collision, but you would have roughly a 1 in 2^128 chance of that happening, assuming the algorithm used is decent. That's not a very good chance.
  • Erick Robertson (over 10 years)
    Still, I would not use the word "guarantee", and I would consider that abstract to be incorrect in that regard.
  • Nico (over 8 years)
    And what about a cluster environment?
  • Thorn G (over 8 years)
    The original question was in the context of identifiers for a REST api. If you're generating "tens of trillions" of sessions in your api, yes, then collisions might be a concern, but I'd wager you'll have other problems first.
  • Admin (about 8 years)
    "The annual risk of someone being hit by a meteorite..." is not a guarantee: in 2016, a man was killed by a meteorite.
  • FelipeM (over 6 years)
    What about generating UUIDs as the names of files stored in an AWS S3 cluster?
  • rgoers (over 4 years)
    Years ago I used a random uuid for the unique key in a database table. Within the first week I got a collision. Random UUIDs are not totally random and cannot guarantee that a collision will not occur.
  • searchengine27 (almost 4 years)
    TLDR: UUID#randomUUID() generates a statistically unique UUID. Which is not the same thing as unique. Also, the copy and paste-ness of this answer is making my eyes bleed. Please format the math...