The differences between INT and UUID in MySQL

30,460

Solution 1

UUID returns a universal unique identifier (hopefuly also unique if imported to another DB as well).

To quote from MySQL doc (emphasis mine):

A UUID is designed as a number that is globally unique in space and time. Two calls to UUID() are expected to generate two different values, even if these calls are performed on two separate computers that are not connected to each other.

On the other hand a simply INT primary id key (e.g. AUTO_INCREMENT) will return a unique integer for the specific DB and DB table, but which is not universally unique (so if imported to another DB chances are there will be primary key conflicts).

In terms of performance, there shouldn't be any noticeable difference using auto-increment over UUID. Most posts (including some by the authors of this site), state as such. Of course UUID may take a little more time (and space), but this is not a performance bottleneck for most (if not all) cases. Having a column as Primary Key should make both choices equal wrt to performance. See references below:

  1. To UUID or not to UUID?
  2. Myths, GUID vs Autoincrement
  3. Performance: UUID vs auto-increment in cakephp-mysql
  4. UUID performance in MySQL?
  5. Primary Keys: IDs versus GUIDs (coding horror)

(UUID vs auto-increment performance results, adapted from Myths, GUID vs Autoincrement)

enter image description here

UUID pros / cons (adapted from Primary Keys: IDs versus GUIDs)

GUID Pros

  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • Allows easy distribution of databases across multiple servers
  • You can generate IDs anywhere, instead of having to roundtrip to the database
  • Most replication scenarios require GUID columns anyway

GUID Cons

  • It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
  • Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
  • The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes.

Note

I would read carefully the mentioned references and decide whether to use UUID or not depending on my use case. That said, in many cases UUIDs would be indeed preferable. For example one can generate UUIDs without using/accessing the database at all, or even use UUIDs which have been pre-computed and/or stored somewhere else. Plus you can easily generalise/update your database schema and/or clustering scheme without having to worry about IDs breaking and causing conflicts.

In terms of possible collisions, for example using v4 UUIDS (random), the probability to find a duplicate within 103 trillion version-4 UUIDs is one in a billion.

Solution 2

A UUID key cannot be pk until unless persisted in DB so round tripping will happen until then you cannot assume its pk without a successful transaction. Most of the UUID use time based, mac based, name based or some random uuid. Given we are moving heavily towards container based deployments and they have a pattern for starting sequence MAC addresses relying on mac addresses will not work. Time based is not going to guarantee as the assumption is systems are always in exact time sync which is not true sometimes as clocks will not follow the rules. GUID cannot guarantee that collision will never occur just that in given short period of time it will not occur but given enough time and systems running in parallel and proliferations of systems that guarantee will eventually fail.

http://www.ietf.org/rfc/rfc4122.txt

Share:
30,460
孙为强
Author by

孙为强

Updated on November 23, 2021

Comments

  • 孙为强
    孙为强 over 2 years

    If I set the primary key to be INT type (AUTO_INCREMENT) or set it in UUID, what is the difference between these two in the database performance (SELECT, INSERT etc) and why?