How to generate a UUIDv4 in MySQL?

24,087

Solution 1

I've spent quite some time looking for a solution and came up with the following mysql function that generates a random UUID (i.e. UUIDv4) using standard MySQL functions. I'm answering my own question to share that in the hope that it'll be useful.

-- Change delimiter so that the function body doesn't end the function declaration
DELIMITER //

CREATE FUNCTION uuid_v4()
    RETURNS CHAR(36) NO SQL
BEGIN
    -- Generate 8 2-byte strings that we will combine into a UUIDv4
    SET @h1 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h2 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h3 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h6 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h7 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h8 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');

    -- 4th section will start with a 4 indicating the version
    SET @h4 = CONCAT('4', LPAD(HEX(FLOOR(RAND() * 0x0fff)), 3, '0'));

    -- 5th section first half-byte can only be 8, 9 A or B
    SET @h5 = CONCAT(HEX(FLOOR(RAND() * 4 + 8)),
                LPAD(HEX(FLOOR(RAND() * 0x0fff)), 3, '0'));

    -- Build the complete UUID
    RETURN LOWER(CONCAT(
        @h1, @h2, '-', @h3, '-', @h4, '-', @h5, '-', @h6, @h7, @h8
    ));
END
//
-- Switch back the delimiter
DELIMITER ;

Note: The pseudo-random number generation used (MySQL's RAND) is not cryptographically secure and thus has some bias which can increase the collision risk.

Solution 2

In the off chance you're working with a DB and don't have perms to create functions, here's the same version as above that works just as a SQL expression:

SELECT LOWER(CONCAT(
    LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0'), 
    LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0'), '-',
    LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0'), '-', 
    '4',
    LPAD(HEX(FLOOR(RAND() * 0x0fff)), 3, '0'), '-', 
    HEX(FLOOR(RAND() * 4 + 8)), 
    LPAD(HEX(FLOOR(RAND() * 0x0fff)), 3, '0'), '-', 
    LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0'),
    LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0'),
    LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0')));

Solution 3

Both existing answers relies on MySQL RAND() function:

RAND() is not meant to be a perfect random generator. It is a fast way to generate random numbers on demand that is portable between platforms for the same MySQL version.

In the practice, this mean that the generated UUID using this function might (and will) be biased, and collisions can occur more frequently then expected.

Solution

It's possible to generate safe UUID V4 on MySQL side using random_bytes() function:

This function returns a binary string of len random bytes generated using the random number generator of the SSL library.

So we can update the function to:

CREATE FUNCTION uuid_v4s()
    RETURNS CHAR(36)
BEGIN
    -- 1th and 2nd block are made of 6 random bytes
    SET @h1 = HEX(RANDOM_BYTES(4));
    SET @h2 = HEX(RANDOM_BYTES(2));

    -- 3th block will start with a 4 indicating the version, remaining is random
    SET @h3 = SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3);

    -- 4th block first nibble can only be 8, 9 A or B, remaining is random
    SET @h4 = CONCAT(HEX(FLOOR(ASCII(RANDOM_BYTES(1)) / 64)+8),
                SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3));

    -- 5th block is made of 6 random bytes
    SET @h5 = HEX(RANDOM_BYTES(6));

    -- Build the complete UUID
    RETURN LOWER(CONCAT(
        @h1, '-', @h2, '-4', @h3, '-', @h4, '-', @h5
    ));
END

This should generate UUID V4 random enough to don't care about collisions.

NOTE: Unfortunately MariaDB doesn't support RANDOM_BYTES() (See https://mariadb.com/kb/en/function-differences-between-mariadb-105-and-mysql-80/#miscellaneous)

Test

I've created following test scenario: Insert random UUID v4 as primary key for a table until 40.000.000 rows are created. When a collision is found, the row is updated incrementing collisions column:

INSERT INTO test (uuid) VALUES (uuid_v4()) ON DUPLICATE KEY UPDATE collisions=collisions+1;

The sum of collisions after 40 million rows with each function is:

+----------+----------------+
| RAND()   | RANDOM_BYTES() |
+----------+----------------+
|       55 |              0 |
+----------+----------------+

The number collisions in both scenarios tends to increase as number of rows grows.

Solution 4

Adaptation of Elias Soares's answer using RANDOM_BYTES without creating a DB function:

SELECT LOWER(CONCAT(
    HEX(RANDOM_BYTES(4)), '-',
    HEX(RANDOM_BYTES(2)), '-4',
    SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3), '-',
    CONCAT(HEX(FLOOR(ASCII(RANDOM_BYTES(1)) / 64)+8),SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3)), '-',
    HEX(RANDOM_BYTES(6))
))
Share:
24,087
svvac
Author by

svvac

Updated on March 30, 2021

Comments

  • svvac
    svvac about 3 years

    MySQL's UUID function returns a UUIDv1 GUID. I'm looking for an easy way to generate random GUIDs (i.e. UUIDv4) in SQL.

  • Elias Soares
    Elias Soares about 4 years
    Please note that both answers relies on MySQL RAND() that is NOT cryptographically-safe random number generator. Collisions are expected when using this method.
  • Rick James
    Rick James about 4 years
    A side note: RAND() repeats after 2^30 calls. See stackoverflow.com/a/58459869/1766831
  • svvac
    svvac about 4 years
    Thanks for the cryptographically secure version ; sadly RANDOM_BYTES is not available in MariaDB (10.5).
  • Elias Soares
    Elias Soares about 4 years
    I'll update the answer to contain this note. I really don't understand why MariaDB removed it.
  • Vérace
    Vérace almost 4 years
    @RickJames - is the sequence identical for all machines or is it machine-dependent? 3.1^13 is the number of microseconds in a year, which would imply that if you generated 1 UUID every microsecond, it would take ~ 2^17 years to exhaust RAND() (if my maths is correct - I've approximated 3.1 ~ 2 for this "back of coaster" calculation). This is over a million times the current estimated age of the universe - how relevant is this number in real life?
  • Vérace
    Vérace almost 4 years
    @RickJames However, RAND() generates a 16 digit decimal fraction - so it would seem that it has to repeat values every 1000 years or so - but, that's not the same as repeating the values in a sequence - but again, relevance comes into play - no?
  • Rick James
    Rick James almost 4 years
    @Vérace - RAND() is providing that 30-bit value in a DOUBLE, which is displayed to 16 decimal places. The first ~9 digits won't repeat; the last seven is noise due to converting from binary to decimal.
  • Alvin Thompson
    Alvin Thompson over 3 years
    Great answer, but there are two minor issues: (1) Since this is a non-deterministic function, later versions of MySQL may throw an error when you try to define it because it fails a security check. You need to add NO SQL after the RETURNS clause to tell MySQL that this function doesn't contain any SQL statements. (2) You're missing the final trailing semicolon. :)
  • Alvin Thompson
    Alvin Thompson over 3 years
    Since this is a non-deterministic function, later versions of MySQL may throw an error when you try to define it because it fails a security check. You need to add NO SQL after the RETURNS clause to tell MySQL that this function doesn't contain any SQL statements.
  • Norman Breau
    Norman Breau over 3 years
    @AlvinThompson I don't think NO SQL is correct here, based on MySQL Docs example: "CONTAINS SQL indicates that the routine does not contain statements that read or write data. This is the default if none of these characteristics is given explicitly. Examples of such statements are SET @x = 1 or DO RELEASE_LOCK('abc'), which execute but neither read nor write data." The function does contain sql statements, but statements neither reads or writes data. The SQL statements are indeed not deterministic however which may affect replication based on your binlog_mode.
  • Alvin Thompson
    Alvin Thompson over 3 years
    @NormanBreau I read that as well, but my vanilla install of mysql (8.0.22) won't allow me to add the function without adding some characteristics. Explicitly adding CONTAINS SQL also doesn't work, giving the same error. The only remaining choices are DETERMINISTIC, which clearly would be incorrect here, READS SQL DATA, which is also inappropriate because this function doesn't read data, or NO SQL. I chose NO SQL because it seemed the most appropriate of the options. Is there another option I missed?
  • Norman Breau
    Norman Breau over 3 years
    No, I think you went through all available options... That is strange though, I'm also on 8.0.22, and I was able to create the function with the default characteristics... which is CONTAINS SQL and NOT DETERMINISTIC. But I didn't end up using this cause unfortunately you can't use user-defined functions in a column's DEFAULT expression...
  • Alvin Thompson
    Alvin Thompson over 3 years
    Yep, I had to create a "before insert" trigger. This is the error I get: Query 1 ERROR: This function has none of DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable).
  • Alvin Thompson
    Alvin Thompson over 3 years
    @NormanBreau in case you're interested, my trigger is pretty simple but seems to work: SET new.token = COALESCE(new.token, uuid4()).
  • Elias Soares
    Elias Soares over 3 years
    Did you tried with non deterministic? Asking because I have a vague memory about having issue with this in the past, not sure if it was with this function.
  • Stalinko
    Stalinko about 2 years
    Every next answer in this thread is better and better. And your version is an ideal match for me) thanks
  • Andrew
    Andrew about 2 years
    A thing of beauty - thanks!