How to generate a UUIDv4 in MySQL?
Solution 1
I've spent quite some time looking for a solution and came up with the following mysql function that generates a random UUID (i.e. UUIDv4) using standard MySQL functions. I'm answering my own question to share that in the hope that it'll be useful.
-- Change delimiter so that the function body doesn't end the function declaration
DELIMITER //
CREATE FUNCTION uuid_v4()
RETURNS CHAR(36) NO SQL
BEGIN
-- Generate 8 2-byte strings that we will combine into a UUIDv4
SET @h1 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
SET @h2 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
SET @h3 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
SET @h6 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
SET @h7 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
SET @h8 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
-- 4th section will start with a 4 indicating the version
SET @h4 = CONCAT('4', LPAD(HEX(FLOOR(RAND() * 0x0fff)), 3, '0'));
-- 5th section first half-byte can only be 8, 9 A or B
SET @h5 = CONCAT(HEX(FLOOR(RAND() * 4 + 8)),
LPAD(HEX(FLOOR(RAND() * 0x0fff)), 3, '0'));
-- Build the complete UUID
RETURN LOWER(CONCAT(
@h1, @h2, '-', @h3, '-', @h4, '-', @h5, '-', @h6, @h7, @h8
));
END
//
-- Switch back the delimiter
DELIMITER ;
Note: The pseudo-random number generation used (MySQL's RAND
) is not
cryptographically secure and thus has some bias which can increase the collision
risk.
Solution 2
In the off chance you're working with a DB and don't have perms to create functions, here's the same version as above that works just as a SQL expression:
SELECT LOWER(CONCAT(
LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0'),
LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0'), '-',
LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0'), '-',
'4',
LPAD(HEX(FLOOR(RAND() * 0x0fff)), 3, '0'), '-',
HEX(FLOOR(RAND() * 4 + 8)),
LPAD(HEX(FLOOR(RAND() * 0x0fff)), 3, '0'), '-',
LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0'),
LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0'),
LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0')));
Solution 3
Both existing answers relies on MySQL RAND()
function:
RAND() is not meant to be a perfect random generator. It is a fast way to generate random numbers on demand that is portable between platforms for the same MySQL version.
In the practice, this mean that the generated UUID
using this function might (and will) be biased, and collisions can occur more frequently then expected.
Solution
It's possible to generate safe UUID V4 on MySQL side using random_bytes()
function:
This function returns a binary string of len random bytes generated using the random number generator of the SSL library.
So we can update the function to:
CREATE FUNCTION uuid_v4s()
RETURNS CHAR(36)
BEGIN
-- 1th and 2nd block are made of 6 random bytes
SET @h1 = HEX(RANDOM_BYTES(4));
SET @h2 = HEX(RANDOM_BYTES(2));
-- 3th block will start with a 4 indicating the version, remaining is random
SET @h3 = SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3);
-- 4th block first nibble can only be 8, 9 A or B, remaining is random
SET @h4 = CONCAT(HEX(FLOOR(ASCII(RANDOM_BYTES(1)) / 64)+8),
SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3));
-- 5th block is made of 6 random bytes
SET @h5 = HEX(RANDOM_BYTES(6));
-- Build the complete UUID
RETURN LOWER(CONCAT(
@h1, '-', @h2, '-4', @h3, '-', @h4, '-', @h5
));
END
This should generate UUID V4 random enough to don't care about collisions.
NOTE: Unfortunately MariaDB doesn't support RANDOM_BYTES()
(See https://mariadb.com/kb/en/function-differences-between-mariadb-105-and-mysql-80/#miscellaneous)
Test
I've created following test scenario: Insert random UUID v4 as primary key for a table until 40.000.000 rows are created. When a collision is found, the row is updated incrementing collisions
column:
INSERT INTO test (uuid) VALUES (uuid_v4()) ON DUPLICATE KEY UPDATE collisions=collisions+1;
The sum of collisions after 40 million rows with each function is:
+----------+----------------+
| RAND() | RANDOM_BYTES() |
+----------+----------------+
| 55 | 0 |
+----------+----------------+
The number collisions in both scenarios tends to increase as number of rows grows.
Solution 4
Adaptation of Elias Soares's answer using RANDOM_BYTES
without creating a DB function:
SELECT LOWER(CONCAT(
HEX(RANDOM_BYTES(4)), '-',
HEX(RANDOM_BYTES(2)), '-4',
SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3), '-',
CONCAT(HEX(FLOOR(ASCII(RANDOM_BYTES(1)) / 64)+8),SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3)), '-',
HEX(RANDOM_BYTES(6))
))
svvac
Updated on March 30, 2021Comments
-
svvac about 3 years
MySQL's
UUID
function returns a UUIDv1 GUID. I'm looking for an easy way to generate random GUIDs (i.e. UUIDv4) in SQL. -
Elias Soares about 4 yearsPlease note that both answers relies on MySQL
RAND()
that is NOT cryptographically-safe random number generator. Collisions are expected when using this method. -
Rick James about 4 yearsA side note:
RAND()
repeats after 2^30 calls. See stackoverflow.com/a/58459869/1766831 -
svvac about 4 yearsThanks for the cryptographically secure version ; sadly
RANDOM_BYTES
is not available in MariaDB (10.5). -
Elias Soares about 4 yearsI'll update the answer to contain this note. I really don't understand why MariaDB removed it.
-
Vérace almost 4 years@RickJames - is the sequence identical for all machines or is it machine-dependent? 3.1^13 is the number of microseconds in a year, which would imply that if you generated 1 UUID every microsecond, it would take ~ 2^17 years to exhaust RAND() (if my maths is correct - I've approximated 3.1 ~ 2 for this "back of coaster" calculation). This is over a million times the current estimated age of the universe - how relevant is this number in real life?
-
Vérace almost 4 years@RickJames However, RAND() generates a 16 digit decimal fraction - so it would seem that it has to repeat values every 1000 years or so - but, that's not the same as repeating the values in a sequence - but again, relevance comes into play - no?
-
Rick James almost 4 years@Vérace -
RAND()
is providing that 30-bit value in aDOUBLE
, which is displayed to 16 decimal places. The first ~9 digits won't repeat; the last seven is noise due to converting from binary to decimal. -
Alvin Thompson over 3 yearsGreat answer, but there are two minor issues: (1) Since this is a non-deterministic function, later versions of MySQL may throw an error when you try to define it because it fails a security check. You need to add
NO SQL
after theRETURNS
clause to tell MySQL that this function doesn't contain any SQL statements. (2) You're missing the final trailing semicolon. :) -
Alvin Thompson over 3 yearsSince this is a non-deterministic function, later versions of MySQL may throw an error when you try to define it because it fails a security check. You need to add NO SQL after the RETURNS clause to tell MySQL that this function doesn't contain any SQL statements.
-
Norman Breau over 3 years@AlvinThompson I don't think
NO SQL
is correct here, based on MySQL Docs example: "CONTAINS SQL indicates that the routine does not contain statements that read or write data. This is the default if none of these characteristics is given explicitly. Examples of such statements are SET @x = 1 or DO RELEASE_LOCK('abc'), which execute but neither read nor write data." The function does contain sql statements, but statements neither reads or writes data. The SQL statements are indeed not deterministic however which may affect replication based on your binlog_mode. -
Alvin Thompson over 3 years@NormanBreau I read that as well, but my vanilla install of mysql (8.0.22) won't allow me to add the function without adding some characteristics. Explicitly adding
CONTAINS SQL
also doesn't work, giving the same error. The only remaining choices areDETERMINISTIC
, which clearly would be incorrect here,READS SQL DATA
, which is also inappropriate because this function doesn't read data, orNO SQL
. I choseNO SQL
because it seemed the most appropriate of the options. Is there another option I missed? -
Norman Breau over 3 yearsNo, I think you went through all available options... That is strange though, I'm also on 8.0.22, and I was able to create the function with the default characteristics... which is
CONTAINS SQL
andNOT DETERMINISTIC
. But I didn't end up using this cause unfortunately you can't use user-defined functions in a column'sDEFAULT
expression... -
Alvin Thompson over 3 yearsYep, I had to create a "before insert" trigger. This is the error I get:
Query 1 ERROR: This function has none of DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable)
. -
Alvin Thompson over 3 years@NormanBreau in case you're interested, my trigger is pretty simple but seems to work:
SET new.token = COALESCE(new.token, uuid4())
. -
Elias Soares over 3 yearsDid you tried with non deterministic? Asking because I have a vague memory about having issue with this in the past, not sure if it was with this function.
-
Stalinko about 2 yearsEvery next answer in this thread is better and better. And your version is an ideal match for me) thanks
-
Andrew about 2 yearsA thing of beauty - thanks!