Create big integer from the big end of a uuid in PostgreSQL

Solution 1

This is all very shaky, both the problem and the solution you describe in your self-answer.

First, a mismatch between a database design and a third-party application is always possible, but it usually indicates a deeper problem. Why does your database use the uuid data type as a PK in the first place? UUIDs are not very efficient compared to a serial or a bigserial. Typically you would use a UUID if you are working in a distributed environment where you need to "guarantee" uniqueness over multiple installations.

Secondly, why does the application require the PK to begin with (incidentally: views do not have a PK, the underlying tables do)? If it is only to view the data then a PK is rather useless, particularly if it is based on a UUID (and there is thus no conceivable relationship between the PK and the rest of the tuple). If it is used to refer to other data in the same database or to update or delete existing data, then you need the exact UUID and not some extract of it, because the underlying table or other relations in your database would have the exact UUID. Of course you can convert all UUIDs with the same hex_to_int() function, but that leads straight back to my point above: why use UUIDs in the first place?

Thirdly, do not mess around with things you have little or no knowledge of. This is not intended to be offensive, take it as well-meant advice (look around on the internet for programmers who tried to improve on cryptographic algorithms or random number generation by adding their own twists of obfuscation; quite entertaining reads). There are 5 algorithms for generating UUIDs in the uuid-ossp package and while you know or can easily find out which algorithm is used in your database (the uuid_generate_vX() functions in your table definitions, most likely), do you know how the algorithm works? The claim of practical uniqueness of a UUID is based on its 128 bits, not a 64-bit extract of it. Are you certain that the high 64 bits are random? My guess is that 64 consecutive bits are less random than the "square root of the randomness" (for lack of a better way to phrase the theoretical drop in periodicity of a 64-bit number compared to a 128-bit number) of the full UUID. Why? Because all but one of the algorithms are made up of randomized blocks of otherwise non-random input (such as the MAC address of a network interface, which is always the same on a machine generating millions of UUIDs). Had 64 bits been enough for randomized value uniqueness, then a UUID would have been that long.
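To find out which generator produced the values already stored, one option is to read the version digit, which RFC 4122 places at the 13th hex digit (character 15 of the hyphenated text form). A minimal sketch, assuming a hypothetical table my_table with a uuid column id (neither name comes from the question):

```sql
-- Count stored UUIDs per version digit (1 = time and MAC based, 4 = random, ...).
-- my_table and id are placeholder names used for illustration only.
SELECT substr(id::text, 15, 1) AS uuid_version,
       count(*)                AS n
FROM   my_table
GROUP  BY 1
ORDER  BY 1;
```

For the UUID quoted in the question that digit is 4. Since the version digit sits inside the leading 16 hex digits, at least 4 of the 64 extracted bits are constant for any single generator.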

What a better solution would be in your case is hard to say, because it is unclear what the third-party application does with the data from your database and how dependent it is on the uniqueness of the "PK" column in the view. An approach that is likely to work if the application does more than trivially display the data without any further use of the "PK" would be to associate a bigint with every retrieved uuid in your database in a (temporary) table and include that bigint in your view by linking on the uuids in your (temporary) tables. Since you cannot trigger on SELECT statements, you would need a function to generate the bigint for every uuid the application retrieves. On updates or deletes on the underlying tables of the view, or upon selecting data from related tables, you look up the uuid corresponding to the bigint passed in from the application. The lookup table and function would look somewhat like this:

CREATE TEMPORARY TABLE temp_table(
    tempint bigserial PRIMARY KEY,
    internal_uuid uuid);
CREATE INDEX ON temp_table(internal_uuid);

CREATE FUNCTION temp_int_for_uuid(pk uuid) RETURNS bigint AS $$
DECLARE
    id    bigint;
BEGIN
    SELECT tempint INTO id FROM temp_table WHERE internal_uuid = pk;
    IF NOT FOUND THEN
        INSERT INTO temp_table(internal_uuid) VALUES (pk)
        RETURNING tempint INTO id;
    END IF;
    RETURN id;
END; $$ LANGUAGE plpgsql STRICT;

Not pretty, not efficient, but fool-proof.
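How the function and lookup table above might be wired into a view can be sketched as follows. The names my_table, payload and my_view are hypothetical, and the sketch assumes temp_table has already been created in the current session before the view is queried:

```sql
-- Expose a bigint surrogate instead of the uuid.
-- my_table(id uuid, payload text) is an assumed base table, not from the question.
CREATE VIEW my_view AS
SELECT temp_int_for_uuid(id) AS app_pk,
       payload
FROM   my_table;

-- Reverse lookup when the application hands back a bigint
-- (42 is a made-up value for illustration).
SELECT t.*
FROM   temp_table l
JOIN   my_table   t ON t.id = l.internal_uuid
WHERE  l.tempint = 42;
```

Because the function inserts rows as a side effect, selecting from such a view would not work inside a read-only transaction.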

Solution 2

Fast and without dynamic SQL

Cast the leading 16 hex digits of a UUID in text representation as bitstring bit(64) and cast that to bigint.

Conveniently, excess hex digits to the right are truncated in the cast to bit(64) automatically - exactly what we need.

Postgres accepts various formats for input. Your given string literal is one of them:

14607158d3b14ac0b0d82a9a5a9e8f6e

The default text representation of a UUID (and the text output in Postgres for data type uuid) adds hyphens at predefined places:

14607158-d3b1-4ac0-b0d8-2a9a5a9e8f6e

The manual:

A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits.

If input format can vary, strip hyphens first to be sure:

SELECT ('x' || translate(uuid_as_string, '-', ''))::bit(64)::bigint;

Cast actual uuid input with uuid::text.
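Applied to a real uuid value, the same expression might look like the sketch below. The literal is the UUID from the question; my_table and id in the second query are placeholder names:

```sql
-- uuid literal -> text -> hyphens stripped -> leading 64 bits -> bigint
SELECT ('x' || translate('14607158-d3b1-4ac0-b0d8-2a9a5a9e8f6e'::uuid::text, '-', ''))::bit(64)::bigint;

-- Same expression applied to a uuid column (hypothetical table and column names).
SELECT ('x' || translate(id::text, '-', ''))::bit(64)::bigint AS id_int
FROM   my_table;
```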

Note that Postgres uses signed integers, so the bigint overflows to negative numbers in the upper half - which should be irrelevant for this purpose.
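The wrap-around into negative numbers can be seen at the boundary values. A small illustration using hand-picked hex strings rather than real UUIDs:

```sql
SELECT ('x' || '7fffffffffffffff')::bit(64)::bigint AS largest_positive,  -- 9223372036854775807
       ('x' || '8000000000000000')::bit(64)::bigint AS first_negative,    -- -9223372036854775808
       ('x' || 'ffffffffffffffff')::bit(64)::bigint AS all_bits_set;      -- -1
```

One consequence: UUIDs whose first hex digit is 8 through f map to negative numbers and therefore sort before those starting with 0 through 7, so the bigint order does not fully match the original order.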

DB design

If at all possible add a bigserial column to the underlying table and use that instead.
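A minimal sketch of that change, assuming a hypothetical table tbl and a new column name tbl_id (neither appears in the question):

```sql
-- Adds the column and fills existing rows from a new sequence;
-- this rewrites the table, so expect a lock on large tables.
ALTER TABLE tbl ADD COLUMN tbl_id bigserial;

-- Optional: enforce uniqueness so the application can treat it as a key.
ALTER TABLE tbl ADD CONSTRAINT tbl_tbl_id_key UNIQUE (tbl_id);
```

The view can then expose tbl_id directly, with no conversion function involved.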

Solution 3

Cast to bit(64) to parse a number from a hex literal built from a substr of the UUID. This assumes UUID is a text value without hyphens, as in the question's literal:

select ('x'||substr(UUID, 1, 16))::bit(64)::bigint

Solution 4

Solution found.

UUID::text will return a string with hyphens. In order for substring(UUID::text from 1 for 16) to create a string that x can parse as hex the hyphens need to be stripped first.

The final query looks like:

SELECT hex_to_int(substring((select replace(id::text,'-','')) from 1 for 16))::bigint FROM table

The hex_to_int function needs to be able to handle a bigint, not just int. It looks like:

CREATE OR REPLACE FUNCTION hex_to_int(hexval character varying)
  RETURNS bigint AS
$BODY$
DECLARE
   result  bigint;
BEGIN
 EXECUTE 'SELECT x''' || hexval || '''::bigint' INTO result;
 RETURN result;
END;
$BODY$ LANGUAGE plpgsql;
Author: TimEsk

Updated on June 04, 2022

Comments

  • TimEsk almost 2 years

    I have a third-party application connecting to a view in my PostgreSQL database. It requires the view to have a primary key but can't handle the UUID type (which is the primary key for the view). It also can't handle the UUID as the primary key if it is served as text from the view.

    What I'd like to do is convert the UUID to a number and use that as the primary key instead. However,

    SELECT x'14607158d3b14ac0b0d82a9a5a9e8f6e'::bigint
    

    Fails because the number is out of range.

    So instead, I want to use SQL to take the big end of the UUID and create an int8 / bigint. I should clarify that maintaining order is 'desirable' but I understand that some of the order will change by doing this.

    I tried:

    SELECT x(substring(UUID::text from 1 for 16))::bigint
    

    but the x operator for converting hex doesn't seem to like brackets. I abstracted it into a function but

    SELECT hex_to_int(substring(UUID::text from 1 for 16))::bigint
    

    still fails.

    How can I get a bigint from the 'big end' half of a UUID?

    • Craig Ringer over 9 years
      but ... views can't have a primary key.
    • TimEsk over 9 years
      We are doing an override in the third-party app to assign a primary key to the view.
  • Erwin Brandstetter over 9 years
    Good advice. While you aim for "fool-proof" you should also defend against race conditions. Either way, the solution would exhibit terrible performance ("not efficient" as you put it), the temp table has to be rebuilt for every session with single inserts. Should rather be a regular table or better yet, just add the serial to the underlying table and use it directly. No more function needed. Truly fool-proof and best performance.
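If a lookup table is kept at all, the race condition raised in this comment could be handled roughly as below. This is only a sketch under stated assumptions: it needs PostgreSQL 9.5 or later for ON CONFLICT, and uuid_map, mapped_id and int_for_uuid are hypothetical names, not part of any answer above:

```sql
-- Regular (not temporary) table; the UNIQUE constraint is what makes
-- concurrent inserts of the same uuid safe.
CREATE TABLE uuid_map (
    mapped_id     bigserial PRIMARY KEY,
    internal_uuid uuid NOT NULL UNIQUE
);

CREATE FUNCTION int_for_uuid(pk uuid) RETURNS bigint AS $$
DECLARE
    id bigint;
BEGIN
    LOOP
        -- Common case: the mapping already exists.
        SELECT mapped_id INTO id FROM uuid_map WHERE internal_uuid = pk;
        EXIT WHEN FOUND;

        -- Otherwise try to create it; a concurrent insert of the same uuid
        -- makes this a no-op, and the next loop iteration picks up that row.
        INSERT INTO uuid_map (internal_uuid) VALUES (pk)
        ON CONFLICT (internal_uuid) DO NOTHING
        RETURNING mapped_id INTO id;
        EXIT WHEN FOUND;
    END LOOP;
    RETURN id;
END; $$ LANGUAGE plpgsql STRICT;
```

Adding a serial column to the underlying table, as the comment recommends, avoids the function and the extra table altogether.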