Will UUID as primary key in PostgreSQL give bad index performance?

25,850

Solution 1

(I work on Heroku Postgres)

We use UUIDs as primary keys on a few systems and it works great.

I recommend you use the uuid-ossp extension, and even have postgres generate UUIDs for you:

heroku pg:psql
psql (9.1.4, server 9.1.6)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.

dcvgo3fvfmbl44=> CREATE EXTENSION "uuid-ossp"; 
CREATE EXTENSION  
dcvgo3fvfmbl44=> CREATE TABLE test (id uuid primary key default uuid_generate_v4(), name text);  
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE  
dcvgo3fvfmbl44=> \d test
                 Table "public.test"
 Column | Type |              Modifiers
--------+------+-------------------------------------
 id     | uuid | not null default uuid_generate_v4()
 name   | text |
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

dcvgo3fvfmbl44=> insert into test (name) values ('hgmnz'); 
INSERT 0 1 
dcvgo3fvfmbl44=> select * from test;
                  id                  | name  
--------------------------------------+-------   
 e535d271-91be-4291-832f-f7883a2d374f | hgmnz  
(1 row)

EDIT: performance implications

It will always depend on your workload.

The integer primary key has the advantage of locality, where like data sits closer together. This can be helpful for, e.g., range queries such as WHERE id BETWEEN 1 AND 10000, although lock contention can be worse, since every new key lands on the right-most index page.
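The locality point can be illustrated outside Postgres with a quick sketch (Python's standard uuid module here; this models key ordering, not Postgres internals): sequential keys always arrive in index order, while random v4 UUIDs scatter across the key space, so each insert lands at an unpredictable spot in the B-tree.

```python
import uuid

# Sequential keys arrive already sorted: every new key is the largest so far,
# so a B-tree insert always lands on the right-most leaf page (good locality,
# but that one page is also a contention hot spot under concurrent writes).
seq_keys = list(range(1, 1001))
assert seq_keys == sorted(seq_keys)

# Random v4 UUIDs: insertion order is unrelated to key order, so inserts
# scatter across the whole index. About half of adjacent pairs compare
# "in order" -- exactly what you'd expect from random placement.
uuid_keys = [uuid.uuid4() for _ in range(1000)]
in_order = sum(a < b for a, b in zip(uuid_keys, uuid_keys[1:]))
print(f"{in_order} of {len(uuid_keys) - 1} adjacent pairs in sorted order")
```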

If your read workload is totally random in that you always make primary key lookups, there shouldn't be any measurable performance degradation: you only pay for the larger data type.

Do you write a lot to this table, and is this table very big? It's possible, although I haven't measured this, that there are implications in maintaining that index. For lots of datasets UUIDs are just fine though, and using UUIDs as identifiers has some nice properties.

Finally, I may not be the most qualified person to discuss or advise on this, as I have never run a table large enough with a UUID PK for it to become a problem. YMMV. (Having said that, I'd love to hear from people who have run into problems with the approach!)

Solution 2

As the accepted answer states, range queries may be slow in this case, and not only ranges on id.

An auto-increment key is naturally ordered by creation date, so when auto-increment is used the data tends to be stored chronologically on disk (see B-tree), which speeds up reads (no seeking for HDDs). For example, if you list all users, the natural order is by date created, which is the same as the auto-increment order, so range queries execute faster on HDDs. On SSDs, I would guess the difference is negligible, since SSDs are by design random access (no head seeking, no mechanical parts involved, just pure electricity).
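On the v1-versus-v4 point raised in the comments below, a small sketch of the difference (Python's standard uuid module, purely illustrative and nothing Postgres-specific): time-based v1 UUIDs carry an embedded timestamp, so successive values are at least time-ordered by that field, whereas v4 values are pure randomness with no chronological component at all.

```python
import uuid

# A v1 UUID embeds a 60-bit timestamp (100 ns ticks since 1582-10-15),
# so successive v1 values from one process are ordered by their .time field.
a, b = uuid.uuid1(), uuid.uuid1()
assert a.version == 1
assert a.time <= b.time

# A v4 UUID is 122 random bits: no timestamp, no ordering of any kind.
c = uuid.uuid4()
assert c.version == 4

# Caveat: even v1 UUIDs do not sort chronologically as byte strings,
# because the low-order timestamp bits are stored first (the time_low
# field), so v1 alone still won't give auto-increment-style locality.
```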

Author: thejaz

Updated on July 05, 2022

Comments

  • thejaz
    thejaz almost 2 years

    I have created an app in Rails on Heroku using a PostgreSQL database.

It has a couple of tables designed to be able to sync with mobile devices, where data can be created in different places. Therefore, I have a uuid field (a string storing a GUID) in addition to an auto-increment primary key. The uuid is the one that is communicated between the server and the clients.

I realised after implementing the sync engine on the server side that this leads to performance issues, because I need to map between uuid and id all the time (when writing objects, I need to look up the id from the uuid before saving, and the opposite when sending data back).

    I'm now thinking about switching to only using UUID as primary key making the writing and reading much simpler and faster.

    I've read that UUID as primary key can sometimes give bad index performance (index fragmentation) when using clustered primary key index. Does PostgreSQL suffer from this problem or is it OK to use UUID as primary key?

    I already have a UUID column today so storage wise it will be better because I drop the regular id column.

  • thejaz
    thejaz over 11 years
    Thanks for the response. I'm creating the UUIDs in the Rails models today, is that just as good as using the database function above?
  • hgmnz
    hgmnz over 11 years
    Yes, I see no problem with that as long as your rails models are the only thing inserting data into your database.
  • Noach Magedman
    Noach Magedman over 11 years
    @hgmnz: How do you define the id column to be a uuid via a Rails migration?
  • Michael Shimmins
    Michael Shimmins over 11 years
    @Noach if you're still after an answer on how to use UUID have a look at the postgres_ext gem (or Rails 4).
  • TaiwanGrapefruitTea
    TaiwanGrapefruitTea about 11 years
@hgmnz Do you have any experience with UUID primary keys and large data sets? i.e., a database where the primary key index is not resident in memory (e.g., the working set of the database is larger than memory). This reference says that they experienced performance degradation in that situation: "UUID key bad for performance" (news.ycombinator.com/item?id=5310662)
  • TaiwanGrapefruitTea
    TaiwanGrapefruitTea about 11 years
    @hgmnz Also, it seems to matter if you use UUID v1 or v4 (used in your example). UUID v1 might be more indexable.
  • TaiwanGrapefruitTea
    TaiwanGrapefruitTea about 11 years
    @hgmnz Here's another comment about poor uuid key performance and it seems to be for uuid v1 (not v4) from poor performance uuid v1
  • hgmnz
    hgmnz about 11 years
@TaiwanGrapefruitTea added some thoughts on performance implications
  • Michael
    Michael about 7 years
    I thought Postgres didn't have clustered indexes. How does "like-data sit closer" if the index is not clustered (i.e. sorted on disk)?