UUID Primary Key in Postgres: What Is the Insert Performance Impact?


Solution 1

As I understand it, Postgres does not maintain row clustering on inserts

Correct at the moment. Unfortunately.

so I imagine that in Postgres using a UUID PK does not hurt the performance of that insert.

It still has a performance cost, because the PK index must be maintained and the inserted tuple is bigger (a size-comparison sketch follows the list below):

  • A uuid is 16 bytes, four times as wide as a typical 32-bit integer synthetic key, so each row to write is 12 bytes bigger and fewer rows fit into a given amount of RAM.

  • The b-tree index that implements the primary key will be four times as large as with a 32-bit key, taking longer to search and requiring more memory to cache. It also undergoes more frequent page splits.

  • Index writes will tend to land on random leaf pages rather than appending to hot, recently accessed ones.
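A minimal sketch to make the size difference visible, using hypothetical table names (gen_random_uuid() is built in from PostgreSQL 13; older versions need the pgcrypto extension):

```sql
-- Hypothetical comparison tables: same payload, 32-bit integer vs uuid PK.
-- gen_random_uuid() is built in since PostgreSQL 13; on older versions,
-- run CREATE EXTENSION pgcrypto; first.
CREATE TABLE t_int  (id serial PRIMARY KEY, payload text);
CREATE TABLE t_uuid (id uuid PRIMARY KEY DEFAULT gen_random_uuid(), payload text);

INSERT INTO t_int  (payload) SELECT 'x' FROM generate_series(1, 1000000);
INSERT INTO t_uuid (payload) SELECT 'x' FROM generate_series(1, 1000000);

-- Compare the primary-key index sizes (<table>_pkey is the default name).
SELECT pg_size_pretty(pg_relation_size('t_int_pkey'))  AS int_pk,
       pg_size_pretty(pg_relation_size('t_uuid_pkey')) AS uuid_pk;
```

Exact numbers will vary with fillfactor and how many page splits the random inserts cause.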

is there any way to remedy [the performance impact on the index] or are UUIDs simply not a good PK on a large, un-partitioned table?

If you need a UUID key, you need a UUID key. You shouldn't use one if you don't require one, but if you cannot rely on a central source of synthetic keys and there is no suitable natural key to use, it's still the way to go.

Partitioning won't help much unless you can confine writes to one partition. Also, even if you write to only one partition at a time, constraint exclusion cannot narrow a search on a random key, so lookups will still have to probe every partition's index. I can only see it being useful if your UUID forms part of a composite key and you can partition on the other part of the composite key, as in the sketch below.
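A sketch of that composite-key case, with hypothetical names (declarative hash partitioning as shown is PostgreSQL 11+; the same idea applies to older inheritance-based partitioning). Queries that supply the non-UUID column can be pruned to a single partition, and each partition's uuid index stays smaller:

```sql
-- Hypothetical schema: partition on the non-UUID half of the composite key.
CREATE TABLE events (
    tenant_id integer NOT NULL,
    event_id  uuid    NOT NULL DEFAULT gen_random_uuid(),
    payload   text,
    PRIMARY KEY (tenant_id, event_id)
) PARTITION BY HASH (tenant_id);

CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Pruned to one partition because tenant_id is supplied:
-- SELECT * FROM events WHERE tenant_id = 42 AND event_id = '...';
```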

Solution 2

It should be mentioned that you will get more WAL generated if you have a btree index on a UUID column and the full_page_writes option is enabled. This happens because of UUID randomness: the values are not sequential, so each insert is likely to touch a completely new index leaf page. You can read more in the article On the impact of full-page writes.
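A rough way to observe this from psql (PostgreSQL 10+ function names; t_uuid is the hypothetical table from the sketch in Solution 1):

```sql
-- Record the WAL position, run a batch of random-UUID inserts, then diff.
SELECT pg_current_wal_lsn() AS lsn_before \gset
INSERT INTO t_uuid (payload) SELECT 'x' FROM generate_series(1, 100000);
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), :'lsn_before')::bigint
       ) AS wal_generated;
-- Repeat with t_int to compare; the uuid batch typically generates more WAL
-- when full_page_writes is on, because its inserts dirty many more leaf pages.
```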

Comments

  • philofinfinitejest almost 2 years

    I am wondering about the performance impact of using a non-sequential UUID as the primary key in a table that will become quite large in PostgreSQL.

    In DBMSs that use clustered storage for table records, it is a given that using a UUID is going to increase the cost of inserts, due to having to read from disk to find the data page into which to perform the insert once the table is too big to hold in memory. As I understand it, Postgres does not maintain row clustering on inserts, so I imagine that in Postgres using a UUID PK does not hurt the performance of that insert.

    But I would think that it makes the insert into the index that the primary key constraint creates much more expensive once the table is large, because the index will constantly have to be read from disk to be updated on insertion of new data, whereas with a sequential key the index is only updated at the tip, which will always be in memory.

    Assuming that I understand the performance impact on the index correctly, is there any way to remedy that, or are UUIDs simply not a good PK on a large, un-partitioned table?
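    A quick way to test this effect, assuming two hypothetical scratch tables and psql's \timing (the gap widens once the uuid index no longer fits in RAM):

    ```sql
    -- Hypothetical scratch tables: sequential vs random-key inserts.
    CREATE TABLE seq_pk  (id bigserial PRIMARY KEY, n integer);
    CREATE TABLE uuid_pk (id uuid PRIMARY KEY DEFAULT gen_random_uuid(), n integer);

    \timing on
    INSERT INTO seq_pk  (n) SELECT g FROM generate_series(1, 1000000) g;
    INSERT INTO uuid_pk (n) SELECT g FROM generate_series(1, 1000000) g;
    \timing off
    ```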

    • DharmaTurtle over 3 years
      You specify "non-sequential UUID" in the first sentence but then ask "are UUIDs simply not a good PK". It is possible to generate lexicographically sortable ULIDs if you're willing to abandon the RFC 4122 spec. This "sequential UUID" solves some problems described in the answers below.
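      A rough SQL sketch of the same idea: a hypothetical helper that overwrites the first 6 bytes of a random UUID with the Unix time in milliseconds (note it does not set the RFC 4122 version bits, so like a ULID it steps outside the spec):

      ```sql
      -- Hypothetical helper: values sort roughly by creation time, so index
      -- writes cluster on recent leaf pages instead of landing randomly.
      CREATE FUNCTION uuid_time_ordered() RETURNS uuid AS $$
        SELECT encode(
                 overlay(uuid_send(gen_random_uuid())
                         PLACING substring(int8send(
                                   (extract(epoch FROM clock_timestamp()) * 1000)::bigint
                                 ) FROM 3)
                         FROM 1 FOR 6),
                 'hex')::uuid;
      $$ LANGUAGE sql VOLATILE;
      ```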
  • a_horse_with_no_name over 8 years
    Why is it "unfortunate" that Postgres doesn't store the data clustered? I have never understood why that would be the default way of storing a relational table. There are some cases where it is helpful, but as a general default I don't buy it. But then maybe I have been working too long with Oracle where index organized tables (Oracle's equivalent to a clustered index) are the exception.
  • Craig Ringer over 8 years
    I don't think it should be the default; heaps often have better properties. But index-organized tables are great for read-mostly tables that are joined on a lot. It's a pain CLUSTERing a table all the time. You can still create big fat indexes for use with index-only scans, but that's a waste of disk and I/O. HOT updates in PostgreSQL help, albeit at the price of reserving free space in each page for the updated tuples. BRIN indexes would benefit greatly from attempts to direct writes near similar values, and will hopefully also make it easier to implement such a feature.
  • Amreesh Tyagi over 3 years
    What about twitterflake, sonyflake, or goldflake IDs? I would rather use them in my DBs. github.com/AmreeshTyagi/goldflake
  • Feuermurmel over 3 years
    Would this be remedied by using sequential UUIDs (i.e. starting at a random UUID and counting up), assuming there is only a single or few writers?