What are the performance improvements of Sequential Guid over standard Guid?


Solution 1

GUID vs. Sequential GUID



A typical pattern is to use a Guid as the PK for tables but, as noted in other discussions (see Advantages and disadvantages of GUID / UUID database keys), this causes some performance issues.

This is a typical Guid sequence:

f3818d69-2552-40b7-a403-01a6db4552f7
7ce31615-fafb-42c4-b317-40d21a6a3c60
94732fc7-768e-4cf2-9107-f0953f6795a5


Problems with this kind of data are:

  • Wide distribution of values
  • The values are almost random
  • Index usage is very, very poor
  • A lot of leaf page movement
  • Almost every PK needs to be on at least a nonclustered index
  • The problem occurs on both Oracle and SQL Server
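
To make the "leaf moving" point concrete, here is a minimal sketch that inserts keys into a sorted in-memory list and counts how many inserts land somewhere other than the end. It is only a rough stand-in for what an index has to do: the class and method names are hypothetical, the list is not a real B-tree, and the ordering used is .NET's Guid.CompareTo rather than SQL Server's or Oracle's.

```csharp
using System;
using System.Collections.Generic;

public static class InsertPositionDemo
{
    // Inserts each key into a sorted list and counts the inserts that did NOT
    // land at the end (a crude proxy for index pages being shuffled).
    public static int CountNonAppendInserts(IEnumerable<Guid> keys)
    {
        var sorted = new List<Guid>();
        int nonAppends = 0;
        foreach (Guid key in keys)
        {
            int index = sorted.BinarySearch(key);
            if (index < 0) index = ~index;          // position where the key belongs
            if (index < sorted.Count) nonAppends++; // had to go in the middle
            sorted.Insert(index, key);
        }
        return nonAppends;
    }

    // Random Guids, as produced by Guid.NewGuid().
    public static IEnumerable<Guid> RandomKeys(int count)
    {
        for (int i = 0; i < count; i++) yield return Guid.NewGuid();
    }

    // Hypothetical stand-in for a sequential generator: an increasing counter
    // in the first field, which Guid.CompareTo examines first.
    public static IEnumerable<Guid> IncreasingKeys(int count)
    {
        for (int i = 1; i <= count; i++)
            yield return new Guid(i, 0, 0, new byte[8]);
    }
}
```

Calling InsertPositionDemo.CountNonAppendInserts on RandomKeys(1000) should report that nearly every insert went into the middle of the list, while IncreasingKeys(1000) only ever appends.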



A possible solution is to use sequential Guids, which are generated like this:

cc6466f7-1066-11dd-acb6-005056c00008
cc6466f8-1066-11dd-acb6-005056c00008
cc6466f9-1066-11dd-acb6-005056c00008


How to generate them from C# code:

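// Requires: using System; using System.Runtime.InteropServices;
// Windows only: UuidCreateSequential is exported by rpcrt4.dll.
[DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(out Guid guid);

public static Guid SequentialGuid()
{
    const int RPC_S_OK = 0;
    Guid g;
    // Any non-zero status falls back to a standard (random) Guid.
    if (UuidCreateSequential(out g) != RPC_S_OK)
        return Guid.NewGuid();
    else
        return g;
}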


Benefits

  • Better index usage
  • Allows the use of clustered keys (to be verified in NLB scenarios)
  • Less disk usage
  • 20-25% performance increase at minimal cost



Real-life measurement. Scenario:

  • Guid stored as UniqueIdentifier type on SQL Server
  • Guid stored as CHAR(36) on Oracle
  • Lots of insert operations, batched together in a single transaction
  • From 1 to 100s of inserts depending on the table
  • Some tables have more than 10 million rows



Laboratory Test – SQL Server

VS2008 test: 10 concurrent users, no think time, benchmark process with 600 batched inserts for a leaf table

Standard Guid
Avg. process duration: 10.5 sec
Avg. requests per second: 54.6
Avg. response time: 0.26

Sequential Guid
Avg. process duration: 4.6 sec
Avg. requests per second: 87.1
Avg. response time: 0.12

Results on Oracle (sorry, a different tool was used for this test): 1,327,613 inserts on a table with a Guid PK

Standard Guid: 0.02 sec. elapsed time for each insert, 2,861 sec. of CPU time, total of 31,049 sec. elapsed

Sequential Guid: 0.00 sec. elapsed time for each insert, 1,142 sec. of CPU time, total of 3,667 sec. elapsed

The "db file sequential read" wait time went from 6.4 million wait events for 62,415 seconds to 1.2 million wait events for 11,063 seconds.

It's important to note that sequential Guids can be guessed, so they are not a good idea if security is a concern; in that case, keep using standard Guids.
In short: if you use Guids as PKs, use sequential Guids whenever they are not passed back and forth from a UI. They speed up operations and cost nothing to implement.

Solution 2

I may be missing something here (feel free to correct me if I am), but I can see very little benefit in using sequential GUID/UUIDs for primary keys.

The point of using GUIDs or UUIDs over autoincrementing integers is:

  • They can be created anywhere without contacting the database
  • They are identifiers that are entirely unique within your application (and in the case of UUIDs, universally unique)
  • Given one identifier, there is no way to guess the next or previous (or even any other valid identifiers) outside of brute-forcing a huge keyspace.

Unfortunately, using your suggestion, you lose all those things.

So, yes. You've made GUIDs better. But in the process, you've thrown away almost all of the reasons to use them in the first place.

If you really want to improve performance, use a standard autoincrementing integer primary key. That provides all the benefits you described (and more) while being better than a 'sequential guid' in almost every way.

This will most likely get downmodded into oblivion as it doesn't specifically answer your question (which is apparently carefully-crafted so you could answer it yourself immediately), but I feel it's a far more important point to raise.

Solution 3

As massimogentilini already said, performance can be improved by using UuidCreateSequential (when generating the Guids in code). But one fact seems to be missing: SQL Server (at least Microsoft SQL Server 2005 / 2008) uses the same functionality, but the comparison/ordering of Guids differs between .NET and SQL Server, which would still cause more I/O because the Guids will not be ordered correctly. To generate Guids that are ordered correctly for SQL Server, you have to do the following (see comparison details):

[System.Runtime.InteropServices.DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(byte[] buffer);

static Guid NewSequentialGuid() {

    byte[] raw = new byte[16];
    if (UuidCreateSequential(raw) != 0)
        throw new System.ComponentModel.Win32Exception(System.Runtime.InteropServices.Marshal.GetLastWin32Error());

    byte[] fix = new byte[16];

    // reverse 0..3
    fix[0x0] = raw[0x3];
    fix[0x1] = raw[0x2];
    fix[0x2] = raw[0x1];
    fix[0x3] = raw[0x0];

    // reverse 4 & 5
    fix[0x4] = raw[0x5];
    fix[0x5] = raw[0x4];

    // reverse 6 & 7
    fix[0x6] = raw[0x7];
    fix[0x7] = raw[0x6];

    // all other are unchanged
    fix[0x8] = raw[0x8];
    fix[0x9] = raw[0x9];
    fix[0xA] = raw[0xA];
    fix[0xB] = raw[0xB];
    fix[0xC] = raw[0xC];
    fix[0xD] = raw[0xD];
    fix[0xE] = raw[0xE];
    fix[0xF] = raw[0xF];

    return new Guid(fix);
}

or this link or this link.
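
One way to check the ordering claim without a database is a small sketch built on System.Data.SqlTypes.SqlGuid, whose comparison follows SQL Server's uniqueidentifier ordering. The class and method names below are hypothetical, and SwapForSqlServer simply applies the same byte swaps as NewSequentialGuid above to an already-created Guid.

```csharp
using System;
using System.Data.SqlTypes;

public static class SqlGuidOrderDemo
{
    // Same byte shuffling as NewSequentialGuid above:
    // reverse bytes 0..3, swap 4 and 5, swap 6 and 7, keep 8..15 unchanged.
    public static Guid SwapForSqlServer(Guid value)
    {
        byte[] b = value.ToByteArray();
        Array.Reverse(b, 0, 4);
        Array.Reverse(b, 4, 2);
        Array.Reverse(b, 6, 2);
        return new Guid(b);
    }

    // Compares two Guids the way SqlGuid does (negative, zero or positive),
    // which differs from Guid.CompareTo in .NET.
    public static int CompareAsSqlServer(Guid left, Guid right)
    {
        return new SqlGuid(left).CompareTo(new SqlGuid(right));
    }
}
```

As an illustration, take two made-up consecutive values in the style of the sequence from Solution 1: cc6466ff-1066-11dd-acb6-005056c00008 followed by cc646700-1066-11dd-acb6-005056c00008. Unswapped, CompareAsSqlServer should place the first value after the second; after passing both through SwapForSqlServer, the generation order and the SQL Server order should agree.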

Solution 4

See this article: (http://www.shirmanov.com/2010/05/generating-newsequentialid-compatible.html)

Even though MSSQL uses this same function (UuidCreateSequential(out Guid guid)) to generate NEWSEQUENTIALID values, MSSQL reverses the 3rd and 4th byte patterns, so calling the function from your own code does not give you the same result. Shirmanov shows how to get exactly the same results that MSSQL would create.

Solution 5

I measured the difference between Guid (clustered and nonclustered), Sequential Guid and int (identity/autoincrement) using Entity Framework. The Sequential Guid was surprisingly fast compared to the int with identity. Results and code of the Sequential Guid here.

Author: massimogentilini

Developer and manager for .Net and Java projects

Updated on February 12, 2021

Comments

  • massimogentilini
    massimogentilini about 3 years

    Has anyone ever measured the performance of Sequential Guid vs. standard Guid when used as primary keys inside a database?


    I do not see why it matters whether unique keys are guessable. Passing them from a web UI or some other layer seems bad practice in itself, and if you have security concerns I do not see how using a Guid improves things (if that is the issue, use a real random number generator built on the framework's proper crypto functions).
    The other items are covered by my approach: a sequential Guid can be generated from code without DB access (although only on Windows), and it is unique in time and space.
    And yes, the question was posed with the intent of answering it, to give people who have chosen Guids for their PKs a way to improve database usage (in my case it allowed the customers to sustain a much higher workload without having to change servers).

    It seems there are a lot of security concerns. In that case do not use sequential Guids or, better still, use standard Guids for PKs that are passed back and forth from your UI and sequential Guids for everything else. As always there is no absolute truth; I've also edited the main answer to reflect this.

  • massimogentilini
    massimogentilini over 15 years
    Besides the "non-guessing" property (which I do not consider important; we're not looking for a randomizing function), sequential Guids have exactly the characteristics you're looking for: I generate them from C# code and they are unique in time and space.
  • massimogentilini
    massimogentilini over 15 years
    I repeat, I do not see Guids as a way to generate keys that cannot be guessed, but as a way to have keys that are unique in time and space and can easily be used for replication. If privacy is important, use another approach (real random numbers).
  • Alex S
    Alex S over 15 years
    I am a bit skeptical of COMBs and similar techniques, because "GUIDs are globally unique, but substrings of GUIDs aren't": blogs.msdn.com/oldnewthing/archive/2008/06/27/8659071.aspx
  • Mitch Wheat
    Mitch Wheat over 15 years
    GUIDs are statistically unique. That is, the chance of a collision is very small. A COMB sacrifices a few bits of the 128 available in a GUID. So yes, the chances of a collision are higher, but still extremely low.
  • massimogentilini
    massimogentilini almost 15 years
    Great point. From what I can tell, using your code could improve performance some more. Sooner or later I'll run some tests.
  • Admin
    Admin over 14 years
    sequential UUIDs don't guarantee a global ordering. They are still universally unique, but they are also locally sequential. This means that IDs generated on different hosts/processes/threads (depending on the sequential scheme) interleave randomly, but IDs generated in the same environment will be ordered.
  • Thomas
    Thomas over 13 years
    The whole point with guids is that they have a higher probability of global uniqueness than an integer. That probability does not have to be 100%. While using a COMB guid increases the probability of a collision, it is still many orders of magnitude lower than using an identity column.
  • bbqchickenrobot
    bbqchickenrobot almost 13 years
    COMB GUIDs are ordered and are very fast for inserts/reads and provide comparable speeds to identity columns. All the perks of an identity column, but you don't need to use any crazy replication strategies with a GUID. With an identity column you do. Advantage GUID.
  • hgoebl
    hgoebl over 10 years
    With storage engine 'InnoDB', MySQL is storing records by PK in a clustered way, so here you should benefit from Sequential GUIDs as well.
  • GoYun.Info
    GoYun.Info almost 10 years
    If it is on the cloud, a standard autoincrementing integer primary key is not good for the long run.
  • Raven
    Raven almost 10 years
    "It's important to see that all the sequential guid can be guessed, so it's not a good idea to use them if security is a concern" in this case a Comb guid could be used instead which has the benefit of being sequential and random.
  • Giorgi Chakhidze
    Giorgi Chakhidze almost 10 years
    See this blog post: blogs.msdn.com/b/dbrowne/archive/2012/07/03/… "... results of UuidCreateSequential are not sequential with respect to SQL Server's sort order... To make them sequential SQL Server's internal NEWSEQUENTIALID function performs some byte shuffling on the GUID... you need to perform the same byte shuffling"
  • johnny
    johnny over 6 years
    Why is it better is what I don't understand.
  • GoYun.Info
    GoYun.Info over 6 years
    It is not unique across tables. Cloud is for web scale. Unless your db is very small then it doesn't matter.
  • entonio
    entonio about 4 years
    What's the purpose at all of having sequential guids instead of having a sequential integer?
  • Frank Hopkins
    Frank Hopkins over 3 years
    @entonio you can still generate them in a distributed way (depending on how you go about making them sequential^^)
  • trees_are_great
    trees_are_great about 3 years
    Results not found. I would be interested in how you measured the difference. The problem with standard Guids, which are often used, would be page splits on inserts, which would slowly cause query performance to degrade. Did you do the inserts in such a way as to cause page splits?
  • Alex Siepman
    Alex Siepman about 3 years
    The URL has updated so you can see the results.
  • trees_are_great
    trees_are_great about 3 years
    Thanks. A very interesting analysis. It would be great to do something like that, but then query to see how fragmented each table is. And then compare a query on a highly fragmented Guid table compared with a non unique int table. I'm currently in the process of switching Guids to COMB Guids in the hope that that will speed up query performance.
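
Several comments above mention COMB Guids without showing one. The following is a minimal sketch of the idea, assuming a simplified layout that is not taken from any particular library: keep the random bytes of Guid.NewGuid() but overwrite bytes 10..15, the group SQL Server compares first, with a big-endian millisecond timestamp. The class name and the choice of Unix-epoch milliseconds are assumptions made for illustration.

```csharp
using System;
using System.Data.SqlTypes;

public static class CombGuidSketch
{
    private static readonly DateTime Epoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Builds a COMB-style Guid: 10 random bytes followed by a 6-byte
    // timestamp (milliseconds since the Unix epoch) in bytes 10..15.
    public static Guid Create(DateTime utcNow)
    {
        byte[] bytes = Guid.NewGuid().ToByteArray();
        long millis = (long)(utcNow - Epoch).TotalMilliseconds;
        for (int i = 0; i < 6; i++)
        {
            // Most significant byte first, so a byte-wise comparison of
            // this group follows the timestamp.
            bytes[10 + i] = (byte)((millis >> (8 * (5 - i))) & 0xFF);
        }
        return new Guid(bytes);
    }

    // Compares two Guids with SqlGuid, which follows SQL Server's ordering.
    public static int CompareAsSqlServer(Guid left, Guid right)
    {
        return new SqlGuid(left).CompareTo(new SqlGuid(right));
    }
}
```

Two values created for different timestamps should compare in timestamp order under CompareAsSqlServer, while values created within the same millisecond fall back to their random bytes. As the comments note, the trade-off is that 48 of the 128 bits are no longer random.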