GUID vs INT IDENTITY

57,840

Solution 1

Kimberley Tripp (SQLSkills.com) has an article on using GUID's as primary keys. She advices against it because of the unnecessary overhead.

Solution 2

To answer your question: In the end, in what circumstances would you see yourself using an INT as a PK versus a GUID?

I would use a GUID if my system would have an online/offline version that inside the offline version you can save data and that data is transferred back to the server one day during a synch. That way, you are sure that you won't have the same key twice inside your database.

Solution 3

We have Guids in our very complex enterprise software everywhere. Works smoothly.

I believe Guids are semantically more suitable to serve as identifiers. There is also no point in unnecessarily worrying about performance until you are faced with that problem. Beware premature optimization.

There is also an advantage with database migration of any sort. With Guids you will have no collisions. If you attempt to merge several DBs where ints are used for identity, you will have to replace their values. If these old values were used in urls, they will now be different following SEO hit.

Solution 4

Apart from being a poor choice when you need to synchronize several database instances, INT's have one drawback I haven't seen mentioned: inserts always occur at one end of the index tree. This increases lock contention when you have a table with a lot of movement (since the same index pages have to be modified by concurrent inserts, whereas GUID's will be inserted all over the index). The index may also have to be rebalanced more often if a B* tree or similar data structure is used.

Of course, int's are easier on the eye when doing manual queries and report construction, and space consumption may add up through FK usages.

I'd be interested to see any measurements of how well e.g. SQL Server actually handles insert-heavy tables with IDENTITY PK's.

Solution 5

the INT is a space saver (though this point is generally moot in most modern systems).

Not so. It may seem so at first glance, but note that the primary key of each table will be repeated multiple times throughout the database in indexes and as foreign key in other tables. And it will be involved in nearly any query containing its table - and very intensively when it's a foreign key used for a join.

Furthermore, remember that modern CPUs are very, very fast, but RAM speeds have not kept up. Cache behaviour becomes therefore increasingly important. And the best way to get good cache behaviour is to have smaller data sets. So the seemingly irrelevant difference between 4 and 16 bytes may well result in a noticeable difference in speed. Not necessarily always - but it's something to consider.

Share:
57,840
CodeMonkey1313
Author by

CodeMonkey1313

Just another code junkie. Specialties include .NET (emphasis on ASP.NET), databases (emphasis on SQL Server, but experience with Oracle and MongoDB), and WCF. Beginning to get familiarity with ATG and Java development. SOreadytohelp

Updated on July 09, 2022

Comments

  • CodeMonkey1313
    CodeMonkey1313 almost 2 years

    Possible Duplicate:
    How do you like your primary keys?

    I'm aware of the benefits of using a GUID, as well as the benefits of using and INT as a PK in a database. Considering that a GUID is in essence a 128 bit INT and a normal INT is 32 bit, the INT is a space saver (though this point is generally moot in most modern systems).

    In the end, in what circumstances would you see yourself using an INT as a PK versus a GUID?

  • bignose
    bignose almost 15 years
    What if the natural key of the record is not numeric; e.g. (host, timestamp) for a log message record, or (product_code) for a product record? Would you insist on adding a numeric field serving no purpose except to have a redundant key?
  • Jim Arnold
    Jim Arnold almost 15 years
    GUIDs these days are usually randomly generated
  • glagarto
    glagarto almost 15 years
    No I wouldn't, but for a timestamp field you could consider adding a Identity Field to the table and use that as the key instead of the timestamp. As they are both generated by the DB. If its a product code then I would always use that for the ID as that is product specific based on your business so it makes no sence to change it to an ID. It all depends on the type of data you will be storing and how you will go about designing your database.
  • Ronald Wildenberg
    Ronald Wildenberg over 12 years
    @Marco Can you provide some reference to documentation that backs this up? I have never heard of this.
  • Marco van de Voort
    Marco van de Voort over 12 years
    This is already age old news. See among others simply the wikipedia en.wikipedia.org/wiki/Globally_unique_identifier most notably the algorithm section
  • Martin Smith
    Martin Smith over 12 years
    Still haven't read this series but I think Tony Rogerson is arguing that with SSDs the fragmentation issue is much reduced
  • Koste
    Koste almost 11 years
    What about guid clustering in your enterprise software?
  • IDIR Samir
    IDIR Samir about 8 years
    You can always use hashids to mitigate that problem hashids.org
  • Quarkly
    Quarkly over 5 years
    What you see in the debugger has to be the least useful criteria for choosing a data type. You might as well use strings to join your tables then.
  • Quarkly
    Quarkly over 5 years
    One of the better reasons I've seen for using an int, the extra 4 bytes are multiplied for every index and table they appear as a foreign key. However, I think the fragmentation issues would dwarf the actual space used by the key.
  • Quarkly
    Quarkly over 5 years
    Yes, absolutely. Ints, or any monotonically increasing value is going to create hot pages during inserts and updates. If you've got dozens of users all inserting and updating create contention for pages. Clustering may save space, but it creates bottlenecks for the hot pages.
  • Quarkly
    Quarkly over 5 years
    I absolutely agree with this reply, your instinct about how an interpreted language like SQL works is probably wrong; don't optimize until you've got a solid test that will tell you the difference. As far as clustering: it creates contention and hot pages, especially when you have several users all trying to write to the same page. Random keys will spread the data out over many pages and reduce contention.
  • jmathew
    jmathew almost 5 years
    The argument in the blog post, unless I misread, is primarily an INT / BIGINT system with a DB that uses clustered primary keys (SQL Server) will be faster and more space efficient. That is not true in all databases. Postgres for example.