Should I get rid of clustered indexes on GUID columns?

Solution 1

A big reason for a clustered index is when you often want to retrieve rows for a range of values for a given column. Because the data is physically arranged in that order, the rows can be extracted very efficiently.

Something like a GUID, while excellent for a primary key, could be positively detrimental to performance, as there will be additional cost for inserts and no perceptible benefit on selects.

So yes, don't put the clustered index on a GUID column.

As to why it's not offered as a recommendation, I'd suggest the tuner is aware of this fact.
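As a minimal T-SQL sketch of this advice, assuming a hypothetical dbo.Customer table (all table, column, and constraint names here are illustrative, not from the question): declaring the primary key as NONCLUSTERED keeps the GUID as the key without ordering the table's rows by it.

```sql
-- Hypothetical table; every name here is illustrative only.
CREATE TABLE dbo.Customer
(
    CustomerId uniqueidentifier NOT NULL
        CONSTRAINT DF_Customer_CustomerId DEFAULT NEWID(),
    Name       nvarchar(100)    NOT NULL,
    CreatedAt  datetime         NOT NULL
        CONSTRAINT DF_Customer_CreatedAt DEFAULT GETDATE(),
    -- Without the NONCLUSTERED keyword, SQL Server would make this
    -- primary key the table's clustered index by default.
    CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (CustomerId)
);
```

Note that this leaves the table as a heap unless a clustered index is created on some other column or columns.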

Solution 2

You almost certainly want to establish a clustered index on every table in your database. A table without a clustered index is referred to as a "heap", and most common types of query perform worse against a heap than against a table with a clustered index.

Which columns the clustered index should be established on depends on the table itself and on the expected usage patterns of queries against it. In almost every case you want the clustered index to be on a column or combination of columns that is unique (i.e., an alternate key), because if it isn't, SQL Server will add a hidden unique value to the end of whatever columns you select anyway.

Look for a column or columns that queries will frequently use to select or filter multiple records. For example: a sales transactions table where the application frequently requests transactions by product ID; or, even better, an invoice details table where in almost every case you retrieve all the detail records for a specific invoice; or an invoice table where you often retrieve all the invoices for a particular customer. This holds whether you are selecting large numbers of records by a single value or by a range of values.

These columns are candidates for the clustered index. The order of the columns in the clustered index is critical: the first column defined in the index should be the column that will be selected or filtered on first in the expected queries.
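The invoice details case can be sketched as follows. This is only an illustration under assumed names (dbo.InvoiceDetail and its columns are hypothetical), with the filtered column placed first in a unique composite clustered key.

```sql
-- Hypothetical invoice-detail table; every name here is illustrative only.
CREATE TABLE dbo.InvoiceDetail
(
    InvoiceId  int           NOT NULL,
    LineNumber int           NOT NULL,
    ProductId  int           NOT NULL,
    Quantity   int           NOT NULL,
    UnitPrice  decimal(10,2) NOT NULL,
    -- InvoiceId comes first because queries filter on it;
    -- LineNumber makes the combined key unique.
    CONSTRAINT PK_InvoiceDetail PRIMARY KEY CLUSTERED (InvoiceId, LineNumber)
);

-- All rows for one invoice are adjacent in the clustered index,
-- so this query can seek to the first row and read forward from there.
SELECT LineNumber, ProductId, Quantity, UnitPrice
FROM dbo.InvoiceDetail
WHERE InvoiceId = 1001
ORDER BY LineNumber;
```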

The reason for all this lies in the internal structure of a database index. These indices are called balanced-tree (B-tree) indices. They are somewhat like a binary tree, except that each node in the tree can have an arbitrary number of entries (and child nodes) instead of just two. What makes a clustered index different is that its leaf nodes are the actual physical data pages of the table itself, whereas the leaf nodes of a non-clustered index just "point" to the table's data pages.

When a table has a clustered index, therefore, the table's data pages are the leaf level of that index, and each one has a pointer to the previous page and the next page in index order (they form a doubly linked list).

So if your query requests a range of rows that is in the same order as the clustered index, the query processor only has to traverse the index once (or maybe twice) to find the start page of the data, and then follow the linked-list pointers from page to page until it has read all the data pages it needs.

For a non-clustered index, by contrast, it has to make a separate trip from the index entry to the table's data pages for every row it retrieves.

NOTE (edit):
To address the sequential issue for GUID key columns, be aware that SQL Server 2005 has NEWSEQUENTIALID(), which does in fact generate GUIDs the "old" sequential way.

Or you can investigate Jimmy Nilsson's COMB GUID algorithm, which is implemented in client-side code:

COMB Guids

Solution 3

The problem with a clustered index on a GUID column is that the GUIDs are random, so a new record usually has to be inserted into the middle of the table, and existing data on disk has to be moved to make room for it.

However, with integer-based clustered indexes, the integers are normally sequential (as with an IDENTITY column), so new rows just get added to the end and no data needs to be moved around.

On the other hand, clustered indexes are not always bad on GUIDs... it all depends upon the needs of your application. If you need to be able to SELECT records quickly, then use a clustered index... the INSERT speed will suffer, but the SELECT speed will be improved.

Solution 4

While clustering on a GUID is normally a bad idea, be aware that GUIDs can under some circumstances cause fragmentation even in non-clustered indexes.

Note that if you're using SQL Server 2005, the newsequentialid() function produces sequential GUIDs. This helps to prevent the fragmentation problem.

I suggest using a SQL query like the following to measure fragmentation before making any decisions (excuse the non-ANSI syntax):

-- Run this while connected to MyDatabase: OBJECT_NAME and sys.indexes
-- resolve against the current database.
SELECT OBJECT_NAME(ips.[object_id]) AS [Object Name],
       si.name AS [Index Name],
       ROUND(ips.avg_fragmentation_in_percent, 2) AS [Fragmentation],
       ips.page_count AS [Pages],
       ROUND(ips.avg_page_space_used_in_percent, 2) AS [Page Density]
FROM sys.dm_db_index_physical_stats
     (DB_ID('MyDatabase'), NULL, NULL, NULL, 'DETAILED') AS ips
INNER JOIN sys.indexes AS si
        ON si.[object_id] = ips.[object_id]
       AND si.index_id = ips.index_id
WHERE ips.index_level = 0; -- leaf level of each index only

Solution 5

If you are using NewId(), you could switch to NewSequentialId(). That should help insert performance.
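A rough sketch of that switch, assuming a hypothetical dbo.Product table whose key currently defaults to NEWID() (the table and constraint names are made up for illustration). NEWSEQUENTIALID() can only be used in a DEFAULT constraint, so the change is made by replacing the column default.

```sql
-- Hypothetical starting point: a GUID key populated by NEWID().
CREATE TABLE dbo.Product
(
    ProductId uniqueidentifier NOT NULL
        CONSTRAINT DF_Product_ProductId DEFAULT NEWID(),
    Name      nvarchar(100)    NOT NULL,
    CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ProductId)
);

-- Remove the NEWID() default...
ALTER TABLE dbo.Product DROP CONSTRAINT DF_Product_ProductId;

-- ...and replace it with NEWSEQUENTIALID().
ALTER TABLE dbo.Product
    ADD CONSTRAINT DF_Product_ProductId DEFAULT NEWSEQUENTIALID() FOR ProductId;

-- Rows inserted without an explicit ProductId now get their key
-- from NEWSEQUENTIALID() instead of NEWID().
INSERT INTO dbo.Product (Name) VALUES (N'Example');
```

Existing rows keep their random GUIDs; only rows inserted after the change use the new default.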

Author by

cbp

Updated on July 05, 2020

Comments

  • cbp
    cbp almost 4 years

    I am working on a database that usually uses GUIDs as primary keys.

    By default SQL Server places a clustered index on primary key columns. I understand that this is a silly idea for GUID columns, and that non-clustered indexes are better.

    What do you think - should I get rid of all the clustered indexes and replace them with non-clustered indexes?

    Why wouldn't SQL's performance tuner offer this as a recommendation?

  • Mike Woodhouse
    Mike Woodhouse over 15 years
    Clustering doesn't affect lookup speed - a unique non-clustered index should do the job.
  • HTTP 410
    HTTP 410 over 15 years
    With SQL 2005 and newsequentialid(), the fragmentation problem goes away to a large extent. It's best to measure by looking at sys.dm_db_index_physical_stats and sys.indexes.
  • cbp
    cbp over 15 years
    But what about GUIDs? Unless they are sequential GUIDs, you won't ever be retrieving a range of rows in the same order as the clustered index. Thus my question.
  • Charles Bretana
    Charles Bretana over 15 years
    Well, you're right: in general, a non-clustered index will be slightly faster than a clustered index for single-row access when non-index columns must be fetched. For "covering" indices, on the other hand, it shouldn't matter. (cont'd)
  • Charles Bretana
    Charles Bretana over 15 years
    But a clustered index can help in queries for "groups" of data, even if you are using non-sequential GUIDs. If the GUID is the PK in a parent table, for example, and the first (FK) column of a composite clustered index PK in a child table, then all the clustered index benefits apply.
  • Charles Bretana
    Charles Bretana over 15 years
    Also, you "Can" create sequential Guids... See yafla.com/dennisforbes/Sequential-GUIDs-in-SQL-Server/…
  • Jonathan Gilbert
    Jonathan Gilbert almost 8 years
    You still don't get any benefit in your queries, though. You should only cluster on UNIQUEIDENTIFIER if you need to, e.g. for replication.