Should every User Table have a Clustered Index?

Solution 1

It's hard to state this more succinctly than SQL Server MVP Brad McGehee:

As a rule of thumb, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases–such as an identity column, or some other column where the value is increasing–and is unique. In many cases, the primary key is the ideal column for a clustered index.

BOL echoes this sentiment:

With few exceptions, every table should have a clustered index.

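The reasons for doing this are many and are primarily based upon the fact that a clustered index stores your data in key order. That order is always maintained logically; physical adjacency in storage can degrade with fragmentation until the index is rebuilt or reorganized.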

  • If your clustered index is on a single column that monotonically increases, new rows are appended in key order and page splits caused by inserts are largely avoided.

  • Clustered indexes are efficient for finding a specific row when the indexed value is unique, such as the common pattern of selecting a row based upon the primary key.

  • A clustered index often allows for efficient queries on columns that are often searched for ranges of values (between, >, etc.).

  • Clustering can speed up queries where data is commonly sorted by a specific column or columns.

  • A clustered index can be rebuilt or reorganized on demand to control table fragmentation.

  • These benefits can even be applied to views.
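
To make the page-split point above concrete, here is a minimal sketch in Python. It is a toy model, not SQL Server's storage engine: the page capacity, the split-in-half rule, and the function name are all assumptions chosen for illustration. It counts how often an insert lands in a full page that already holds later keys, which is the case that forces existing rows to move.

```python
import bisect
import random

PAGE_CAPACITY = 4  # hypothetical rows per page, kept tiny for illustration


def count_mid_page_splits(keys, capacity=PAGE_CAPACITY):
    """Insert keys into key-ordered pages and count splits that move existing rows."""
    pages = [[]]  # each page is a sorted list of keys; pages stay in key order
    splits = 0
    for key in keys:
        # Find the last page whose first key is <= the new key (default: first page).
        idx = 0
        for i, page in enumerate(pages):
            if page and key >= page[0]:
                idx = i
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
            continue
        if idx == len(pages) - 1 and key > page[-1]:
            # Key sorts after everything stored so far: open a new page, nothing moves.
            pages.append([key])
            continue
        # Full page and the key belongs inside it: split the page in half.
        splits += 1
        half = capacity // 2
        left, right = page[:half], page[half:]
        bisect.insort(left if key < right[0] else right, key)
        pages[idx:idx + 1] = [left, right]
    return splits


sequential_keys = list(range(1, 201))   # identity-style keys
random_keys = sequential_keys[:]
random.Random(0).shuffle(random_keys)   # stand-in for GUID-like random keys

print("splits with increasing keys:", count_mid_page_splits(sequential_keys))
print("splits with random keys:", count_mid_page_splits(random_keys))
```

In this model the increasing keys never trigger a split, while the shuffled keys do so repeatedly. Real behavior also depends on fill factor, row size, and updates that grow rows, none of which are modeled here.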

You may not want to have a clustered index on:

  • Columns that have frequent data changes, as SQL Server must then physically re-order the data in storage.

  • Columns that are already covered by other indexes.

  • Wide keys, as the clustered index is also used in non-clustered index lookups.

  • GUID columns, which are larger than identities and also effectively random values (not likely to be sorted upon), though newsequentialid() could be used to help mitigate physical reordering during inserts.

  • A rare reason to use a heap (table without a clustered index) is if the data is always accessed through nonclustered indexes and the RID (SQL Server internal row identifier) is known to be smaller than a clustered index key.
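
The wide-key and RID points above come down to simple arithmetic: every nonclustered index row carries a row locator, which is either the RID (on a heap) or the clustered index key. A rough sketch in Python follows. The row count and index count are hypothetical, and the estimate deliberately ignores row overhead, uniquifiers on non-unique clustered keys, and compression.

```python
ROWS = 10_000_000          # hypothetical table size
NONCLUSTERED_INDEXES = 3   # hypothetical number of nonclustered indexes

# Bytes per row locator stored in each nonclustered index row.
LOCATOR_BYTES = {
    "heap RID": 8,                         # file id (2) + page id (4) + slot (2)
    "int clustered key": 4,
    "bigint clustered key": 8,
    "uniqueidentifier clustered key": 16,
}


def locator_overhead_mb(locator_bytes, rows=ROWS, indexes=NONCLUSTERED_INDEXES):
    """Approximate space spent on row locators across all nonclustered indexes."""
    return locator_bytes * rows * indexes / (1024 * 1024)


for name, size in LOCATOR_BYTES.items():
    print(f"{name}: {locator_overhead_mb(size):.0f} MB")
```

Under these assumptions a 16-byte GUID key costs twice what a RID does in every nonclustered index, while a 4-byte int key costs half.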

Because of these and other considerations, such as your particular application workloads, you should carefully select your clustered indexes to get maximum benefit for your queries.

Also note that when you create a primary key on a table in SQL Server, it will by default create a unique clustered index (if it doesn't already have one). This means that if you find a table that doesn't have a clustered index, but does have a primary key (as all tables should), a developer had previously made the decision to create it that way. You may want to have a compelling reason to change that (of which there are many, as we've seen). Adding, changing or dropping the clustered index requires rewriting the entire table and any non-clustered indexes, so this can take some time on a large table.

Solution 2

I would not say "Every table should have a clustered index", I would say "Look carefully at every table and how it is accessed, and try to define a clustered index on it if it makes sense". It's a plus, like a Joker: you have only one Joker per table, but you don't have to use it. Other database systems don't have this, at least in this form, BTW.

Putting clustered indices everywhere without understanding what you're doing can also kill your performance, for example with GUID primary keys, as we see more and more. The cost generally falls on INSERT performance, because a clustered index means physical re-ordering on the disk (or at least that's a good way to understand it).

So, read Tim Lehner's exceptions and reason.

Solution 3

Yes, you should have a clustered index on a table, so that all nonclustered indexes perform better.

Solution 4

Performance is a big hairy problem. Make sure you are optimizing for the right thing.

Free advice is always worth its price, and there is no substitute for actual experimentation.

The purpose of an index is to find matching rows and help retrieve the data when found.

A non-clustered index on your search criteria will help to find rows, but an additional operation is needed to get at the row's data.

If there is no clustered index, SQL Server uses an internal row ID (RID) to point to the location of the data.

However, if there is a clustered index on the table, that RID is replaced by the key values of the clustered index.

So when a query needs only those key values, the step of reading the row's data is not needed: the request is covered by the values already in the index.

Even if a clustered index isn't very selective, it may be helpful to have its keys in the leaf of the non-clustered index if those keys are frequently most or all of the columns requested.
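
The lookup behavior described in this answer can be sketched with plain Python data structures. This is only a conceptual model: the table, column names, and functions are invented for illustration, a list position stands in for the RID, and a dict stands in for the clustered index.

```python
# Heap: rows sit wherever they were written; the list position plays the role of the RID.
heap = [
    {"order_id": 17, "customer": "acme", "total": 250},
    {"order_id": 3, "customer": "zenith", "total": 90},
    {"order_id": 42, "customer": "acme", "total": 15},
]
# Nonclustered index on customer over the heap: leaf entries carry RIDs.
nc_on_heap = {"acme": [0, 2], "zenith": [1]}

# Clustered table: rows are reachable by clustering key (order_id).
clustered = {row["order_id"]: row for row in heap}
# Nonclustered index on customer over the clustered table: leaf entries carry the key.
nc_on_clustered = {"acme": [17, 42], "zenith": [3]}


def order_ids_via_heap(customer):
    """Return (order_ids, row_lookups): every RID must be followed to read order_id."""
    ids, lookups = [], 0
    for rid in nc_on_heap.get(customer, []):
        ids.append(heap[rid]["order_id"])
        lookups += 1
    return ids, lookups


def order_ids_via_clustered(customer):
    """Return (order_ids, row_lookups): the key is already in the index leaf."""
    return list(nc_on_clustered.get(customer, [])), 0


def totals_via_clustered(customer):
    """Return (totals, row_lookups): a non-key column still needs a key lookup."""
    totals, lookups = [], 0
    for key in nc_on_clustered.get(customer, []):
        totals.append(clustered[key]["total"])
        lookups += 1
    return totals, lookups


print(order_ids_via_heap("acme"))
print(order_ids_via_clustered("acme"))
print(totals_via_clustered("acme"))
```

The saving only appears when the query asks for the clustering key columns; as soon as another column such as total is requested, the clustered table needs a lookup per row as well.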

Author: Sreedhar

Updated on June 20, 2022

Comments

  • Sreedhar
    Sreedhar almost 2 years

    Recently I found a couple of tables in a Database with no Clustered Indexes defined. But there are non-clustered indexes defined, so they are on HEAP.

    On analysis I found that select statements were using filter on the columns defined in non-clustered indexes.

    Does not having a clustered index on these tables affect performance?

  • Admin
    Admin almost 12 years
    Can you explain the third bullet, "A clustered index allows for efficient queries on columns that are often searched for ranges of values (between, >, etc.)"?
  • Tim Lehner
    Tim Lehner almost 12 years
    A clustered index works well in this manner because the next higher or lower keyed rows are guaranteed to be physically adjacent to each other in storage. Thus, once the first value is found, it is unnecessary to search for the remaining rows; you've already found them.
  • Mike Perrenoud
    Mike Perrenoud almost 12 years
    @TimLehner - very good post ... +1 for that ... I learned something today!
  • Martin Smith
    Martin Smith over 11 years
    They are not guaranteed to be physically adjacent in storage. They are only guaranteed to be logically adjacent. This is why you need to reorganize or rebuild indexes.
  • Tim Lehner
    Tim Lehner over 11 years
    @MartinSmith, good call, you're right. No index is immune to fragmentation. Technet could use an update on this sentence: "[...] rows with subsequent indexed values are guaranteed to be physically adjacent."
  • cbp
    cbp almost 11 years
    You mention that heaps are rare, however I don't think it is that uncommon if you are using GUID identifiers. Other than insert dates and sequential IDs, there are usually not many other candidates for columns that fulfill the requirements of being monotonically increasing, infrequently changing and reasonably frequently accessed.
  • Richard Watson
    Richard Watson over 9 years
    Interesting article pushing heaps here: use-the-index-luke.com/blog/2014-01/…
  • Flemin Adambukulam
    Flemin Adambukulam almost 7 years
    use-the-index-luke.com/blog/2014-01/… This article strongly contradicts your statement
  • Konstantin
    Konstantin over 6 years
    How big is the penalty you get for having a clustered index on a column whose values are not monotonically increasing? What if all the columns in your table are like that? Should you still have a clustered index?