Difference between clustered and nonclustered index

281,673

Solution 1

You really need to keep two issues apart:

1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but it doesn't have to be that way!
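As a minimal sketch of keeping the two apart, the T-SQL below declares the primary key as nonclustered and puts the clustered index on a different column. The table, column, and constraint names (dbo.Customer, CustomerGuid, CustomerId, and so on) are made up for illustration and are not from the original answer.

```sql
-- Hypothetical table; all names are illustrative assumptions.
CREATE TABLE dbo.Customer
(
    -- Logical key: identifies the row, declared NONCLUSTERED on purpose.
    CustomerGuid UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED,
    -- Small, stable, ever-increasing value intended as the clustering key.
    CustomerId   INT IDENTITY(1,1) NOT NULL,
    Name         NVARCHAR(100) NOT NULL
);

-- The clustering key is defined separately from the primary key.
CREATE UNIQUE CLUSTERED INDEX CIX_Customer_CustomerId
    ON dbo.Customer (CustomerId);
```

If NONCLUSTERED were left out and no clustered index existed yet, SQL Server would make the primary key the clustered index, which is the default behaviour described above.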

One rule of thumb I would apply is this: any "regular" table (one that you use to store data in, that is a lookup table etc.) should have a clustering key. There's really no good reason not to have one. Contrary to common belief, having a clustering key speeds up all the common operations - even inserts and deletes (since the table organization is different and usually better than with a heap - a table without a clustering key).

Kimberly Tripp, the Queen of Indexing, has a great many excellent articles on the topic of why to have a clustering key, and what kind of columns to best use as your clustering key. Since you only get one per table, it's of utmost importance to pick the right clustering key - and not just any clustering key.

Marc

Solution 2

A clustered index alters the way that the rows are stored. When you create a clustered index on a column (or a number of columns), SQL Server sorts the table's rows by that column(s). It is like a dictionary, where all words are sorted in alphabetical order throughout the entire book.

A non-clustered index, on the other hand, does not alter the way the rows are stored in the table. It creates a separate structure alongside the table that contains the column(s) selected for indexing and a pointer back to the table's rows containing the data. It is like an index in the last pages of a book, where keywords are sorted and contain the page number to the material of the book for faster reference.
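The dictionary and back-of-the-book analogies could be sketched in T-SQL roughly as follows. The dbo.Word table and the index names are hypothetical and exist only for this example.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE dbo.Word
(
    WordId     INT           NOT NULL,
    Term       NVARCHAR(50)  NOT NULL,
    Definition NVARCHAR(400) NULL
);

-- Clustered: the table's rows themselves are kept in Term order
-- (the dictionary). Only one such index can exist per table.
CREATE CLUSTERED INDEX CIX_Word_Term
    ON dbo.Word (Term);

-- Nonclustered: a separate structure holding WordId values plus a
-- locator back to each row (the index at the back of a book).
CREATE NONCLUSTERED INDEX IX_Word_WordId
    ON dbo.Word (WordId);
```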

Solution 3

You should be using indexes to help SQL Server performance. Usually that means indexing the columns that are used to find rows in a table.

A clustered index makes SQL Server order the rows on disk according to the index order. This implies that if you access data in the order of a clustered index, then the data will be present on disk in the correct order. However, if the column(s) in the clustered index are frequently changed, then the row(s) will move around on disk, causing overhead - which generally is not a good idea.

Having many indexes is not good either: each one has to be maintained on every insert, update, and delete. So start out with the obvious ones, and then profile to see which ones you are missing and would benefit from. You do not need them from the start; they can be added later on.

Most column datatypes can be used when indexing, but it is better to index small columns than large ones. It is also common to create indexes on groups of columns (e.g. country + city + street).

Also, you will not notice performance issues until you have quite a bit of data in your tables. Another thing to think about is that SQL Server needs statistics to do its query optimizations the right way, so make sure that they are generated and kept up to date.
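An index on a group of columns might look like the sketch below. The dbo.Address table and its columns are assumptions chosen to match the country + city + street example. Column order matters: the index is most useful when the query filters on the leading column(s).

```sql
-- Hypothetical table; names and sizes are illustrative assumptions.
CREATE TABLE dbo.Address
(
    AddressId INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Address PRIMARY KEY CLUSTERED,
    Country   NVARCHAR(60)  NOT NULL,
    City      NVARCHAR(60)  NOT NULL,
    Street    NVARCHAR(100) NOT NULL
);

-- One index over a group of columns, ordered from least to most specific.
CREATE NONCLUSTERED INDEX IX_Address_Country_City_Street
    ON dbo.Address (Country, City, Street);

-- Filters on the leading columns, so the index can be used for a seek.
SELECT AddressId
FROM dbo.Address
WHERE Country = N'Norway' AND City = N'Oslo';

-- Filters only on the last column; the index cannot be seeked on Street
-- alone, so this typically results in a scan.
SELECT AddressId
FROM dbo.Address
WHERE Street = N'Storgata';
```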
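A rough sketch of checking and refreshing statistics is shown below. It assumes a table named dbo.Address already exists (a hypothetical name); substitute one of your own tables. By default SQL Server creates and updates statistics automatically, so a manual refresh is usually only needed after large data changes.

```sql
-- Check whether automatic statistics creation/update is enabled
-- for the current database.
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = DB_NAME();

-- See when each statistics object on a table was last updated.
-- dbo.Address is a hypothetical table name.
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.Address');

-- Refresh statistics for that one table.
UPDATE STATISTICS dbo.Address;

-- Or refresh out-of-date statistics across the current database.
EXEC sp_updatestats;
```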

Solution 4

A comparison of a non-clustered index with a clustered index with an example

As an example of a non-clustered index, let's say that we have a non-clustered index on the EmployeeID column. A non-clustered index will store both the value of the EmployeeID AND a pointer to the row in the Employee table where that value is actually stored. But a clustered index, on the other hand, will actually store the row data for a particular EmployeeID - so if you are running a query that looks for an EmployeeID of 15, the data from other columns in the table like EmployeeName, EmployeeAddress, etc. will all actually be stored in the leaf node of the clustered index itself.

This means that with a non-clustered index, extra work is required to follow that pointer to the row in the table to retrieve any other desired values, whereas a clustered index can access the row directly, since the row is stored in the leaf level of the clustered index itself. So, when a query needs columns that the non-clustered index does not contain, reading from the clustered index is generally faster than reading from the non-clustered index and then looking up the row.
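The Employee example could be sketched in T-SQL as below. The column types and index names are assumptions; only the table and column names come from the text. The INCLUDE variant is an addition beyond the original answer: it shows how a non-clustered index can carry the extra columns itself so that no lookup is needed (a "covered" query).

```sql
-- Column types and index names are illustrative assumptions.
CREATE TABLE dbo.Employee
(
    EmployeeID      INT           NOT NULL,
    EmployeeName    NVARCHAR(100) NOT NULL,
    EmployeeAddress NVARCHAR(200) NULL
);

-- Clustered index: the leaf level holds the full row, so every column
-- for EmployeeID = 15 is available without a further lookup.
CREATE UNIQUE CLUSTERED INDEX CIX_Employee_EmployeeID
    ON dbo.Employee (EmployeeName, EmployeeID);

-- Plain non-clustered index: the leaf level holds EmployeeID plus a
-- locator for the row, so other columns need an extra lookup.
CREATE NONCLUSTERED INDEX IX_Employee_EmployeeID
    ON dbo.Employee (EmployeeID);

-- Covering non-clustered index: the extra columns are stored in the
-- leaf level, so the query below can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_Employee_EmployeeID_Covering
    ON dbo.Employee (EmployeeID)
    INCLUDE (EmployeeName, EmployeeAddress);

SELECT EmployeeName, EmployeeAddress
FROM dbo.Employee
WHERE EmployeeID = 15;
```

In this sketch the clustered index is deliberately keyed on (EmployeeName, EmployeeID) rather than EmployeeID alone, so that the non-clustered indexes on EmployeeID are not redundant with it; in a real table you would pick one design, not all three.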

Solution 5

In general, use an index on a column that's going to be used (a lot) to search the table, such as a primary key (which by default has a clustered index). For example, if you have the query (in pseudocode)

SELECT * FROM FOO WHERE FOO.BAR = 2

You might want to put an index on FOO.BAR. A clustered index should be used on a column that will be used for sorting. A clustered index is used to sort the rows on disk, so you can only have one per table. For example if you have the query

SELECT * FROM FOO ORDER BY FOO.BAR ASC

You might want to consider a clustered index on FOO.BAR.
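For the FOO examples, the two options might be written roughly as follows. The column types and index names are assumptions, since the original only names the table and the BAR column.

```sql
-- Column types are assumptions; only FOO and BAR come from the text.
CREATE TABLE FOO
(
    ID  INT NOT NULL,
    BAR INT NOT NULL
);

-- Option 1: a non-clustered index to support WHERE FOO.BAR = 2.
CREATE NONCLUSTERED INDEX IX_FOO_BAR
    ON FOO (BAR);

-- Option 2 (alternative): cluster the table on BAR so the rows are
-- already stored in the order that ORDER BY FOO.BAR asks for.
-- Only one clustered index is allowed per table.
-- CREATE CLUSTERED INDEX CIX_FOO_BAR ON FOO (BAR);
```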

Probably the most important consideration is how much time your queries are taking. If a query doesn't take much time or isn't used very often, it may not be worth adding indexes. As always, profile first, then optimize. SQL Server Management Studio can give you suggestions on where to optimize, and MSDN has some information that you might find useful.
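One possible way to "profile first" from a query window is sketched below. It uses session settings and a dynamic management view that ship with SQL Server; reading the view requires appropriate server permissions, and its contents are suggestions to evaluate, not indexes to create blindly.

```sql
-- Report I/O and timing for statements run in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the query you want to measure here ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Index suggestions the optimizer has recorded for the current
-- database since the server last started.
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns
FROM sys.dm_db_missing_index_details AS d
WHERE d.database_id = DB_ID();
```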

Author: Pabuc

Updated on March 23, 2020

Comments

  • Pabuc
    Pabuc about 4 years

    I need to add proper indexes to my tables and need some help.

    I'm confused and need to clarify a few points:

    • Should I use an index for non-int columns? Why/why not?

    • I've read a lot about clustered and non-clustered indexes, yet I still can't decide when to use one over the other. A good example would help me and a lot of other developers.

    I know that I shouldn't use indexes for columns or tables that are often updated. What else should I be careful about, and how can I know that it is all good before going to the test phase?

  • richard
    richard about 13 years
    Two more points: first, by default, when you add a primary key to a table, it is a clustered index. And second, you can only have one clustered index.
  • Pabuc
    Pabuc about 13 years
    I'm confused, isn't the PK a clustered index? And I know that I can have only one in a table. That doesn't make sense with your answer.
  • marc_s
    marc_s about 13 years
    @pabuc: the PK is the clustered index by default - but it doesn't have to be. You can have the PK on one column, non-clustered, and another column makes up the clustered index
  • Brandon Bohrer
    Brandon Bohrer about 13 years
    @marc_s Thanks for clearing that up. I've updated the answer to reflect this.
  • nvogel
    nvogel about 11 years
    A decent explanation except for the last sentence. Reading from a clustered index is rarely faster than reading from a nonclustered one simply because the CL one is usually much larger and never smaller than any NC index on the same table. That's why the query optimizer will normally choose an NC index whenever it can in preference to a CL one - especially for "covered" queries where no bookmark lookup is required.
  • Dhwani
    Dhwani almost 11 years
    What about an IDENTITY column that is not marked as the primary key? Is it clustered?
  • marc_s
    marc_s almost 11 years
    @ProgrammerIT: no - an IDENTITY column is only that - a system-generated value. Nothing else.
  • Krunal
    Krunal over 7 years
    Does that mean that when we have a use case where data is updated frequently, SQL Server needs to update the index every time the data is updated? And does it slow down SQL Server when performing updates? (Of course I'm talking about a huge data set.)