Non-clustered index and clustered index on the same column

sql-server indexing b-tree clustered-index non-clustered-index

12,671

Solution 1

Assuming you're talking about SQL Server and also assuming that you have a clustered index on your table (as you should).

Then the nonclustered index has the columns that you define in your CREATE INDEX statement, plus it has the column(s) that make up your clustered index (if present).

That clustering key value is the "pointer" to where the actual data is located.

If the query executor seeks through your nonclustered index for a value and find a match, then

either that value is all you care about - then you just get back that value
or the nonclustered index might also have some included columns (in the leaf level page) and with those, the query can be satisfied (all columns requested are present) so you get back the values you asked for
or then the values you want are not all in the nonclustered index leaf-level page (that's especially true if you do SELECT * all the time) and then the query executor has to take the clustering key value from the nonclustered index, and go back to the clustering index, do what's called a key lookup, seek through the clustering index, and find the associated data page where the complete row is stored -> and now the query executor can return the values you've asked for

For a pretty good explanation - see this blog post here. It says:

In NonClustered Index:
....
2.b. If the table has a Clustered index, or the index is on an Indexed view, the row locator is the Clustered index Key for the row. SQL Server retrieves the data row by searching the Clustered index using the Clustered index Key stored in the Leaf row of the NonClustered index.

Or see this blog post in a whole series on SQL Server indexes which explains the "bookmarks" stored in a nonclustered index leaf-level page, too.

Solution 2

It's pretty easy to imagine like it this:

You have a table of customers, for example customer(id, name, age, adress). On this table you have a clustered index on age. This means your data is sorted by age on the hard drive. This is very beneficial for when you want to do range queries like:

SELECT * FROM customer WHERE age > 18;

Then the data can be fetched from your hard drive with only a few sequential reads. If the index were unclustered you would have to make one disc access (included the seek of the data) for every matching customer tuple.

Maybe for your application you also need to access the users by id. This means without an additional index on id you would have to run over the entire file to find an particular id because it's sorted by age and you have no index! To avoid that, you create a second index on id. Now you can search for an id in this index and the leaf of the index, which contains the customer you are looking for, points to the place in your (by age clustered) data on disc, where you find the tuple. By this you must not read the whole table need much fewer disc accesses (in general 2 for index lookup + 1 for fetching the tuple).

EDIT: I didn't see that you were talking about the same column. One thing I could imagine is that you do one clustered index on one column for the reason described above and another combined index of this and another column for example. This can be useful to do an index-only lookup, here you have all the required attributes in the index and don't need to do a page fetch at all. Another reason would be to have a clustered B+-Index for range queries and a Hash-Index for equality queries. But I think the benefit here would be negligible.

Hope this helped!

12,671

SexyBeast

Stackoverflow is. Therefore all programmers are.

Updated on September 26, 2022

Comments

SexyBeast over 1 year

I came across this post in Stackoverflow. The first answer mentions something like A clustered index has all the data for the table while a non clustered index only has the column + the location of the clustered index or the row if it is on a heap (a table without a clustered index). How can a non-clustered index have the location of the clustered index? It only contains the column values sorted as nodes in a B-treee with each node pinting to the row where the column has that node-value, right?
SexyBeast over 11 years

I don't understand, the non-clustered index itself holds a pointer to the complete row of data, why need a clustered index for that again?
marc_s over 11 years

@Cupidvogel: the nonclustered index's leaf-level page has its own index columns, any included columns, and the value of the clustering key - but that's NOT the actual data page location. Once you've found the NC leaf-level page and you need the full data page, a key lookup (index seek in the clustering index) is needed to get to the actual data page
SexyBeast over 11 years

Then how is the lookup done when there is only a non-clustered index and no clustered index? And what is meant by included columns?
marc_s over 11 years

@Cupidvogel: if you don't have a clustered index - then yes, in that case of a heap table, you have a row identifier in the nonclustered index's leaf-level page - direct pointer to the data page - which also needs to be updated every time the data page might move - which is a pain and hurts performance quite a bit. Heaps are a mess though - avoid them if ever possible!
SexyBeast over 11 years

What is the point of adding another level of indirection in the form of the clustered index to the leaf level when the leaf level can point to the data directly?
marc_s over 11 years

@Cupidvogel: you don't have to constantly update the clustered index key values - while you might have to do this for the "direct" page identifier, if e.g. a page split occurs (and that operation can be tremedously expensive!). Imagine you have a table with 10, 15 nonclustered indices - and now a page split occurs. ALL nonclustered indices pointing to that page would now have to be updated - horrendous overhead!
SexyBeast over 11 years

But won't page split affect clustered indexes also? They maintain the sorted order through doubly linked lists in their index pages, so those links will also have to update if a page split occurs.
marc_s over 11 years

@Cupidvogel: yes, but a page split doesn't change the clustering key value!