Why is doing a top(1) on an indexed column in SQL Server slow?

sql-server tsql query-optimization performance

13,762

Solution 1

Based on its statistics, the optimizer is choosing the clustered index. You can explicitly ask it to use the index you've created instead, with an index hint (replace idx_connectionid with the actual name of your index):

SELECT  TOP (1) connectionid
FROM    outgoing_messages WITH (NOLOCK, index(idx_connectionid))
WHERE  (campaignid_int = 3835)

I hope it will solve the issue.

Regards, Enrique

Solution 2

I recently had the same issue and it's really quite simple to solve (at least in some cases).

If you add an ORDER BY clause on one or more of the indexed columns, the problem should go away. That solved it for me, at least.
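
As a rough sketch of what this could look like against the query from the question: this assumes campaignid_int is the indexed column. Ordering by it does not change the result, since the WHERE clause already pins it to a single value. Whether the optimizer then actually picks the nonclustered index depends on statistics and version, so it is worth comparing the plans before and after.

```sql
-- Sketch only: the question's query with an ORDER BY on the indexed column.
-- The ordering is trivially satisfied (every matching row has the same
-- campaignid_int), so it should not require an actual sort.
SELECT   TOP (1) connectionid
FROM     outgoing_messages WITH (NOLOCK)
WHERE    campaignid_int = 3835
ORDER BY campaignid_int;
```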

Solution 3

You aren't specifying an ORDER BY clause in your query, so the optimiser has not been told which order the top 1 should be taken from. Without one, the row you get back is simply the first row that the chosen plan happens to produce, and the plan it chooses may be sub-optimal. I would suggest that you add an ORDER BY x clause; using the clustered key of the table for x will probably be the fastest.

This may not solve your problem -- in fact I'm not sure I expect it to from the statistics you've given -- but (a) it won't hurt, and (b) you'll be able to rule this out as a contributing factor.

Solution 4

This doesn't answer your question, but try using:

SET ROWCOUNT 1
SELECT     connectionid
FROM       outgoing_messages WITH (NOLOCK)
WHERE      (campaignid_int = 3835)
SET ROWCOUNT 0  -- reset; otherwise later statements in this session stay limited to 1 row

I've seen top(x) perform very badly in certain situations as well. I'm sure it's doing a full table scan. Perhaps your index on that particular column needs to be rebuilt? The above is worth a try, however.
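
To test the rebuild idea, one option is to look at index fragmentation first. The following is a minimal sketch, assuming SQL Server 2005 or later and that it is run in the database that holds outgoing_messages. The index name idx_connectionid is taken from the hint in Solution 1 and may not match the real name.

```sql
-- List fragmentation for every index on outgoing_messages.
-- 'LIMITED' is the cheapest scan mode of this DMV.
SELECT  i.name AS index_name,
        ps.index_type_desc,
        ps.avg_fragmentation_in_percent,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID(N'outgoing_messages'), NULL, NULL, 'LIMITED') AS ps
JOIN    sys.indexes AS i
          ON  i.object_id = ps.object_id
          AND i.index_id  = ps.index_id;

-- If the index turns out to be heavily fragmented, rebuild it.
-- idx_connectionid is an assumed name; substitute the real index name.
ALTER INDEX idx_connectionid ON outgoing_messages REBUILD;
```

Note that a rebuild only addresses fragmentation. It will not change the plan if the optimizer is avoiding the index for selectivity reasons.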

Solution 5

The index may be useless for two reasons:

700k rows out of 10 million may not be selective enough
and/or
connectionid needs to be included so that the entire query can be answered from the index alone

Otherwise, the optimiser decides it may as well use the PK/clustered index to both filter on campaignid_int and get connectionid, to avoid a bookmark lookup on 700k rows from the current index.

So, I suggest this...

CREATE NONCLUSTERED INDEX IX_Foo ON MyTable (campaignid_int) INCLUDE (connectionid)
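
Adapted to the table in the question, this might look like the sketch below. The index name is made up for illustration. The SET STATISTICS IO lines are only there so you can compare logical reads before and after creating the index.

```sql
-- Covering index: campaignid_int is the key and connectionid is stored at
-- the leaf level, so the query needs no lookup into the clustered index.
-- The index name is hypothetical.
CREATE NONCLUSTERED INDEX IX_outgoing_messages_campaignid_int
    ON outgoing_messages (campaignid_int)
    INCLUDE (connectionid);

-- Re-run the original query and check the reported logical reads.
SET STATISTICS IO ON;

SELECT  TOP (1) connectionid
FROM    outgoing_messages
WHERE   campaignid_int = 3835;

SET STATISTICS IO OFF;
```

If the index is used, the plan should show an index seek on the new index rather than a clustered index scan.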

Author by

Toad

Updated on June 24, 2022

Comments

Toad about 2 years
I'm puzzled by the following. I have a DB with around 10 million rows, and (among other indices) there is an index on one column (campaignid_int).

Now I have 700k rows where the campaignid is indeed 3835

For all these rows, the connectionid is the same.

I just want to find out this connectionid.
```
 use messaging_db;
 SELECT     TOP (1) connectionid
 FROM         outgoing_messages WITH (NOLOCK)
 WHERE     (campaignid_int = 3835)
```
Now this query takes approx 30 seconds to perform!

I (with my small db knowledge) would expect that it would take any of the rows, and return me that connectionid

If I test this same query for a campaign which only has 1 entry, it goes really fast. So the index works.

How would I tackle this and why does this not work?

edit:
```
estimated execution plan:

select (0%) - top (0%) - clustered index scan (100%)
```
Toad over 14 years

I changed the question slightly. The campaignid_int is indexed
Toad over 14 years

I actually tried order by recid (the primary key) and it was just as slow. =^(
Toad over 14 years

there is no way to trick it to just return any of the rows? I would guess that even though there are a lot of rows for this index, that it could still index to any of these rows. I just don't want it to traverse the full DB
Toad over 14 years

but since I'm specifying 'top(1)' it means: give me any row. Why would it first crawl through the 700k rows just to return one?
Adrian Pronk over 14 years

I know nothing about MS-SQL-server but will "ORDER BY campaignid_int" meet the optimiser's requirement for ordering?
ScottE over 14 years

Bummer. I don't recall how we solved our query issue - if we adjusted the indices, or just took the top result in code instead of at the db level.
Håvard S over 14 years

Sorry, no. The main purpose of the index is fast lookup on unique values (i.e. keys), but when you have many duplicate values, an index seek (i.e. lookup) won't do, so the optimizer will issue an index scan. You will need to index differently or change your query.
Toad over 14 years

@adrian: but to order it, wouldn't the db need to scan all entries to know which one comes at the top? I already know all values are the same, so it can stop at any row it finds.
Adrian Pronk almost 14 years

@Toad: No, it shouldn't need to scan all rows. Since the WHERE clause stipulates that campaignid_int = 3835 the optimiser could know that the ORDER BY will be honoured without requiring the rows be scanned.
Iwade Rock about 11 years

TOP is applied after the FROM, WHERE, GROUP BY and ORDER BY phases of the SELECT statement. Therefore, you incur the cost of those operations before the database engine processes TOP.
Brain2000 over 9 years

Bizarre! In my case, without the ORDER BY took 50 cpu and 2200 reads. With the ORDER BY took 0 cpu and 5 reads. I just wanted one row from a table with one compound primary key.
paul-2011 about 4 years

A composite index consisting of the guid + date worked like a charm.