Optimizing SQL queries by removing Sort operator in Execution plan


First, verify that the sort is actually a performance bottleneck. The cost of a sort depends on the number of rows to be sorted, and the number of stores for a particular parent store is likely to be small. (This assumes the Sort operator is applied after the WHERE clause has filtered the rows.)

"I’ve heard that a Sort operator indicates a bad design in the query since the sort can be made prematurely through an index."

That's an over-generalization. Often, a Sort operator can trivially be replaced by an index whose key order matches the ORDER BY. If only the first few rows of the result set are fetched, this can substantially reduce query cost. The database no longer has to fetch all matching rows (and sort them all) to find the first ones. Instead, it can read the rows in result-set order and stop once enough have been found.
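Here is a small Python simulation (not SQL Server code) of why an index in ORDER BY order helps most when only the first few rows are fetched. The `CountingReader` class, the function names and the generated phone numbers are all hypothetical. The sorted list merely stands in for an index on [Phone].

```python
from itertools import islice


class CountingReader:
    """Wraps a row source and counts how many rows are actually read from it."""

    def __init__(self, rows):
        self._rows = rows
        self.rows_read = 0

    def __iter__(self):
        for row in self._rows:
            self.rows_read += 1
            yield row


def top_n_by_sorting(reader, n):
    # No ordered access path: every matching row is read, then all are sorted.
    return sorted(reader)[:n]


def top_n_from_ordered_index(reader, n):
    # Rows already arrive in ORDER BY order, so reading stops after n rows.
    return list(islice(reader, n))


# Hypothetical data: 100,000 distinct 10-digit "phone numbers".
phones = ["%010d" % (i * 7919 % 1_000_000) for i in range(100_000)]

heap_reader = CountingReader(phones)  # unordered source
print(top_n_by_sorting(heap_reader, 3), heap_reader.rows_read)  # rows_read is 100000

index_reader = CountingReader(sorted(phones))  # stands in for an index on Phone
print(top_n_from_ordered_index(index_reader, 3), index_reader.rows_read)  # rows_read is 3
```

Both calls return the same three rows, but the ordered source is read only three times. That roughly mirrors a `TOP (n) ... ORDER BY` query served by a matching index.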

In your case, you seem to be fetching the entire result set, so sorting it is unlikely to make things much worse (unless the result set is huge). Also, it might not be trivial to build a useful sorted index, because the WHERE clause contains an OR on [Type]. An index on ([ParentStoreId], [Type], [Phone]) returns [Phone] sorted within each [Type] value, not across both values.
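The following Python sketch models the asker's index IX_Store on ([ParentStoreId], [Type], [Phone]) as a key-ordered list of tuples, using the four sample rows from the question. It shows why the OR on [Type] defeats the index order. Filtering the ordered list stands in for the index seek; this is a simplification, not how SQL Server actually executes the query.

```python
# Sample rows from the question: (ParentStoreId, Type, Phone).
store_rows = [
    (10, 0, "2223334444"),
    (10, 0, "3334445555"),
    (10, 1, "0001112222"),
    (10, 1, "1112223333"),
]

# IX_Store keys are (ParentStoreId, Type, Phone), so sorting the tuples
# gives the order in which the index stores its entries.
ix_store = sorted(store_rows)

# Read the entries matching ParentStoreId = 10 AND Type IN (0, 1) in index order.
seek_output = [phone for (parent, row_type, phone) in ix_store
               if parent == 10 and row_type in (0, 1)]

print(seek_output)  # ['2223334444', '3334445555', '0001112222', '1112223333']
# Phone is sorted within each Type, but not across the two Types,
# so ORDER BY [Phone] still needs a Sort (or a merge) on top of this.
print(seek_output == sorted(seek_output))  # False
```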

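Now, if you still want to get rid of that Sort operator, you can try writing the predicate with IN. Note that IN (0, 1) is logically equivalent to the OR, so the optimizer may well produce the same plan: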

SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND [Type] in (0, 1)
ORDER BY [Phone]    

Alternatively, you can try the following index:

CREATE NONCLUSTERED INDEX IX_Store ON dbo.[Store]([ParentStoreId], [Phone], [Type])

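This tries to get the query optimizer to do an index range seek on [ParentStoreId] only. It would then scan all matching rows in the index (which are already in [Phone] order) and output each row whose [Type] matches. However, this also reads the rows of every other [Type] for that parent store. It is therefore likely to cause more disk I/O, which would slow your query down rather than speed it up.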
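Here is a rough Python model of that alternative index ([ParentStoreId], [Phone], [Type]). A range seek on the leading column returns entries already in [Phone] order, and [Type] is checked as a residual filter on each entry. The Type 2 row and the ParentStoreId 11 row are invented to show the extra reads. `bisect` on a sorted list only stands in for the B-tree seek.

```python
from bisect import bisect_left

# Sample rows (ParentStoreId, Type, Phone); the last two are hypothetical extras.
rows = [
    (10, 0, "2223334444"),
    (10, 0, "3334445555"),
    (10, 1, "0001112222"),
    (10, 1, "1112223333"),
    (10, 2, "1234567890"),
    (11, 0, "0000000001"),
]

# Index keys reordered to (ParentStoreId, Phone, Type) and kept sorted.
index_keys = sorted((parent, phone, row_type) for (parent, row_type, phone) in rows)


def seek_then_filter(keys, parent_id, wanted_types):
    # Range seek on the leading key column only: all entries for parent_id.
    lo = bisect_left(keys, (parent_id,))
    hi = bisect_left(keys, (parent_id + 1,))
    scanned = keys[lo:hi]
    # Residual predicate on Type, evaluated for every scanned entry.
    result = [phone for (_, phone, row_type) in scanned if row_type in wanted_types]
    return result, len(scanned)


phones, rows_scanned = seek_then_filter(index_keys, 10, {0, 1})
print(phones)        # already in Phone order, so no Sort is needed
print(rows_scanned)  # 5: the Type 2 row was read and then discarded
```

The output needs no sort, but the scan touches every row for the parent store. With 8 [Type] values and only 2 wanted, that overhead is where the extra I/O comes from.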

Edit: As a last resort, you could use

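SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND [Type] = 0
ORDER BY [Phone];

-- T-SQL does not allow ORDER BY inside the branches of a UNION ALL,
-- so run the two queries as separate statements (two result sets).
SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND [Type] = 1
ORDER BY [Phone];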

with

CREATE NONCLUSTERED INDEX IX_Store ON dbo.[Store]([ParentStoreId], [Type], [Phone])

Then merge the two presorted lists on the application server (as in merge sort), which avoids a complete sort. But that's really a micro-optimization. It may speed up the sort step itself by roughly an order of magnitude, but it is unlikely to affect the total execution time of the query much. I'd expect the bottleneck to be network and disk I/O, especially since the disk will do a lot of random access as the index is not clustered.
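Here is a Python sketch of that merge step on the application server: a standard two-way merge (as in merge sort) of the two result sets, each already sorted by [Phone]. The lists below are the question's sample phone numbers, split by [Type]. In a real application they would come from whatever database driver you use, which is an assumption not covered here.

```python
def merge_sorted(left, right):
    """Merge two lists that are each already sorted, in linear time."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller head element of the two lists.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; the rest of the other is already in order.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# Result sets of the two ORDER BY [Phone] queries (sample data from the question).
type0_phones = ["2223334444", "3334445555"]
type1_phones = ["0001112222", "1112223333"]

print(merge_sorted(type0_phones, type1_phones))
# ['0001112222', '1112223333', '2223334444', '3334445555']
```

Each element is compared a constant number of times, instead of the n log n comparisons of a full sort. Python's standard library `heapq.merge` does the same for any number of sorted inputs.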


Author: jodev

Updated on July 09, 2022

Comments

  • jodev
    jodev almost 2 years

    I’ve just started looking into optimizing my queries through indexes because my SQL data is growing large and fast. I looked at how the optimizer is processing my query through the Execution plan in SSMS and noticed that a Sort operator is being used. I’ve heard that a Sort operator indicates a bad design in the query since the sort can be made prematurely through an index. So here is an example table and data similar to what I’m doing:

    IF OBJECT_ID('dbo.Store') IS NOT NULL DROP TABLE dbo.[Store]
    GO
    
    CREATE TABLE dbo.[Store]
    (
        [StoreId] int NOT NULL IDENTITY (1, 1),
        [ParentStoreId] int NULL,
        [Type] int NULL,
        [Phone] char(10) NULL,
        PRIMARY KEY ([StoreId])
    )
    
    INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 0, '2223334444')
    INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 0, '3334445555')
    INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 1, '0001112222')
    INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 1, '1112223333')
    GO
    

    Here is an example query:

    SELECT [Phone]
    FROM [dbo].[Store]
    WHERE [ParentStoreId] = 10
    AND ([Type] = 0 OR [Type] = 1)
    ORDER BY [Phone]
    

    I create a nonclustered index to help speed up the query:

    CREATE NONCLUSTERED INDEX IX_Store ON dbo.[Store]([ParentStoreId], [Type], [Phone])
    

    To build the IX_Store index, I start with the simple predicates:

    [ParentStoreId] = 10
    AND ([Type] = 0 OR [Type] = 1)
    

    Then I add the [Phone] column for the ORDER BY and to cover the SELECT output.

    So even with the index in place, the optimizer still uses the Sort operator (rather than the index order) because [Phone] is only sorted after [ParentStoreId] and [Type]. If I remove the [Type] column from the index and run the query:

    SELECT [Phone]
    FROM [dbo].[Store]
    WHERE [ParentStoreId] = 10
    --AND ([Type] = 0 OR [Type] = 1)
    ORDER BY [Phone]
    

    Then, of course, the optimizer does not use a Sort operator, because within each [ParentStoreId] value the index is already sorted by [Phone].

    So the question is how can I create an index that will cover the query (including the [Type] predicate) and not have the optimizer use a Sort?

    EDIT:

    The table I'm working with has more than 20 million rows.

    • Lucero
      Lucero almost 13 years
      You really should make [StoreId] a primary key (which also defaults to clustered by the way), not just add a unique index.
    • Andrew Savinykh
      Andrew Savinykh almost 13 years
      You might be able to work around this by creating a second index on the Phone column.
    • jodev
      jodev almost 13 years
      @Lucero, I modified my post to mark [StoreId] as the primary key, although I don't think this will solve the Sort problem.
    • jodev
      jodev almost 13 years
      @zespri, I just noted that the table I'm working with is huge. Creating a new index will eat up a lot of hard drive space
    • Lucero
      Lucero almost 13 years
      @jodev, my comment was not an answer to your question, but more general design advice. Having a clustered, small, continuous primary key can help with the overall performance of the system, which is why it is a good practice to start with.
    • Martin Smith
      Martin Smith almost 13 years
      @Lucero - From the Query Optimiser's point of view there is no difference between a clustered PK and a clustered unique index on non-nullable columns, as the OP originally had it. The latter is also more flexible, as there are some restrictions on indexes that back up primary keys that do not apply to unique indexes.
    • Lucero
      Lucero almost 13 years
      @Martin, the point is that not using a PK here would also mean that you cannot have a FK on the ParentStoreID - so while a PK is indeed the same to the query optimizer as a clustered UK, this should still be a PK in order to leverage the data integrity benefits with a FK.
    • Martin Smith
      Martin Smith almost 13 years
      @Lucero - That point is false. You can create an FK referencing a unique index. There is no difference in either entity integrity or referential integrity.
  • jodev
    jodev almost 13 years
    The table I'm using has more than 20 million rows. There are about 50 different [ParentStoreId] values and 8 different [Type] values. In the end I'm left to sort about 200K rows, which seems to be slowing down the query. Your info is useful though; I will give it a try.
  • user4205580
    user4205580 over 4 years
    @meriton Could you explain why using "type in (0,1)" would make a difference? How is that different from an "or"? Why would the phone numbers be automatically sorted without the sort operator? To my understanding if the index is created on ParentStoreId, Type, Phone, then phone numbers are sorted separately for each type?