Paging, sorting and filtering in a stored procedure (SQL Server)

Solution 1

I would most likely split the @SortBy argument into two, @SortColumn and @SortDirection, and use them like this:

…
ROW_NUMBER() OVER (
  ORDER BY CASE @SortColumn
    WHEN 'Name'      THEN Name
    WHEN 'OtherName' THEN OtherName
    …
  END *
  CASE @SortDirection
    WHEN 'DESC' THEN -1
    ELSE 1
  END
) AS Row
…

And this is how the TotalRows column could be defined (in the main select):

…
COUNT(*) OVER () AS TotalRows
…
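
Putting the two fragments together, a complete paged query might look like the sketch below. The table and column names (dbo.People, Name, OtherName) and the local variables standing in for procedure parameters are placeholders only; and since the multiply-by-minus-one trick assumes a numeric sort key, this variant uses one CASE expression per sort direction instead:

--Sketch only: dbo.People, Name and OtherName are hypothetical; the DECLAREs stand in for procedure parameters
declare @SortColumn varchar(25) = 'Name',
        @SortDirection varchar(4) = 'DESC',
        @startRowIndex int = 0,
        @maximumRows int = 50;

;WITH stuff AS (
    SELECT
        ROW_NUMBER() OVER (
            ORDER BY
                CASE WHEN @SortDirection <> 'DESC' THEN
                    CASE @SortColumn WHEN 'Name' THEN Name WHEN 'OtherName' THEN OtherName END
                END ASC,
                CASE WHEN @SortDirection = 'DESC' THEN
                    CASE @SortColumn WHEN 'Name' THEN Name WHEN 'OtherName' THEN OtherName END
                END DESC
        ) AS Row,
        COUNT(*) OVER () AS TotalRows,
        Name,
        OtherName
    FROM dbo.People
    WHERE Name LIKE '%abc%'   --your real filter conditions go here
)
SELECT *
FROM stuff
WHERE Row > @startRowIndex
  AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row;

The complex WHERE clause is written only once, and TotalRows comes back on every row, so no separate count query is needed.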

Solution 2

I would definitely want to do a combination of a temp table and NTILE for this sort of approach.

The temp table will allow you to do your complicated series of conditions just once. Because you're only storing the pieces you care about, the selects you run against it later in the procedure should also use less memory overall than re-running the conditions multiple times.

I like NTILE() for this better than ROW_NUMBER() because it's doing the work you're trying to accomplish for you, rather than having additional where conditions to worry about.

The example below is based on a similar query I'm using as part of some research; I have an ID I can use that I know will be unique in the results. An ID from an identity column would also be appropriate here, though.

--DECLARES here would be stored procedure parameters
declare @pagesize int, @sortby varchar(25), @page int = 1;

--Create temp with all relevant columns; ID here could be an identity PK to help with paging query below
create table #temp (id int not null primary key clustered, status varchar(50), lastname varchar(100), startdate datetime);

--Insert into #temp based off of your complex conditions, but with no attempt at paging
insert into #temp
(id, status, lastname, startdate)
select id, status, lastname, startdate
from Table1 ...etc.
where ...complicated conditions


SET @pagesize = 50;
SET @page = 5;--OR CAST(@startRowIndex/@pagesize as int)+1
SET @sortby = 'name';

--Only use the id and count to use NTILE
;with paging(id, pagenum, totalrows) as 
(
    select id,
    NTILE((SELECT COUNT(*) cnt FROM #temp)/@pagesize) OVER(ORDER BY CASE WHEN @sortby = 'NAME' THEN lastname ELSE convert(varchar(10), startdate, 112) END),
    cnt
    FROM #temp
    cross apply (SELECT COUNT(*) cnt FROM #temp) total
)
--Use the id to join back to main select
SELECT *
FROM paging
JOIN #temp ON paging.id = #temp.id
WHERE paging.pagenum = @page

--Don't need the drop in the procedure, included here for rerunnability
drop table #temp;

I generally prefer temp tables over table variables in this scenario, largely so that there are proper statistics on the result set you have. (Search for "temp table vs table variable" and you'll find plenty of examples as to why.)

Dynamic SQL would be most useful for handling the sorting method. Using my example, you could build the main query in dynamic SQL and put only the sort expression you actually need into the OVER().
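
As a rough sketch of that idea, reusing #temp, @sortby, @pagesize and @page from above (the @sql and @orderby variables and the aliases are just illustration; the ORDER BY text comes from a fixed list, so nothing user-supplied is concatenated):

--Sketch: only the sort expression is concatenated; everything else is passed as real parameters
declare @sql nvarchar(max), @orderby nvarchar(100);

set @orderby = case when @sortby = 'NAME' then N'lastname'
                    else N'startdate'
               end;

set @sql = N'
;with paging(id, pagenum) as
(
    select id,
           NTILE((SELECT COUNT(*) FROM #temp)/@pagesize) OVER(ORDER BY ' + @orderby + N')
    from #temp
)
select t.*
from paging p
join #temp t on p.id = t.id
where p.pagenum = @page;';

exec sp_executesql @sql,
                   N'@pagesize int, @page int',   --parameter description string
                   @pagesize = @pagesize,         --parameter list
                   @page = @page;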

The example above also returns the total in every row of the result set, which, as you mentioned, is not ideal. You could instead have a @totalrows OUTPUT parameter in your procedure and return it alongside the result set. That would save you the CROSS APPLY that I'm doing above in the paging CTE.
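
A minimal sketch of that variant, assuming @totalrows is declared as an OUTPUT parameter of the procedure:

--Sketch: @totalrows is an OUTPUT parameter; it also replaces the subquery inside NTILE
--(assumes at least @pagesize rows, as in the example above, so the integer division stays positive)
select @totalrows = COUNT(*) from #temp;

;with paging(id, pagenum) as
(
    select id,
           NTILE(@totalrows/@pagesize) OVER(ORDER BY CASE WHEN @sortby = 'NAME' THEN lastname
                                                          ELSE convert(varchar(10), startdate, 112) END)
    FROM #temp
)
SELECT t.*
FROM paging p
JOIN #temp t ON p.id = t.id
WHERE p.pagenum = @page;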


Comments

  • Fruitbat
    Fruitbat about 2 years

    I was looking at different ways of writing a stored procedure to return a "page" of data. This was for use with the ASP ObjectDataSource, but it could be considered a more general problem.

    The requirement is to return a subset of the data based on the usual paging parameters, startPageIndex and maximumRows, but also a sortBy parameter to allow the data to be sorted. There are also some parameters passed in to filter the data on various conditions.

    One common way to do this seems to be something like this:

    [Method 1]

    ;WITH stuff AS (
        SELECT 
            CASE 
                WHEN @SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
                WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
                WHEN @SortBy = ... 
                ELSE ROW_NUMBER() OVER (ORDER BY whatever)
            END AS Row,
            ., 
            ., 
            .,
        FROM Table1
        INNER JOIN Table2 ...
        LEFT JOIN Table3 ...
        WHERE ... (lots of things to check)
        ) 
    SELECT *
    FROM stuff 
    WHERE (Row > @startRowIndex)
    AND   (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
    ORDER BY Row
    

    One problem with this is that it doesn't give the total count and generally we need another stored procedure for that. This second stored procedure has to replicate the parameter list and the complex WHERE clause. Not nice.

    One solution is to append an extra column to the final select list, (SELECT COUNT(*) FROM stuff) AS TotalRows. This gives us the total but repeats it for every row in the result set, which is not ideal.

    [Method 2]
    An interesting alternative is given here (http://www.4guysfromrolla.com/articles/032206-1.aspx) using dynamic SQL. He reckons that the performance is better because the CASE statement in the first solution drags things down. Fair enough, and this solution makes it easy to get the totalRows and slap it into an output parameter. But I hate coding dynamic SQL. All that 'bit of SQL ' + STR(@parm1) +' bit more SQL' gubbins.

    [Method 3]
    The only way I can find to get what I want, without repeating code which would have to be synchronized, and keeping things reasonably readable is to go back to the "old way" of using a table variable:

    DECLARE @stuff TABLE (Row INT, ...)
    
    INSERT INTO @stuff
    SELECT 
        CASE 
            WHEN @SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
            WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
            WHEN @SortBy = ... 
            ELSE ROW_NUMBER() OVER (ORDER BY whatever)
        END AS Row,
        ., 
        ., 
        .,
    FROM Table1
    INNER JOIN Table2 ...
    LEFT JOIN Table3 ...
    WHERE ... (lots of things to check)
    
    SELECT *
    FROM @stuff
    WHERE (Row > @startRowIndex)
    AND   (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
    ORDER BY Row
    

    (Or a similar method using an IDENTITY column on the table variable). Here I can just add a SELECT COUNT on the table variable to get the totalRows and put it into an output parameter.
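
    For example, with @totalRows declared as an OUTPUT parameter of the procedure:

    SELECT @totalRows = COUNT(*) FROM @stuff;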

    I did some tests, and with a fairly simple version of the query (no sortBy and no filter), method 1 seems to come out on top (almost twice as quick as the other 2). Then I tested with something closer to the complexity I will actually need, and with the SQL in stored procedures. With this, method 1 takes nearly twice as long as the other 2 methods, which seems strange.

    Is there any good reason why I shouldn't spurn CTEs and stick with method 3?


    UPDATE - 15 March 2012

    I tried adapting Method 1 to dump the page from the CTE into a temporary table so that I could extract the TotalRows and then select just the relevant columns for the resultset. This seemed to add significantly to the time (more than I expected). I should add that I'm running this on a laptop with SQL Server Express 2008 (all that I have available) but still the comparison should be valid.

    I looked again at the dynamic SQL method. It turns out I wasn't really doing it properly (just concatenating strings together). I set it up as in the documentation for sp_executesql (with a parameter description string and parameter list) and it's much more readable. Also this method runs fastest in my environment. Why that should be still baffles me, but I guess the answer is hinted at in Hogan's comment.

  • Andriy M
    Andriy M over 12 years
    marital, martial... in marriage as in war :)
  • Fruitbat
    Fruitbat over 12 years
    Apart from splitting the SortBy, isn't this just my Method 1?
  • Fruitbat
    Fruitbat over 12 years
    Unless I've misunderstood, this seems to miss the point by some distance. The complexity of the WHERE clause would have to be replicated. There would have to be a staging table for each possible sort and filter. Unless you create the staging table each time, you won't show the latest changes to the database.
  • Andriy M
    Andriy M over 12 years
    @Fruitbat: I also use a single ROW_NUMBER() and the CASE expression is moved to the ROW_NUMBER()'s ORDER BY clause. You may still consider this your Method 1 modified, it's fine by me. :) I was merely suggesting what I would do, and I considered the changes worth posting as an answer.
  • John Dewey
    John Dewey over 12 years
    @Fruitbat - The staging table is created once for each report run, where it is sorted and paginated. A separate sproc is then run to deliver a page at a time. Separating the operations prevents the complex WHERE clause from being repeated for each page, which is what would happen if all steps were combined into one sproc. Only one sort is selected at a time, and because the two procedures would be run back-to-back for each report, the latest updates to the DB would always be included.
  • Fruitbat
    Fruitbat over 12 years
    Sorry, I don't get it. E.g. I search for names containing 'abc', sorted by name. As I page through the results, a new record is added to the database which matches my criteria. How can I pick this up without running sproc1 again? I'm not clear what you mean by "report run" in this context.
  • John Dewey
    John Dewey over 12 years
    The idea here is to execute dbo.EmployeesByMartialStatus once for the report run, and then EmployeesByMartialStatus_GetPage n times for each of n number of pages. EmployeesByMartialStatus_GetPage only reads the staging table.
  • John Dewey
    John Dewey over 12 years
    Usage example updated to show the first sproc being called once, and the second sproc being called multiple times.
  • Hogan
    Hogan over 12 years
    This does not solve the problem. An expected use case is two users requesting paging at the same time with different sort orders (and possibly different filters) -- having to build and rebuild the table(s) on each query would be way too slow and impossible to manage unless there were multiple results stored in the result table -- if you have that much space there are better ways to solve this problem with savvy index creation.
  • paparazzo
    paparazzo over 12 years
    @Fruitbat So you get that adding a record that matches the criteria will change the results from the SP, but you don't get that it also means that 51-100 may not follow the prior 1-50.
  • RoLYroLLs
    RoLYroLLs over 6 years
    Love the -1 and 1, too bad something like this can't be used with non-numerics
  • RoLYroLLs
    RoLYroLLs over 6 years
    Would adding COUNT(*) OVER () AS TotalRows to the main selection not add more data over the wire than really needed? I was thinking of sending it back as an OUTPUT parameter, but then I'd have to write the query twice, right? Any implications of that?
  • Andriy M
    Andriy M over 6 years
    @RoLYroLLs: My suggestions are meant to be used with the OP's Method 1, which is just one SELECT statement providing all the data (in this case, including "metadata" like the row count). A SELECT statement doesn't have (or need) an OUTPUT clause, hence the suggestion to return the count as another column of the output. Yes, the same count will be repeated in all rows, but with Method 1 there's no other way to obtain this piece of information.