Best way to get result count before LIMIT was applied
Solution 1
Pure SQL
Things have changed since 2008. You can use a window function to get the full count and the limited result in one query. Introduced with PostgreSQL 8.4 in 2009.
SELECT foo
, count(*) OVER() AS full_count
FROM bar
WHERE <some condition>
ORDER BY <some col>
LIMIT <pagesize>
OFFSET <offset>;
Note that this can be considerably more expensive than without the total count. All rows have to be counted, and a possible shortcut taking just the top rows from a matching index may not be helpful any more.
Doesn't matter much with small tables or full_count
<= OFFSET
+ LIMIT
. Matters for a substantially bigger full_count
.
Corner case: when OFFSET
is at least as great as the number of rows from the base query, no row is returned. So you also get no full_count
. Possible alternative:
Sequence of events in a SELECT
query
( 0. CTEs are evaluated and materialized separately. In Postgres 12 or later the planner may inline those like subqueries before going to work.) Not here.
-
WHERE
clause (andJOIN
conditions, though none in your example) filter qualifying rows from the base table(s). The rest is based on the filtered subset.
( 2. GROUP BY
and aggregate functions would go here.) Not here.
( 3. Other SELECT
list expressions are evaluated, based on grouped / aggregated columns.) Not here.
-
Window functions are applied depending on the
OVER
clause and the frame specification of the function. The simplecount(*) OVER()
is based on all qualifying rows. -
ORDER BY
( 6. DISTINCT
or DISTINCT ON
would go here.) Not here.
-
LIMIT
/OFFSET
are applied based on the established order to select rows to return.
LIMIT
/ OFFSET
becomes increasingly inefficient with a growing number of rows in the table. Consider alternative approaches if you need better performance:
Alternatives to get final count
There are completely different approaches to get the count of affected rows (not the full count before OFFSET
& LIMIT
were applied). Postgres has internal bookkeeping how many rows where affected by the last SQL command. Some clients can access that information or count rows themselves (like psql).
For instance, you can retrieve the number of affected rows in plpgsql immediately after executing an SQL command with:
GET DIAGNOSTICS integer_var = ROW_COUNT;
Or you can use pg_num_rows
in PHP. Or similar functions in other clients.
Related:
Solution 2
As I describe on my blog, MySQL has a feature called SQL_CALC_FOUND_ROWS. This removes the need to do the query twice, but it still needs to do the query in its entireity, even if the limit clause would have allowed it to stop early.
As far as I know, there is no similar feature for PostgreSQL. One thing to watch out for when doing pagination (the most common thing for which LIMIT is used IMHO): doing an "OFFSET 1000 LIMIT 10" means that the DB has to fetch at least 1010 rows, even if it only gives you 10. A more performant way to do is to remember the value of the row you are ordering by for the previous row (the 1000th in this case) and rewrite the query like this: "... WHERE order_row > value_of_1000_th LIMIT 10". The advantage is that "order_row" is most probably indexed (if not, you've go a problem). The disadvantage being that if new elements are added between page views, this can get a little out of synch (but then again, it may not be observable by visitors and can be a big performance gain).
Solution 3
You could mitigate the performance penalty by not running the COUNT() query every time. Cache the number of pages for, say 5 minutes before the query is run again. Unless you're seeing a huge number of INSERTs, that should work just fine.
EvilPuppetMaster
Updated on July 05, 2022Comments
-
EvilPuppetMaster almost 2 years
When paging through data that comes from a DB, you need to know how many pages there will be to render the page jump controls.
Currently I do that by running the query twice, once wrapped in a
count()
to determine the total results, and a second time with a limit applied to get back just the results I need for the current page.This seems inefficient. Is there a better way to determine how many results would have been returned before
LIMIT
was applied?I am using PHP and Postgres.
-
Frank Farmer about 11 yearsIn older versions of postgres, you can do something similar with cursors. e.g.
BEGIN; DECLARE c CURSOR FOR SELECT * FROM table; MOVE FORWARD 100 IN c; FETCH 10 FROM c; MOVE FORWARD ALL IN c; COMMIT;
. You get your data back from the FETCH, and you can work out row counts frompg_affected_rows
calls as appropriate. -
ma11hew28 over 7 yearsThis makes an extra column
full_count
for the total count, repeating the same total count for each row in the limited result. That seems a little redundant & inefficient to me. I'm not sure if it's slower for PostgreSQL, but I think this will cause the database server to have to send more data to the application server, right? Would it be better (faster & more efficient) to use aWITH
query to get the total count in one row, and thenUNION
that with the limited result? -
Erwin Brandstetter over 7 years@mattdipasquale: A CTE is typically much slower (more overhead). Try it. You could get the row count with a plpgsql function with
GET DIAGNOSTICS
-
denoise about 6 yearsbut pg_num_rows is not supposed to return the number of results after the limit?
-
Erwin Brandstetter about 6 years@denoise: You are right, that wasn't quite clear. I clarified to avoid misunderstandings.
-
Greg0ry over 2 yearsThanks for taking your time to come back and adding edits to such old answers.