Extremely slow PostgreSQL query with ORDER and LIMIT clauses
Solution 1
When you have both the LIMIT and ORDER BY, the optimizer has decided it is faster to limp through the unfiltered records on foo by key descending until it gets five matches for the rest of the criteria. In the other cases, it simply runs the query as a nested loop and returns all the records.
Offhand, I'd say the problem is that PG doesn't grok the joint distribution of the various ids and that's why the plan is so sub-optimal.
For possible solutions: I'll assume that you have run ANALYZE recently. If not, do so. That may explain why your estimated times are high even on the version that returns fast. If the problem persists, perhaps run the ORDER BY as a subselect and slap the LIMIT on in an outer query.
Solution 2
Probably it happens because before it tries to order then to select. Why do not try to sort the result in an outer select all? Something like: SELECT * FROM (SELECT ... INNER JOIN ETC...) ORDER BY ... DESC
Solution 3
Your query plan indicates a filter on
(((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))
which doesn't appear in the SELECT - where is it coming from?
Also, note that expression is listed as a "Filter" and not an "Index Cond" which would seem to indicate there's no index applied to it.
jakeboxer
Jake Boxer is a software engineer living in San Francisco, CA and working at GitHub. He graduated summa cum laude from UMass Amherst with a Bachelor’s degree in Computer Science. He works at GitHub developing Ruby on Rails and iPhone apps. He is 25 years old. He feels strange about writing in the third person, and is going to stop now.
Updated on July 09, 2022Comments
-
jakeboxer almost 2 years
I have a table, let's call it "foos", with almost 6 million records in it. I am running the following query:
SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC LIMIT 5 OFFSET 0;
This query takes a very long time to run (Rails times out while running it). There is an index on all IDs in question. The curious part is, if I remove either the
ORDER BY
clause or theLIMIT
clause, it runs almost instantaneously.I'm assuming that the presence of both
ORDER BY
andLIMIT
are making PostgreSQL make some bad choices in query planning. Anyone have any ideas on how to fix this?In case it helps, here is the
EXPLAIN
for all 3 cases://////// Both ORDER and LIMIT SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC LIMIT 5 OFFSET 0; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..16663.44 rows=5 width=663) -> Nested Loop (cost=0.00..25355084.05 rows=7608 width=663) Join Filter: (foos.bar_id = bars.id) -> Index Scan Backward using foos_pkey on foos (cost=0.00..11804133.33 rows=4963477 width=663) Filter: (((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text)) -> Materialize (cost=0.00..658.96 rows=182 width=4) -> Index Scan using index_bars_on_baz_id on bars (cost=0.00..658.05 rows=182 width=4) Index Cond: (baz_id = 13266) (8 rows) //////// Just LIMIT SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) LIMIT 5 OFFSET 0; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..22.21 rows=5 width=663) -> Nested Loop (cost=0.00..33788.21 rows=7608 width=663) -> Index Scan using index_bars_on_baz_id on bars (cost=0.00..658.05 rows=182 width=4) Index Cond: (baz_id = 13266) -> Index Scan using index_foos_on_bar_id on foos (cost=0.00..181.51 rows=42 width=663) Index Cond: (foos.bar_id = bars.id) Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text)) (7 rows) //////// Just ORDER SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=36515.17..36534.19 rows=7608 width=663) Sort Key: foos.id -> Nested Loop (cost=0.00..33788.21 rows=7608 width=663) -> Index Scan using index_bars_on_baz_id on bars (cost=0.00..658.05 rows=182 width=4) Index Cond: (baz_id = 13266) -> Index Scan using index_foos_on_bar_id on foos (cost=0.00..181.51 rows=42 width=663) Index Cond: (foos.bar_id = bars.id) Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text)) (8 rows)
-
jakeboxer about 13 yearsSorry about this. I don't know why I was trying to obfuscate. I'll fix it in the morning.
-
Jim over 7 yearsok... so
foos.bars.last
results in a full index scan on bars... nice -_- -
Jim over 7 yearsok... so this results in a full index scan only if foos have 0 bars... still annoying though
-
Jim over 7 yearssolution:
foos.bars.last unless foos.bars.empty?
... still annoyed though -
Andrew Lazarus over 7 years@Jim What do you mean by
last
andempty
here: doesn't look like SQL. -
Jim over 7 yearsOh like the OP I hit the issue with an active_record (rails) generated Postgres query.
-
Andrew Lazarus over 7 yearsIf you can convert to SQL (and I suggest a fresh question), someone will look.
-
jwodder over 5 yearsAs of PostgreSQL 10, running the
ORDER BY
in a subquery no longer makes a difference. Instead, you have to use a CTE:WITH (... query ... ORDER BY ...) AS t: SELECT * FROM t LIMIT ...
. -
augustomen almost 5 yearsFor anyone having performance issues with Django, this is exactly what it does by default with the method
first()
:ORDER BY <pk> LIMIT 1
. -
Fabien Snauwaert over 2 years"perhaps run the ORDER BY as a subselect and slap the LIMIT on in an outer query" made all the difference for me. It also feels cleaner. In contrast: adding an unneeded
ORDER BY
helped me on Postgres 9.6 but didn't do the trick after importing a dump on Postgres 13.