Extremely slow PostgreSQL query with ORDER and LIMIT clauses

28,021

Solution 1

When you have both the LIMIT and ORDER BY, the optimizer has decided it is faster to limp through the unfiltered records on foo by key descending until it gets five matches for the rest of the criteria. In the other cases, it simply runs the query as a nested loop and returns all the records.

Offhand, I'd say the problem is that PG doesn't grok the joint distribution of the various ids and that's why the plan is so sub-optimal.

For possible solutions: I'll assume that you have run ANALYZE recently. If not, do so. That may explain why your estimated times are high even on the version that returns fast. If the problem persists, perhaps run the ORDER BY as a subselect and slap the LIMIT on in an outer query.

Solution 2

Probably it happens because before it tries to order then to select. Why do not try to sort the result in an outer select all? Something like: SELECT * FROM (SELECT ... INNER JOIN ETC...) ORDER BY ... DESC

Solution 3

Your query plan indicates a filter on

(((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))

which doesn't appear in the SELECT - where is it coming from?

Also, note that expression is listed as a "Filter" and not an "Index Cond" which would seem to indicate there's no index applied to it.

Share:
28,021
jakeboxer
Author by

jakeboxer

Jake Boxer is a software engineer living in San Francisco, CA and working at GitHub. He graduated summa cum laude from UMass Amherst with a Bachelor’s degree in Computer Science. He works at GitHub developing Ruby on Rails and iPhone apps. He is 25 years old. He feels strange about writing in the third person, and is going to stop now.

Updated on July 09, 2022

Comments

  • jakeboxer
    jakeboxer almost 2 years

    I have a table, let's call it "foos", with almost 6 million records in it. I am running the following query:

    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE (("bars".baz_id = 13266))
    ORDER BY "foos"."id" DESC
    LIMIT 5 OFFSET 0;
    

    This query takes a very long time to run (Rails times out while running it). There is an index on all IDs in question. The curious part is, if I remove either the ORDER BY clause or the LIMIT clause, it runs almost instantaneously.

    I'm assuming that the presence of both ORDER BY and LIMIT are making PostgreSQL make some bad choices in query planning. Anyone have any ideas on how to fix this?

    In case it helps, here is the EXPLAIN for all 3 cases:

    //////// Both ORDER and LIMIT
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE (("bars".baz_id = 13266))
    ORDER BY "foos"."id" DESC
    LIMIT 5 OFFSET 0;
                                                         QUERY PLAN                                                     
    --------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..16663.44 rows=5 width=663)
       ->  Nested Loop  (cost=0.00..25355084.05 rows=7608 width=663)
             Join Filter: (foos.bar_id = bars.id)
             ->  Index Scan Backward using foos_pkey on foos  (cost=0.00..11804133.33 rows=4963477 width=663)
                   Filter: (((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))
             ->  Materialize  (cost=0.00..658.96 rows=182 width=4)
                   ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
                         Index Cond: (baz_id = 13266)
    (8 rows)
    
    //////// Just LIMIT
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE (("bars".baz_id = 13266))
    LIMIT 5 OFFSET 0;
                                                                  QUERY PLAN                                                               
    ---------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..22.21 rows=5 width=663)
       ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
             ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
                   Index Cond: (baz_id = 13266)
             ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
                   Index Cond: (foos.bar_id = bars.id)
                   Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
    (7 rows)
    
    //////// Just ORDER
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE (("bars".baz_id = 13266))
    ORDER BY "foos"."id" DESC;
                                                                  QUERY PLAN                                                               
    ---------------------------------------------------------------------------------------------------------------------------------------
     Sort  (cost=36515.17..36534.19 rows=7608 width=663)
       Sort Key: foos.id
       ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
             ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
                   Index Cond: (baz_id = 13266)
             ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
                   Index Cond: (foos.bar_id = bars.id)
                   Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
    (8 rows)
    
  • jakeboxer
    jakeboxer about 13 years
    Sorry about this. I don't know why I was trying to obfuscate. I'll fix it in the morning.
  • Jim
    Jim over 7 years
    ok... so foos.bars.last results in a full index scan on bars... nice -_-
  • Jim
    Jim over 7 years
    ok... so this results in a full index scan only if foos have 0 bars... still annoying though
  • Jim
    Jim over 7 years
    solution: foos.bars.last unless foos.bars.empty?... still annoyed though
  • Andrew Lazarus
    Andrew Lazarus over 7 years
    @Jim What do you mean by last and empty here: doesn't look like SQL.
  • Jim
    Jim over 7 years
    Oh like the OP I hit the issue with an active_record (rails) generated Postgres query.
  • Andrew Lazarus
    Andrew Lazarus over 7 years
    If you can convert to SQL (and I suggest a fresh question), someone will look.
  • jwodder
    jwodder over 5 years
    As of PostgreSQL 10, running the ORDER BY in a subquery no longer makes a difference. Instead, you have to use a CTE: WITH (... query ... ORDER BY ...) AS t: SELECT * FROM t LIMIT ....
  • augustomen
    augustomen almost 5 years
    For anyone having performance issues with Django, this is exactly what it does by default with the method first(): ORDER BY <pk> LIMIT 1.
  • Fabien Snauwaert
    Fabien Snauwaert over 2 years
    "perhaps run the ORDER BY as a subselect and slap the LIMIT on in an outer query" made all the difference for me. It also feels cleaner. In contrast: adding an unneeded ORDER BY helped me on Postgres 9.6 but didn't do the trick after importing a dump on Postgres 13.