Efficient querying of multi-partition Postgres table

Solution 1

Have you tried Constraint Exclusion (section 5.9.4 in the document you've linked to)?

Constraint exclusion is a query optimization technique that improves performance for partitioned tables defined in the fashion described above. As an example:

 SET constraint_exclusion = on; 
 SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01'; 

Without constraint exclusion, the above query would scan each of the partitions of the measurement table. With constraint exclusion enabled, the planner will examine the constraints of each partition and try to prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause. When the planner can prove this, it excludes the partition from the query plan.

You can use the EXPLAIN command to show the difference between a plan with constraint_exclusion on and a plan with it off.
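
As a rough sketch of that comparison, assuming the measurement table and its partitions from the documentation example already exist, the two plans could be inspected in one session like this. Plan output varies with version and data, so none is shown here.

```sql
-- Plan with constraint exclusion disabled:
-- every child partition is expected to appear in the plan.
SET constraint_exclusion = off;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';

-- Plan with constraint exclusion enabled:
-- partitions whose CHECK constraints contradict the WHERE clause
-- should no longer appear in the plan.
SET constraint_exclusion = on;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
```

If both plans list the same set of tables, the planner was probably unable to match the WHERE clause against the partitions' CHECK constraints.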

Solution 2

I had a similar problem that I was able to fix by casting the conditions in the WHERE clause. For example (assuming the time_stamp column is of type timestamptz):

WHERE time_stamp >= '2010-02-10'::timestamptz and time_stamp < '2010-02-11'::timestamptz

Also, make sure the CHECK condition on the table is defined the same way, e.g. CHECK (time_stamp < '2010-02-10'::timestamptz)
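
Put together, this might look like the following sketch. The table names my_table and my_table_part_a are borrowed from the question, and the one-day range and the timestamptz column type are assumptions for illustration only.

```sql
-- Hypothetical child partition covering 2010-02-10.
-- The CHECK bounds are cast to the same type as the time_stamp column.
CREATE TABLE my_table_part_a (
    CHECK (time_stamp >= '2010-02-10'::timestamptz
       AND time_stamp <  '2010-02-11'::timestamptz)
) INHERITS (my_table);

-- Query against the master table, with bounds cast the same way
-- as in the CHECK constraint above.
SELECT *
FROM my_table
WHERE time_stamp >= '2010-02-10'::timestamptz
  AND time_stamp <  '2010-02-11'::timestamptz
ORDER BY id DESC
LIMIT 100;
```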

Solution 3

I had the same problem and it boiled down to two reasons in my case:

  1. The indexed column was of type timestamp WITH time zone, but the partition constraint on that column was written with type timestamp WITHOUT time zone.

  2. After fixing the constraints, I needed to run ANALYZE on all child tables.
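
A minimal sketch of how both points could be checked, assuming a child table named my_table_part_a (a name taken from the question, not from this answer):

```sql
-- List the CHECK constraints on one child table so the types used in them
-- can be compared with the type of the partitioning column.
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'my_table_part_a'::regclass
  AND contype = 'c';

-- Refresh planner statistics for that child table.
-- Repeat for every other child table.
ANALYZE my_table_part_a;
```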

Edit: another bit of knowledge. It's important to remember that constraint exclusion (which allows PG to skip scanning some tables based on your partitioning criteria) doesn't work with, quote, a "non-immutable function such as CURRENT_TIMESTAMP".

My queries used CURRENT_DATE, and that was part of my problem.
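
To illustrate the difference, here is a hedged sketch that reuses the my_table and time_stamp names from the question. The literal date is only an example value that the application would compute and substitute itself.

```sql
-- CURRENT_DATE is not immutable, so its value is not a constant at planning
-- time and constraint exclusion cannot use it to rule out partitions.
SELECT count(*) FROM my_table WHERE time_stamp >= CURRENT_DATE;

-- The same filter with the date supplied as a literal by the application.
-- The planner can compare this constant against each CHECK constraint.
SELECT count(*) FROM my_table WHERE time_stamp >= '2010-02-10'::timestamptz;
```

Running EXPLAIN on both statements should show whether the second one scans fewer partitions.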


Author: Adrian Pronk

Updated on April 03, 2020

Comments

  • Adrian Pronk over 3 years

    I've just restructured my database to use partitioning in Postgres 8.2. Now I have a problem with query performance:

    SELECT *
    FROM my_table
    WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
    ORDER BY id DESC
    LIMIT 100;
    

    There are 45 million rows in the table. Prior to partitioning, this would use a reverse index scan and stop as soon as it hit the limit.

    After partitioning (on time_stamp ranges), Postgres does a full index scan of the master table and the relevant partition and merges the results, sorts them, then applies the limit. This takes way too long.

    I can fix it with:

    SELECT * FROM (
      SELECT *
      FROM my_table_part_a
      WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
      ORDER BY id DESC
      LIMIT 100) t
    UNION ALL
    SELECT * FROM (
      SELECT *
      FROM my_table_part_b
      WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
      ORDER BY id DESC
      LIMIT 100) t
    UNION ALL
      ... and so on ...
    ORDER BY id DESC
    LIMIT 100
    

    This runs quickly. The partitions whose timestamps are out of range aren't even included in the query plan.

    My question is: Is there some hint or syntax I can use in Postgres 8.2 to prevent the query-planner from scanning the full table but still using simple syntax that only refers to the master table?

    Basically, can I avoid the pain of dynamically building the big UNION query over each partition that happens to be currently defined?

    EDIT: I have constraint_exclusion enabled (thanks @Vinko Vrsalovic)

  • Adrian Pronk almost 14 years
    Yes, I have constraint exclusion switched on. Unfortunately, the master table (which is always empty) is always included in the query, as it's not possible to apply a CHECK constraint to it (at least in 8.2). This means there are always at least two tables involved in the query.
