Postgres query optimization (forcing an index scan)

Solution 1

For testing purposes you can force the use of the index by "disabling" sequential scans - best in your current session only:

SET enable_seqscan = OFF;

Do not use this on a production server. Details are in the manual.

I quoted "disabling" because you cannot actually disable sequential table scans; the setting just makes Postgres prefer any other available plan. This will prove that the multicolumn index on (metric_id, t) can be used - just not as effectively as an index with t as the leading column.
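The setting is session-local by default, and you can restore the default afterwards with RESET - a minimal sketch:

SET enable_seqscan = OFF;   -- affects only the current session
-- run EXPLAIN on your query here and compare the plan
RESET enable_seqscan;       -- restore the default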

You will probably get better results by switching the order of columns in your PRIMARY KEY (and of the index used to implement it behind the scenes) to (t, metric_id). Or create an additional index with reversed columns like that.
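Using the table definition from the question, the two options would look like this. The index name is illustrative, and the ALTER TABLE variant assumes the default constraint name metric_data_pkey:

-- Option A: additional index with reversed column order
CREATE INDEX metric_data_t_metric_id_idx ON metric_data (t, metric_id);

-- Option B: rebuild the PRIMARY KEY with t leading
ALTER TABLE metric_data DROP CONSTRAINT metric_data_pkey;
ALTER TABLE metric_data ADD PRIMARY KEY (t, metric_id);

Option A keeps the existing primary key intact at the cost of a second index to maintain; Option B avoids the extra index but rewrites the constraint.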

You do not normally have to force better query plans by manual intervention. If setting enable_seqscan = OFF leads to a much better plan, something is probably not right in your database - typically the cost settings or table statistics. Consider this related answer.

Solution 2

You cannot force an index scan in this case because it would not make the query faster.

You currently have an index on metric_data (metric_id, t), but the server cannot take advantage of it for your query, because it would need to filter by metric_data.t alone (without metric_id), and there is no such index. The server can use sub-fields of compound indexes, but only starting from the leading column. For example, a search by metric_id could employ this index.

If you create another index on metric_data (t), your query will use that index and run much faster.

Also, you should make sure that you have an index on metrics (id).
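Based on the column names in the question, that would be (index names are illustrative; the second statement is only needed if metrics.id is not already covered by a primary key):

CREATE INDEX metric_data_t_idx ON metric_data (t);

-- only if metrics.id has no index yet:
CREATE UNIQUE INDEX metrics_id_idx ON metrics (id);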

Solution 3

Have you tried using:

WHERE S.NAME = ANY (VALUES ('cpu'), ('mem'))

instead of the ARRAY construct?
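Applied to the query from the question, the rewrite would look like this - a sketch, not a guaranteed improvement, since the planner may still choose the same plan:

SELECT S.metric, D.t, D.d
FROM   metric_data D
JOIN   metrics S ON S.id = D.metric_id
WHERE  S.NAME = ANY (VALUES ('cpu'), ('mem'))
  AND  D.t BETWEEN '2012-02-05 00:00:00'::TIMESTAMP
             AND   '2012-05-05 00:00:00'::TIMESTAMP;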


Author: Jeff
Updated on April 21, 2021

Comments

  • Jeff, about 3 years ago

    Below is my query. I am trying to get it to use an index scan, but it will only seq scan.

    By the way the metric_data table has 130 million rows. The metrics table has about 2000 rows.

    metric_data table columns:

      metric_id integer
    , t timestamp
    , d double precision
    , PRIMARY KEY (metric_id, t)
    

    How can I get this query to use my PRIMARY KEY index?

    SELECT
        S.metric,
        D.t,
        D.d
    FROM metric_data D
    INNER JOIN metrics S
        ON S.id = D.metric_id
    WHERE S.NAME = ANY (ARRAY ['cpu', 'mem'])
      AND D.t BETWEEN '2012-02-05 00:00:00'::TIMESTAMP
                  AND '2012-05-05 00:00:00'::TIMESTAMP;
    

    EXPLAIN:

    Hash Join  (cost=271.30..3866384.25 rows=294973 width=25)
      Hash Cond: (d.metric_id = s.id)
      ->  Seq Scan on metric_data d  (cost=0.00..3753150.28 rows=29336784 width=20)
            Filter: ((t >= '2012-02-05 00:00:00'::timestamp without time zone)
                 AND (t <= '2012-05-05 00:00:00'::timestamp without time zone))
      ->  Hash  (cost=270.44..270.44 rows=68 width=13)
            ->  Seq Scan on metrics s  (cost=0.00..270.44 rows=68 width=13)
                  Filter: ((sym)::text = ANY ('{cpu,mem}'::text[]))
    
  • Erwin Brandstetter, over 11 years ago
    This is not quite correct. A multi-column index can be used on the second field alone, too - just not as effectively. Consider this related question on dba.SE.
  • Jeff, over 11 years ago
    Setting this flag made that query above run in 150ms compared to 45secs on my machine. Thanks!
  • klin, over 11 years ago
    Very instructive answer. And incredible results.
  • Erwin Brandstetter, over 11 years ago
    @Jeff: I added another hint to my answer.
  • ngu, about 8 years ago
    Thanks for your insights. It should be enable_seqscan = OFF instead of enable_seq_scan = OFF in the last sentence.
  • Erwin Brandstetter, about 8 years ago
    @muluhumu: Thanks, fixed.