PostgreSQL slow JOIN with CASE statement

Try this version, which replaces the CASE with plain boolean conditions:

select *
from
    some_table as t1
    join
    some_table as t2 on
        t1.type = t2.type
        and
        (
            t1.type = 'ab' and t1.first = t2.first
            or
            t1.type = 'cd' and t1.second = t2.second
        )

For better performance, create an index on a function:

create or replace function f (_type text, _first int, _second int)
returns integer as $$
    select case _type when 'ab' then _first else _second end;
$$ language sql immutable;

create index i on some_table(f(type, first, second));

Then use the indexed expression in the query:

select *
from
    some_table as t1
    join
    some_table as t2 on
        t1.type = t2.type
        and
        f(t1.type, t1.first, t1.second) = f(t2.type, t2.first, t2.second)
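
To check whether the rewrite helps on your data, you can refresh the table statistics and look at the actual plan. This is only a sketch: it assumes the function f and the index i from above already exist, and it writes both sides of the comparison in the same form as the index expression. On a table this small the planner may well choose a hash or merge join on the expression rather than an index scan, which is also fine.

```sql
-- Refresh planner statistics, including those for the expression index.
analyze some_table;

-- Run the query and show the chosen plan with real row counts and timings.
explain analyze
select *
from
    some_table as t1
    join
    some_table as t2 on
        t1.type = t2.type
        and
        f(t1.type, t1.first, t1.second) = f(t2.type, t2.first, t2.second);
```

Compare the reported execution time and the join node with those of the original CASE query.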
Author by

legacy

Updated on June 04, 2022

Comments

  • legacy
    legacy almost 2 years

    In my database I have a table that contains ~3500 records, and as part of a more complicated query I tried to perform an inner join of the table with itself using a "CASE" condition, as you can see below.

    SELECT *
    FROM some_table AS t1
    JOIN some_table AS t2 ON t1.type = t2.type
        AND CASE
           WHEN t1.type = 'ab' THEN t1.first = t2.first
           WHEN t1.type = 'cd' THEN t1.second = t2.second
           -- Column type contains only one of 2 possible varchar values
        END;
    

    The problem is that this query takes 3.2 - 4.5 seconds, while the following query takes 40 - 50 milliseconds.

    SELECT *
    FROM some_table AS t1
    JOIN some_table AS t2 ON t1.type = t2.type
        AND (t1.first = t2.first OR t1.second = t2.second)
    

    Also, according to the execution plan, in the first case the database processes ~5.8 million records while the table contains only ~3500. The table has the following indexes: (id), (type), (type, first), (type, second).

    We are using the following version: PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit

    Any ideas why PostgreSQL behaves so strangely in this case?

  • legacy
    legacy over 7 years
    It works just fine, thanks a lot. Any idea why the query with CASE performs so slowly?
  • Pavel Stehule
    Pavel Stehule over 7 years
    @legacy - a CASE in a JOIN condition is a pretty big trap for the optimizer. Postgres's planner probably selected a nested-loop plan, and your example is a Cartesian product of 3500*3500 rows where the CASE expression has to be evaluated for every combination, so 5 sec is a pretty fine time for a pretty unhappy query.
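
One way to see what the comment above describes is to look at the plan of the original CASE query. This is a sketch against the table from the question; the exact plan shape depends on your data and statistics, so treat the hints in the comments as things to look for, not as guaranteed output.

```sql
-- Show the actual plan for the CASE-based self-join from the question.
-- Things to look for in the output:
--   * the join node type (Nested Loop, Hash Join or Merge Join)
--   * a "Join Filter" line containing the CASE expression
--   * "Rows Removed by Join Filter", i.e. row pairs that were built
--     and then discarded after evaluating the CASE
explain analyze
select *
from some_table as t1
join some_table as t2 on t1.type = t2.type
    and case
        when t1.type = 'ab' then t1.first = t2.first
        when t1.type = 'cd' then t1.second = t2.second
    end;
```

If the CASE shows up only as a join filter, it is being evaluated once per candidate pair of rows instead of being used to drive the join.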