How to do a Postgresql subquery in select clause with join in from clause like SQL Server?

346,384

Solution 1

I'm not sure I understand your intent perfectly, but perhaps the following would be close to what you want:

select n1.name, n1.author_id, count_1, total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select id, author_id, count(1) as total_count
              from names
              group by id, author_id) n2
  on (n2.id = n1.id and n2.author_id = n1.author_id)

Unfortunately this adds the requirement of grouping the first subquery by id as well as name and author_id, which I don't think was wanted. I'm not sure how to work around that, though, as you need to have id available to join in the second subquery. Perhaps someone else will come up with a better solution.

Solution 2

Complementing @Bob Jarvis and @dmikam answer, Postgres don't perform a good plan when you don't use LATERAL, below a simulation, in both cases the query data results are the same, but the cost are very different

Table structure

CREATE TABLE ITEMS (
    N INTEGER NOT NULL,
    S TEXT NOT NULL
);

INSERT INTO ITEMS
  SELECT
    (random()*1000000)::integer AS n,
    md5(random()::text) AS s
  FROM
    generate_series(1,1000000);

CREATE INDEX N_INDEX ON ITEMS(N);

Performing JOIN with GROUP BY in subquery without LATERAL

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    GROUP BY N
) I2 ON I2.N = I.N
WHERE I.N IN (243477, 997947);

The results

Merge Join  (cost=0.87..637500.40 rows=23 width=37)
  Merge Cond: (i.n = items.n)
  ->  Index Scan using n_index on items i  (cost=0.43..101.28 rows=23 width=37)
        Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..626631.11 rows=861418 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..593016.93 rows=10000000 width=4)

Using LATERAL

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN LATERAL (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    WHERE N = I.N
    GROUP BY N
) I2 ON 1=1 --I2.N = I.N
WHERE I.N IN (243477, 997947);

Results

Nested Loop  (cost=9.49..1319.97 rows=276 width=37)
  ->  Bitmap Heap Scan on items i  (cost=9.06..100.20 rows=23 width=37)
        Recheck Cond: (n = ANY ('{243477,997947}'::integer[]))
        ->  Bitmap Index Scan on n_index  (cost=0.00..9.05 rows=23 width=0)
              Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..52.79 rows=12 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..52.64 rows=12 width=4)
              Index Cond: (n = i.n)

My Postgres version is PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)

Solution 3

I know this is old, but since Postgresql 9.3 there is an option to use a keyword "LATERAL" to use RELATED subqueries inside of JOINS, so the query from the question would look like:

SELECT 
    name, author_id, count(*), t.total
FROM
    names as n1
    INNER JOIN LATERAL (
        SELECT 
            count(*) as total
        FROM 
            names as n2
        WHERE 
            n2.id = n1.id
            AND n2.author_id = n1.author_id
    ) as t ON 1=1
GROUP BY 
    n1.name, n1.author_id

Solution 4

I am just answering here with the formatted version of the final sql I needed based on Bob Jarvis answer as posted in my comment above:

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
              from names
              group by author_id) n2
  on (n2.author_id = n1.author_id)

Solution 5

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select distinct(author_id), count(1) as total_count
              from names) n2
  on (n2.author_id = n1.author_id)
Where true

used distinct if more inner join, because more join group performance is slow

Share:
346,384

Related videos on Youtube

Ricardo
Author by

Ricardo

Updated on July 08, 2022

Comments

  • Ricardo
    Ricardo almost 2 years

    I am trying to write the following query on postgresql:

    select name, author_id, count(1), 
        (select count(1)
        from names as n2
        where n2.id = n1.id
            and t2.author_id = t1.author_id
        )               
    from names as n1
    group by name, author_id
    

    This would certainly work on Microsoft SQL Server but it does not at all on postegresql. I read its documentation a bit and it seems I could rewrite it as:

    select name, author_id, count(1), total                     
    from names as n1, (select count(1) as total
        from names as n2
        where n2.id = n1.id
            and n2.author_id = t1.author_id
        ) as total
    group by name, author_id
    

    But that returns the following error on postegresql: "subquery in FROM cannot refer to other relations of same query level". So I'm stuck. Does anyone know how I can achieve that?

    Thanks

    • qwertzguy
      qwertzguy almost 8 years
      Actually it seems like this should work on Postgres (maybe 6 years ago it didn't :) )
  • deFreitas
    deFreitas about 6 years
    I wonder if the performance of these two queries have difference, or if for postgresql it is the same plan
  • deFreitas
    deFreitas about 6 years
    I did this test and the answer is here (my answer)
  • leole
    leole almost 5 years
    Thanks for the hint to using LATERAL!!