Aggregate by aggregate (ARRAY_AGG)?

Solution 1

I'm not sure whether you can aggregate by an array. If you can, here is one approach:

select col1, array_agg(col2), ar
from (SELECT a1.column_1 as col1, a2.column_1 as col2,
             ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2) as ar
      FROM agg_test a1 JOIN
           agg_test a2
           ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
      WHERE a1.column_1 = 1
      GROUP BY a1.column_1, a2.column_1
     ) t
group by col1, ar

The alternative is to use array_to_string to convert the array values into a string and group by that string.
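
As a rough sketch of that string-based alternative, assuming PostgreSQL's array_to_string with a comma delimiter is acceptable as the grouping key (the aliases ar_text and col2_agg are illustrative and not part of the original query):

```sql
-- Sketch only: group on a text rendering of the array instead of the array itself.
SELECT col1,
       ARRAY_AGG(col2 ORDER BY col2) AS col2_agg,
       ar_text
FROM (
    SELECT a1.column_1 AS col1,
           a2.column_1 AS col2,
           -- Join the sorted, de-duplicated values into one string, e.g. '3,4'
           ARRAY_TO_STRING(
               ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2), ','
           ) AS ar_text
    FROM agg_test a1
    JOIN agg_test a2
      ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
    WHERE a1.column_1 = 1
    GROUP BY a1.column_1, a2.column_1
) t
GROUP BY col1, ar_text
ORDER BY ar_text;
```

The last column is then text rather than an array, so this is mainly useful if grouping by the array itself turns out not to be possible.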

Solution 2

You could also try something like this:

SELECT DISTINCT
  a1.column_1,
  ARRAY_AGG(a2.column_1) OVER (
    PARTITION BY
      a1.column_1,
      ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
  ) AS "a2.column_1 agg",
  ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
FROM agg_test a1
JOIN agg_test a2 ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
WHERE a1.column_1 = 1
GROUP BY a1.column_1, a2.column_1
;

(The parts that differ from the query you've posted in your question are the DISTINCT after SELECT and the window ARRAY_AGG(a2.column_1) OVER (...) that replaces the plain a2.column_1 column.)

The above uses a window ARRAY_AGG to combine the values of a2.column_1 alongside the other ARRAY_AGG, using the latter's result as one of the partitioning criteria. Without the DISTINCT, it would produce two {4,5} rows for your example, so DISTINCT is needed to eliminate the duplicates.

Here's a SQL Fiddle demo: http://sqlfiddle.com/#!1/df5c3/4

Note, though, that the window ARRAY_AGG cannot have an ORDER BY like its "normal" counterpart. That means the order of a2.column_1 values in the list would be indeterminate, although in the linked demo it does happen to match the one in your expected output.
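
If a fixed element order matters, one possible workaround is to give up the window function and aggregate in two levels, so that the outer ARRAY_AGG is a plain aggregate and can take its own ORDER BY. The following is only a sketch under that assumption (the aliases col1, col2, col2_agg and ar are illustrative), and it relies on PostgreSQL being able to group by an array value:

```sql
-- Sketch only: two-level aggregation with a deterministic element order.
SELECT col1,
       ARRAY_AGG(col2 ORDER BY col2) AS col2_agg,  -- plain aggregate, so ORDER BY is allowed
       ar
FROM (
    SELECT a1.column_1 AS col1,
           a2.column_1 AS col2,
           ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2) AS ar
    FROM agg_test a1
    JOIN agg_test a2
      ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
    WHERE a1.column_1 = 1
    GROUP BY a1.column_1, a2.column_1
) t
GROUP BY col1, ar
ORDER BY ar;
```

With the sample data from the question this should return the rows (1, {2}, {1}), (1, {3}, {2}) and (1, {4,5}, {3,4}).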

Author by

Przemek

Updated on June 17, 2022

Comments

  • Przemek
    Przemek about 2 years

    Let's say I have a simple table agg_test with 3 columns - id, column_1 and column_2. Dataset, for example:

    id|column_1|column_2
    --------------------
     1|       1|       1
     2|       1|       2
     3|       1|       3
     4|       1|       4
     5|       2|       1
     6|       3|       2
     7|       4|       3
     8|       4|       4
     9|       5|       3
    10|       5|       4 
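
    For anyone who wants to reproduce this, a minimal setup sketch follows. The column types are not stated here, so integer columns and a primary key on id are assumptions:

    ```sql
    -- Hypothetical setup: types and the primary key are assumed, not given in the question.
    CREATE TABLE agg_test (
        id       integer PRIMARY KEY,
        column_1 integer NOT NULL,
        column_2 integer NOT NULL
    );

    -- The ten rows from the dataset above.
    INSERT INTO agg_test (id, column_1, column_2) VALUES
        (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4),
        (5, 2, 1), (6, 3, 2), (7, 4, 3), (8, 4, 4),
        (9, 5, 3), (10, 5, 4);
    ```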
    

    A query like this (with self join):

    SELECT
      a1.column_1,
      a2.column_1,
      ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
    FROM agg_test a1
    JOIN agg_test a2 ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
    WHERE a1.column_1 = 1
    GROUP BY a1.column_1, a2.column_1
    

    Will produce a result like this:

    column_1|column_1|array_agg
    ---------------------------
           1|       2|      {1}
           1|       3|      {2}
           1|       4|    {3,4}
           1|       5|    {3,4}
    

    We can see that for values 4 and 5 from the joined table we have the same result in the last column. So, is it possible to somehow group the results by it, e.g.:

    column_1|column_1|array_agg
    ---------------------------
           1|     {2}|      {1}
           1|     {3}|      {2}
           1|   {4,5}|    {3,4}
    

    Thanks for any answers. If anything isn't clear or can be presented in a better way - tell me in the comments and I'll try to make this question as readable as I can.