Conditional SQL count

sql postgresql count group-by aggregate-filter

40,210

Solution 1

In Postgres 9.4 or later, use the aggregate FILTER option. Typically cleanest and fastest:

SELECT category
     , count(*) FILTER (WHERE question1 = 0) AS zero
     , count(*) FILTER (WHERE question1 = 1) AS one
     , count(*) FILTER (WHERE question1 = 2) AS two
FROM   reviews
GROUP  BY 1;

Details for the FILTER clause:

Aggregate columns with additional (distinct) filters

If you want it short:

SELECT category
     , count(question1 = 0 OR NULL) AS zero
     , count(question1 = 1 OR NULL) AS one
     , count(question1 = 2 OR NULL) AS two
FROM   reviews
GROUP  BY 1;

More syntax variants:

For absolute performance, is SUM faster or COUNT?

Proper crosstab query

crosstab() yields the best performance and is shorter for long lists of options:

SELECT * FROM crosstab(
     'SELECT category, question1, count(*) AS ct
      FROM   reviews
      GROUP  BY 1, 2
      ORDER  BY 1, 2'
   , 'VALUES (0), (1), (2)'
   ) AS ct (category text, zero int, one int, two int);

Detailed explanation:

PostgreSQL Crosstab Query

Solution 2

The "best" way (for me) is to write a query like:

SELECT
    category,
    question1,
    count(*)
FROM reviews
GROUP BY category, question1

Then I use this data to draw a table in application logic.

Other option is to use one JSON column for all grouping results. This will result in something like:

category1 | {"zero": 1, "one": 3, "two": 5}
category2 | {"one": 7, "two": 4}

and so on.

The query for this option you can build from the previous one with json_build_object and json_agg. The best thing for this option - you do not need to know number of possible question1 values ahead of time.

40,210

Erwin Brandstetter

Database expert. Warmongers make peace before asking questions.

Updated on October 24, 2021

Comments

Erwin Brandstetter over 2 years
What is the best way to create columns which count the number of occurrences of data in a table? The table needs to be grouped by one column?
My database is PostgreSQL.

I have seen:
```
SELECT
    sum(CASE WHEN question1 = 0 THEN 1 ELSE 0 END) AS ZERO,
    sum(CASE WHEN question1 = 1 THEN 1 ELSE 0 END) AS ONE,
    sum(CASE WHEN question1 = 2 THEN 1 ELSE 0 END) AS TWO,
    category
FROM reviews
    GROUP BY category
```
where question1 can have a value of either 0, 1 or 2.

I have also seen a version of that using count(CASE WHEN question1 = 0 THEN 1)

However, this becomes more cumbersome to write as the number of possible values for question1 increases. Is there a convenient way to write this query, possibly optimizing performance?
- Gordon Linoff about 9 years
  
  Your version is quite reasonable, although you should include the end if you want it to be syntactically correct. An alternative is Postgres's crosstab functionality. I don't know which is better performance-wise.
- Admin about 9 years
  
  sorry, forgot to type the end
- Giorgi Nakeuri about 9 years
  
  why not to group by category and question1 and count rows in groups?
- Ram about 9 years
  
  @GiorgiNakeuri The OP wants columns names for each value of question1 then you need to transpose the result for the required result.
- Giorgi Nakeuri about 9 years
  
  @Ram, aha, ok I see know. In MSSQL it would be group by category and question and then pivoting grouping results.
- Ram about 9 years
  
  @GiorgiNakeuri True. Pivot is what I thought of first but this is postgresql.