In Redshift/Postgres, how to count rows that meet a condition?

72,893

Solution 1

First, the issue you're having here is that what you're saying is "If the grade is less than 70, the value of this case expression is count(rank). Otherwise, the value of this expression is count(rank)." So, in either case, you're always getting the same value.

SELECT 
    CASE
        WHEN grade < 70 THEN COUNT(rank)
        ELSE COUNT(rank)
    END
FROM
   grades

count() only counts non-null values, so typically the pattern you'll see to accomplish what you're trying is this:

SELECT 
    count(CASE WHEN grade < 70 THEN 1 END) as grade_less_than_70,
    count(CASE WHEN grade >= 70 and grade < 80 THEN 1 END) as grade_between_70_and_80
FROM
   grades

That way the case expression will only evaluate to 1 when the test expression is true and will be null otherwise. Then the count() will only count the non-null instances, i.e. when the test expression is true, which should give you what you need.

Edit: As a side note, notice that this is exactly the same as how you had originally written this using count(if(test, true-value, false-value)), only re-written as count(case when test then true-value end) (and null is the stand in false-value since an else wasn't supplied to the case).

Edit: postgres 9.4 was released a few months after this original exchange. That version introduced aggregate filters, which can make scenarios like this look a little nicer and clearer. This answer still gets some occasional upvotes, so if you've stumbled upon here and are using a newer postgres (i.e. 9.4+) you might want to consider this equivalent version:

SELECT
    count(*) filter (where grade < 70) as grade_less_than_70,
    count(*) filter (where grade >= 70 and grade < 80) as grade_between_70_and_80
FROM
   grades

Solution 2

The solution given by @yieldsfalsehood works perfectly:

SELECT
    count(*) filter (where grade < 70) as grade_less_than_70,
    count(*) filter (where grade >= 70 and grade < 80) as grade_between_70_and_80
FROM
    grades

But since you talked about NULLIF(value1, value2), there's a way with nullif that can give the same result:

select count(nullif(grade < 70 ,true)) as failed from grades;

Solution 3

Redshift only

For lazy typers, here's a "COUNTIF" sum integer casting version built on top of @user1509107 answer:

SELECT 
    SUM((grade < 70)::INT) AS grade_less_than_70,
    SUM((grade >= 70 AND grade < 80)::INT) AS grade_between_70_and_80
FROM
   grades
Share:
72,893
ILikeTacos
Author by

ILikeTacos

Experienced Software Engineer with extensive background in Web Applications Development using LAMP and MEAN stacks.

Updated on July 26, 2022

Comments

  • ILikeTacos
    ILikeTacos almost 2 years

    I'm trying to write a query that count only the rows that meet a condition.

    For example, in MySQL I would write it like this:

    SELECT
        COUNT(IF(grade < 70), 1, NULL)
    FROM
        grades
    ORDER BY
        id DESC;
    

    However, when I attempt to do that on Redshift, it returns the following error:

    ERROR: function if(boolean, integer, "unknown") does not exist

    Hint: No function matches the given name and argument types. You may need to add explicit type casts.

    I checked the documentation for conditional statements, and I found

    NULLIF(value1, value2)

    but it only compares value1 and value2 and if such values are equal, it returns null.

    I couldn't find a simple IF statement, and at first glance I couldn't find a way to do what I want to do.

    I tried to use the CASE expression, but I'm not getting the results I want:

    SELECT 
        CASE
            WHEN grade < 70 THEN COUNT(rank)
            ELSE COUNT(rank)
        END
    FROM
       grades
    

    This is the way I want to count things:

    • failed (grade < 70)

    • average (70 <= grade < 80)

    • good (80 <= grade < 90)

    • excellent (90 <= grade <= 100)

    and this is how I expect to see the results:

    +========+=========+======+===========+
    | failed | average | good | excellent |
    +========+=========+======+===========+
    |   4    |    2    |  1   |     4     |
    +========+=========+======+===========+
    

    but I'm getting this:

    +========+=========+======+===========+
    | failed | average | good | excellent |
    +========+=========+======+===========+
    |  11    |   11    |  11  |    11     |
    +========+=========+======+===========+
    

    I hope someone could point me to the right direction!

    If this helps here's some sample info

    CREATE TABLE grades(
      grade integer DEFAULT 0,
    );
    
    INSERT INTO grades(grade) VALUES(69, 50, 55, 60, 75, 70, 87, 100, 100, 98, 94);
    
  • ILikeTacos
    ILikeTacos over 10 years
    awesome! This is exactly what I needed. Although I find a bit weird that postgres doesn't provide a simple IF instruction. I guess CASE..WHEN..ELSE does the same job, but still. IF is almost industry standard.
  • yieldsfalsehood
    yieldsfalsehood over 10 years
    Case is the official standard ;) The if() function is MySQL, which is iif() in SQL Server.
  • Zilk
    Zilk almost 7 years
  • Davos
    Davos almost 5 years
    Interesting answer, definitely harder to read and understand, and relies on specific behaviour of boolean true casting to 1 and false to 0. I suspect the type casting is not as performant as the literal math case statement expression but would interested to see some benchmarks. Still useful to add to the list of answers
  • Davos
    Davos almost 5 years
    Here are some benchmarks, the best option in postgres is the relatively new filter expression dba.stackexchange.com/questions/27558/…
  • Davos
    Davos almost 5 years
    The filter is also apparently faster than the other options see dba.stackexchange.com/questions/27558/…