Count distinct values in MySQL without folding NULL values

10,185

Solution 1

Use NOT EXISTS in a WHERE clause:

SELECT count(*)
FROM   a
WHERE  NOT EXISTS (SELECT * FROM a a2 WHERE a2.f = a.f AND a2.id < a.id);

This way you can also get actual rows - if you need more than the bare count:

SELECT *
FROM   a
WHERE  NOT EXISTS (SELECT * FROM a a2 WHERE a2.f = a.f AND a2.id < a.id)

The = operator makes sure that all rows with f IS NULL are included. You had that in your query already.

-> sqlfiddle

Neither of these would work:

SELECT DISTINCT f FROM a;

SELECT * FROM a GROUP BY f;

.. because both would also fold NULL values, and you want

every record WHERE f IS NULL.

Solution 2

There's a pretty simple approach that might work for you:

select count(distinct ifnull(f, id))
from a

Note that this query assumes that f values are never id values, and based on sample data and experience this is reasonable.

Edited:

I thought about it and there's an even simpler approach:

select count(distinct f) + sum(f is null) from a;

which you can see running on sqlfiddle

This works because distinct throws away nulls, and sum(condition) counts the number of times condition is true because in mysql true is 1 and false is 0.

Share:
10,185
package
Author by

package

Updated on June 09, 2022

Comments

  • package
    package almost 2 years

    I have table a with two fields: id (PK) and f.

    Consider following records:

    id | f
    1  | NULL
    2  | 'foo'
    3  | 'bar'
    4  | NULL
    5  | 'foo'
    6  | 'baz'
    

    I want to retrieve and count all the records having distinct f values including every record WHERE f IS NULL. Given this criteria, the query should return every record except #5, because the same value is already included in the set, and the total count would be 5.

    The query I'm using to retrieve all records looks like this:

    SELECT CASE WHEN EXISTS (SELECT id FROM a a2 WHERE a2.f = a.f AND a.id < a2.id) THEN 1 END AS not_distinct FROM a HAVING not_distinct IS NULL

    If this query could be improved, I'd welcome any feedback. Anyway, the main problem is counting. Obviously adding a COUNT(*) will not help here and I'm totally lost how to count the records after the filtering.