Count distinct and Null value is eliminated by an aggregate

sql sql-server sql-server-2005 null aggregate

28,883

Solution 1

select a,count(distinct isnull(b,-1))-sum(distinct case when b is null then 1 else 0 end),sum(a) from 
    (select 1 a,1 b union all
    select 2,2 union all
    select 2,null union all
    select 3,3 union all
    select 3,null union all
    select 3,null) a
    group by a

Thanks to Eoin I worked out a way to do this. You can count distinct the values including the nulls and then remove the count due to nulls if there were any using a sum distinct.

Solution 2

Anywhere you have a null possibly returned, use

CASE WHEN Column IS NULL THEN -1 ELSE Column END AS Column

That will sub out all your Null Values for -1 for the duration of the query and they'll be counted/aggregated as such, then you can just do the reverse in your fine wrapping query...

SELECT  
    CASE WHEN t1.a = -1 THEN NULL ELSE t1.a END as a
    , t1.countdistinctb
    , t2.suma

Solution 3

This is a late note, but being it was the return on Google, i wanted to mention it.

Changing NULL to another value is a Bad Idea(tm).

COUNT() is doing it, not DISTINCT.

Instead, use DISTINCT in an subquery and which returns a number, and aggregate that in the outer query.

A simple example of this is:

WITH A(A) AS (SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT 1)
SELECT COUNT(*) FROM (SELECT DISTINCT A FROM A) B;

This allows for COUNT(*) to be used, which does not ignore NULLs (because it counts records, not values).

Solution 4

If you don't like the code duplication then why not use a common table expression? e.g.

WITH x(a, b) AS 
        (
                select 1 a,1 b union all
                select 2,2 union all
                select 2,null union all
                select 3,3 union all
                select 3,null union all
                select 3,null
        ) 
select t1.a, t1.countdistinctb, t2.suma from
(
        select a,count(distinct b) countdistinctb from 
        x a
        where a.b is not null
        group by a
) t1
left join
(
        select a,sum(a) suma from 
        x a
        group by a
) t2 on t1.a=t2.a

View more solutions

28,883

Author by

Simon D

merge me

Updated on October 26, 2020

Comments

Simon D over 3 years

I'm using SQL Server 2005. With the query below (simplified from my real query):

select a,count(distinct b),sum(a) from 
(select 1 a,1 b union all
select 2,2 union all
select 2,null union all
select 3,3 union all
select 3,null union all
select 3,null) a
group by a

Is there any way to do a count distinct without getting

"Warning: Null value is eliminated by an aggregate or other SET operation."

Here are the alternatives I can think of:

Turning ANSI_WARNINGS off

Separating into two queries, one with count distinct and a where clause to eliminate nulls, one with the sum:

select t1.a, t1.countdistinctb, t2.suma from
(
    select a,count(distinct b) countdistinctb from 
    (
        select 1 a,1 b union all
        select 2,2 union all
        select 2,null union all
        select 3,3 union all
        select 3,null union all
        select 3,null
    ) a
    where a.b is not null
    group by a
) t1
left join
(
    select a,sum(a) suma from 
    (
        select 1 a,1 b union all
        select 2,2 union all
        select 2,null union all
        select 3,3 union all
        select 3,null union all
        select 3,null
    ) a
    group by a
) t2 on t1.a=t2.a

Ignore the warning in the client

Is there a better way to do this? I'll probably go down route 2, but don't like the code duplication.

Simon D about 15 years

I wanted to avoid the splitting into two queries and combining. Thanks to your idea I've worked it out though, I'll post an answer.
Simon D about 15 years

That's a good idea I hadn't thought of. But the code duplication I really didn't like was the group by and joins, which can't be got rid of like this. Thanks though.
adolf garlic about 15 years

Don't like this idea at all! What if the data is late arriving? Or 'unknown' is a perfectly valid state?
Simon D about 15 years

I'm specifically designing for a case where I don't want to include NULL values in the count.
Kristen over 5 years

If -1 would be a valid data value for [b] and you need to use an extreme value for IsNull() (e.g. MAX or MIN for the data type) which might cause overflow in SUM with this method then an alternative is count(distinct isnull(b, -32768)) - CASE WHEN sum(case when b is null then 1 else 0 end) = 0 THEN 0 ELSE 1 END