SQL: count number of distinct values in every column
Solution 1
I appreciate all of the responses. I think the solution that will work best for me in this situation (counting the number of distinct values in each column of a table from an external program that has no knowledge of the table except its name) is as follows:
Run "describe table1" and pull out the column names from the result.
Loop through the column names and create the query to count the distinct values in each column. The query will look something like "select count(distinct columnA), count(distinct columnB), ... from table1".
Solution 2
This code should give you all the columns in 'table1' with the respective distinct value count for each one as data.
DECLARE @TableName VarChar (Max) = 'table1'
DECLARE @SqlString VarChar (Max)
set @SqlString = (
SELECT DISTINCT
'SELECT ' +
RIGHT (ColumnList, LEN (ColumnList)-1) +
' FROM ' + Table_Name
FROM INFORMATION_SCHEMA.COLUMNS COL1
CROSS AppLy (
SELECT ', COUNT (DISTINCT [' + COLUMN_NAME + ']) AS ' + '''' + COLUMN_NAME + ''''
FROM INFORMATION_SCHEMA.COLUMNS COL2
WHERE COL1.TABLE_NAME = COL2.TABLE_NAME
FOR XML PATH ('')
) TableColumns (ColumnList)
WHERE
1=1 AND
COL1.TABLE_NAME = @TableName
)
EXECUTE (@SqlString)
Solution 3
try this (sql server 2005 syntax):
DECLARE @YourTable table (col1 varchar(5)
,col2 int
,col3 datetime
,col4 char(3)
)
insert into @YourTable values ('abcdf',123,'1/1/2009','aaa')
insert into @YourTable values ('aaaaa',456,'1/2/2009','bbb')
insert into @YourTable values ('bbbbb',789,'1/3/2009','aaa')
insert into @YourTable values ('ccccc',789,'1/4/2009','bbb')
insert into @YourTable values ('aaaaa',789,'1/5/2009','aaa')
insert into @YourTable values ('abcdf',789,'1/6/2009','aaa')
;with RankedYourTable AS
(
SELECT
ROW_NUMBER() OVER(PARTITION by col1 order by col1) AS col1Rank
,ROW_NUMBER() OVER(PARTITION by col2 order by col2) AS col2Rank
,ROW_NUMBER() OVER(PARTITION by col3 order by col3) AS col3Rank
,ROW_NUMBER() OVER(PARTITION by col4 order by col4) AS col4Rank
FROM @YourTable
)
SELECT
SUM(CASE WHEN col1Rank=1 THEN 1 ELSE 0 END) AS col1DistinctCount
,SUM(CASE WHEN col2Rank=1 THEN 1 ELSE 0 END) AS col2DistinctCount
,SUM(CASE WHEN col3Rank=1 THEN 1 ELSE 0 END) AS col3DistinctCount
,SUM(CASE WHEN col4Rank=1 THEN 1 ELSE 0 END) AS col4DistinctCount
FROM RankedYourTable
OUTPUT:
col1DistinctCount col2DistinctCount col3DistinctCount col4DistinctCount
----------------- ----------------- ----------------- -----------------
4 3 6 2
(1 row(s) affected)
Solution 4
and it's hardcoded.
It is not hardcoding to provide a field list for a sql statement. It's common and acceptable practice.
Ryan
I'm currently a grad student in computer science. My main interests are computer graphics and operating systems. In the past, I've done work in the environmental field on various models. I've also written benchmarks for distributed file systems. Of course, there have been a lot of miscellaneous projects/experiences mixed in. At the moment, I'm trying to learn more about photo-realistic rendering (OpenGL/GLSL) and character animation.
Updated on June 27, 2020Comments
-
Ryan almost 4 years
I need a query that will return a table where each column is the count of distinct values in the columns of another table.
I know how to count the distinct values in one column:
select count(distinct columnA) from table1;
I suppose that I could just make this a really long select clause:
select count(distinct columnA), count(distinct columnB), ... from table1;
but that isn't very elegant and it's hardcoded. I'd prefer something more flexible.
-
ijw over 14 years...As is programmatically creating the SQL, as far as it goes (providing your users never supply the values you put in it - fine in this question, you have the column list somewhere).
-
Amy B over 14 yearsif I was going to code-gen some SQL, and if I was on MSSqlServer, I'd check out sysobjects and syscolumns.
-
Kibbee over 14 yearsPlease qualify this with more information. How is using distinct evil when compred to doing COUNT/GROUP BY?
-
Kibbee over 14 yearsIf you want to write the code in a more portable way, you should select from Information_Schema.Tables and Information_Schema.Columns, rather than selecting from sysobjects and syscolumns
-
Cristian Cotovan over 14 yearsDISTINCT behaves erratically with larger datasets and from platform to platform. At least in my experience. I find grouping results to be more predictable, particularly if you deal with differently encoded data, UTF, etc.
-
Ryan over 14 yearsGood point. I shouldn't have to worry about this. The fields will only be text or numbers.
-
Ryan over 14 yearsI'll have to look into using group by.
-
Shannon Severance over 14 yearscount/group by would only get distict count for a single column. For columns A and B you would end up with two selects, because select A, B, count(*) from ... group by A, B would give you counts of the distinct pair (A, B), not distinct A and distinct B. The OP is on the right track with count(distinct A), count(distinct B)
-
Ludovic Aubert almost 5 yearsSimple query to help generate yours painlessly:SELECT 'ROW_NUMBER() OVER(PARTITION BY ' + COLUMN_NAME + ' ORDER BY ' + COLUMN_NAME + ') AS ' + COLUMN_NAME + 'Rank,', 'SUM(CASE WHEN ' + COLUMN_NAME + 'Rank=1 THEN 1 ELSE 0 END) AS ' + COLUMN_NAME + 'DistinctCount,' FROM information_schema.COLUMNS WHERE TABLE_NAME='YourTable';