How to find out whether a table has some unique columns

19,303

Solution 1

Here's an approach that is basically similar to @JNK's but instead of printing the counts it returns a ready answer for every column that tells you whether a column consists of unique values only or not:

DECLARE @table varchar(100), @sql varchar(max);
SET @table = 'some table name';

SELECT
  @sql = COALESCE(@sql + ', ', '') + ColumnExpression
FROM (
  SELECT
    ColumnExpression =
      'CASE COUNT(DISTINCT ' + COLUMN_NAME + ') ' +
      'WHEN COUNT(*) THEN ''UNIQUE'' ' +
      'ELSE '''' ' +
      'END AS ' + COLUMN_NAME
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_NAME = @table
) s

SET @sql = 'SELECT ' + @sql + ' FROM ' + @table;
PRINT @sql;  /* in case you want to have a look at the resulting query */
EXEC(@sql);

It simply compares COUNT(DISTINCT column) with COUNT(*) for every column. The result will be a table with a single row, where every column will contain the value UNIQUE for those columns that do not have duplicates, and empty string if duplicates are present.

But the above solution will work correctly only for those columns that do not have NULLs. It should be noted that SQL Server does not ignore NULLs when you want to create a unique constraint/index on a column. If a column contains just one NULL and all other values are unique, you can still create a unique constraint on the column (you cannot make it a primary key, though, which requires both uniquness of values and absence of NULLs).

Therefore you might need a more thorough analysis of the contents, which you could get with the following script:

DECLARE @table varchar(100), @sql varchar(max);
SET @table = 'some table name';

SELECT
  @sql = COALESCE(@sql + ', ', '') + ColumnExpression
FROM (
  SELECT
    ColumnExpression =
      'CASE COUNT(DISTINCT ' + COLUMN_NAME + ') ' +
      'WHEN COUNT(*) THEN ''UNIQUE'' ' +
      'WHEN COUNT(*) - 1 THEN ' +
        'CASE COUNT(DISTINCT ' + COLUMN_NAME + ') ' +
        'WHEN COUNT(' + COLUMN_NAME + ') THEN ''UNIQUE WITH SINGLE NULL'' ' +
        'ELSE '''' ' +
        'END ' +
      'WHEN COUNT(' + COLUMN_NAME + ') THEN ''UNIQUE with NULLs'' ' +
      'ELSE '''' ' +
      'END AS ' + COLUMN_NAME
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_NAME = @table
) s

SET @sql = 'SELECT ' + @sql + ' FROM ' + @table;
PRINT @sql;  /* in case you still want to have a look at the resulting query */
EXEC(@sql);

This solution takes NULLs into account by checking three values: COUNT(DISTINCT column), COUNT(column) and COUNT(*). It displays the results similarly to the former solution, but the possible diagnoses for the columns are more diverse:

  • UNIQUE means no duplicate values and no NULLs (can either be a PK or have a unique constraint/index);

  • UNIQUE WITH SINGLE NULL – as can be guessed, no duplicates, but there's one NULL (cannot be a PK, but can have a unique constraint/index);

  • UNIQUE with NULLs – no duplicates, two or more NULLs (in case you are on SQL Server 2008, you could have a conditional unique index for non-NULL values only);

  • empty string – there are duplicates, possibly NULLs too.

Solution 2

Here is I think probably the cleanest way. Just use dynamic sql and a single select statement to create a query that gives you a total row count and a count of distinct values for each field.

Fill in the DB name and tablename at the top. The DB name part is really important since OBJECT_NAME only works in the current database context.

use DatabaseName

DECLARE @Table varchar(100) = 'TableName'

DECLARE @SQL Varchar(max)

SET @SQL = 'SELECT COUNT(*) as ''Total'''

SELECT @SQL = @SQL + ',COUNT(DISTINCT ' + name + ') as ''' + name + ''''
FROM sys.columns c
WHERE OBJECT_NAME(object_id) = @Table

SET @SQL = @SQL + ' FROM ' + @Table

exec @sql

Solution 3

If you are using 2008, you can use the Data Profiling Task in SSIS to return Candidate Keys for each table.

This blog entry steps through the process, it's fairly simple:

http://consultingblogs.emc.com/jamiethomson/archive/2008/03/04/ssis-data-profiling-task-part-8-candidate-key.aspx

Solution 4

What about simple one line of code:

CREATE UNIQUE INDEX index_name ON table_name (column_name);

If the index is created then your column_name has only unique values. If there are dupes in your column_name, you will get an error message.

Solution 5

A few words what my code does:

  1. Read's all tables and columns

  2. Creates a temp table to hold table/columns with duplicate keys

  3. For each table/column it runs a query. If it finds a count(*)>1 for at least one value it makes an insert into the temp table

  4. Select's column and values from the system tables that do not match table/columns that are found to have duplicates

    DECLARE @sql VARCHAR(max)
    DECLARE @table VARCHAR(100)
    DECLARE @column VARCHAR(100)
    
    
    CREATE TABLE #temp (tname VARCHAR(100),cname VARCHAR(100))
    
    DECLARE mycursor CURSOR FOR
    select t.name,c.name
    from sys.tables t
    join sys.columns c on t.object_id = c.object_id
    where system_type_id not in (34,35,99)
    
    OPEN mycursor
    FETCH NEXT FROM mycursor INTO @table,@column
    
    WHILE @@FETCH_STATUS = 0
    BEGIN
    SET @sql = 'INSERT INTO #temp SELECT DISTINCT '''+@table+''','''+@column+ ''' FROM ' + @table + ' GROUP BY ' + @column +' HAVING COUNT(*)>1 '
    EXEC (@sql)
    FETCH NEXT FROM mycursor INTO @table,@column
    END
    
    select t.name,c.name
    from sys.tables t
    join sys.columns c on t.object_id = c.object_id
    left join #temp on t.name = #temp.tname and c.name = #temp.cname
    where system_type_id not in (34,35,99) and #temp.tname IS NULL
    
    DROP TABLE #temp
    
    CLOSE mycursor
    DEALLOCATE mycursor
    
Share:
19,303
zmaster
Author by

zmaster

Updated on June 05, 2022

Comments

  • zmaster
    zmaster almost 2 years

    I use MS SQL Server.

    Ive been handed some large tables with no constrains on them, no keys no nothing.

    I know some of the columns have unique values. Is there a smart way for a given table to finde the cols that have unique values ?

    Right now I do it manually for each column by counting if there is as many DISTINCT values as there are rows in the table.

    SELECT COUNT(DISTINCT col) FROM table
    

    Could prob make a cusor to loop over all the columns but want to hear if someone knows a smarter or build-in function.

    Thanks.

  • Andriy M
    Andriy M almost 13 years
    This will be very helpful assuming there are no NULLs in the column(s). NULLs are taken into account in SQL Server when creating unique constraints/indexes.
  • zmaster
    zmaster almost 13 years
    thanks for your answer, I have not tried it out as it runs over all tables which is not what I need for this specific question.
  • niktrs
    niktrs almost 13 years
    you can place an additional clause in the cursor query definition: AND tname = your_table_name
  • ratneshsinghparihar
    ratneshsinghparihar over 11 years
    giving error "Warning: Null value is eliminated by an aggregate or other SET operation. " . Can not be use from application.
  • Andriy M
    Andriy M over 11 years
    @user998660: "From application particularly sensitive to warnings", you mean? In that case, no, it can't, of course. Not all applications are that sensitive, though. For instance, I don't remember ever having issues in Delphi with aggregates that produce such warnings. Anyway, my impression was that the OP wanted a solution they could use "manually", e.g., in case of a query, invoking it in a tool like SSMS.