How to find which columns don't have any data (all values are NULL)?

26,015

Solution 1

For a single column, count(ColumnName) returns the number of rows where ColumName is not null:

select  count(TheColumn)
from    YourTable

You can generate a query for all columns. Per Martin's suggestion, you can exclude columns that cannot be null with is_nullable = 1. For example:

select  'count(' + name + ') as ' + name + ', '
from    sys.columns
where   object_id = object_id('YourTable')
        and is_nullable = 1

If the number of tables is large, you can generate a query for all tables in a similiar way. The list of all tables is in sys.tables.

Solution 2

Here's a script I wrote to do the same thing, it's a two-step manual process:

  1. Run this script in SSMS and select all of the rows in the Results pane:
SELECT
'SELECT
    COUNT( DISTINCT [' + COLUMN_NAME + ']) AS UniqueValues,
    ''' + TABLE_NAME + '.' + COLUMN_NAME + ''' AS ColumnName
FROM
    [' + TABLE_SCHEMA + '].[' + TABLE_NAME + ']

UNION ALL
'
FROM
    INFORMATION_SCHEMA.COLUMNS
ORDER BY
    TABLE_NAME,
    COLUMN_NAME
  1. Paste the results into a new query window. Scroll to the very bottom and remove the trailing UNION ALL statement. It will look like this:
SELECT   COUNT( DISTINCT [ModifiedByUserId]) AS UniqueValues,   'Inspections.ModifiedByUserId' AS ColumnName  FROM   [dbo].[Inspections]     UNION ALL  
SELECT   COUNT( DISTINCT [Notes]) AS UniqueValues,   'Inspections.Notes' AS ColumnName  FROM   [dbo].[Inspections]     UNION ALL  
SELECT   COUNT( DISTINCT [PublicPassword]) AS UniqueValues,   'Inspections.PublicPassword' AS ColumnName  FROM   [dbo].[Inspections]     UNION ALL  
SELECT   COUNT( DISTINCT [ShopId]) AS UniqueValues,   'Inspections.ShopId' AS ColumnName  FROM   [dbo].[Inspections]     UNION ALL  
SELECT   COUNT( DISTINCT [Status]) AS UniqueValues,   'Inspections.Status' AS ColumnName  FROM   [dbo].[Inspections]     UNION ALL  
SELECT   COUNT( DISTINCT [SupervisorUserId]) AS UniqueValues,   'Inspections.SupervisorUserId' AS ColumnName  FROM   [dbo].[Inspections]     UNION ALL  
  1. Run the query. It took about 6 minutes to run on a 300-column database. It will be faster or slow depending on how many indexes are being used.

Solution 3

Updated....Okay I had way too much fun with this

THe Proc accepts two parameters, the table to search & the Criteria to apply. you can pass essentially and where clause to the second parameter. I wrote the proc to interpret double quotes back to single quotes....again this was built off of the original developers concepts.

GO
/****** Object:  StoredProcedure [dbo].[SearchAllTables]    Script Date: 05/04/2011 14:29:48 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
alter PROC [dbo].[SearchAllTables2] ( @SEARCH_TABLE NVARCHAR(255), @CONDITION AS NVARCHAR(MAX) ) AS
BEGIN

-- Copyright © 2002 Narayana Vyas Kondreddi. All rights reserved.
-- Purpose: To search all columns of all tables for a given search string
-- Written by: Narayana Vyas Kondreddi -- Site: http://vyaskn.tripod.com
-- Tested on: SQL Server 7.0 and SQL Server 2000
-- Date modified: 28th July 2002 22:50 GMT
CREATE TABLE #Results (ColumnName nvarchar(370), ColumnValue nvarchar(3630))
SET NOCOUNT ON
DECLARE @TableName nvarchar(256), @ColumnName nvarchar(128), @SearchStr2 nvarchar(110), @COND_STR NVARCHAR(MAX)
SET @TableName = '' 
--SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%','''')
WHILE @TableName IS NOT NULL
BEGIN
SET @ColumnName = ''
SET @CONDITION = REPLACE(@CONDITION,'"','''')
SET @TableName =
(
SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME))
FROM INFORMATION_SCHEMA.TABLES
WHERE
TABLE_TYPE = 'BASE TABLE'
AND
QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) > @TableName
AND OBJECTPROPERTY(
OBJECT_ID(
QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
), 'IsMSShipped'
) = 0 AND TABLE_NAME = @SEARCH_TABLE
) WHILE (@TableName IS NOT NULL) AND (@ColumnName IS NOT NULL) 
BEGIN SET @ColumnName = ( 
SELECT MIN(QUOTENAME(COLUMN_NAME)) 
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2) AND 
TABLE_NAME = PARSENAME(@TableName, 1) AND QUOTENAME(COLUMN_NAME) > @ColumnName ) 
IF @ColumnName IS NOT NULL 
BEGIN 
SET @COND_STR = REPLACE(@CONDITION,'''','"')
INSERT INTO #Results 
EXEC ( 'SELECT  DISTINCT ''' + @TableName + '.' + @ColumnName + ''',''' + @COND_STR + ''' AS CONDITION FROM ' + @TableName + ' (NOLOCK) ' + ' WHERE ' + @ColumnName + ' ' + @CONDITION)
PRINT ( 'SELECT  DISTINCT ''' + @TableName + '.' + @ColumnName + ''',''' + @COND_STR + ''' AS CONDITION FROM ' + @TableName + ' (NOLOCK) ' + ' WHERE ' + @ColumnName + ' ' + @CONDITION)
END 
END 
END 
    SELECT ColumnName, ColumnValue 
    FROM #Results 
END
GO
-- to execute

exec [SearchAllTables2] 'TABLENAME','LIKE "%DOUG%"' -- double quotes are automatically escaped to single quotes...

Original code modified from copyright below....only using portions.

GO
/****** Object:  StoredProcedure [dbo].[SearchAllTables]    Script Date: 05/04/2011 14:29:48 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
alter PROC [dbo].[SearchAllTables2] ( @TABLE_NAME NVARCHAR(255) ) AS
BEGIN

-- Copyright © 2002 Narayana Vyas Kondreddi. All rights reserved.
-- Purpose: To search all columns of all tables for a given search string
-- Written by: Narayana Vyas Kondreddi -- Site: http://vyaskn.tripod.com
-- Tested on: SQL Server 7.0 and SQL Server 2000
-- Date modified: 28th July 2002 22:50 GMT
CREATE TABLE #Results (ColumnName nvarchar(370), ColumnValue nvarchar(3630))
SET NOCOUNT ON
DECLARE @TableName nvarchar(256), @ColumnName nvarchar(128), @SearchStr2 nvarchar(110)
SET @TableName = '' 
--SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%','''')
WHILE @TableName IS NOT NULL
BEGIN
SET @ColumnName = ''
SET @TableName =
(
SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME))
FROM INFORMATION_SCHEMA.TABLES
WHERE
TABLE_TYPE = 'BASE TABLE'
AND
QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) > @TableName
AND OBJECTPROPERTY(
OBJECT_ID(
QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
), 'IsMSShipped'
) = 0 AND TABLE_NAME = @TABLE_NAME
) WHILE (@TableName IS NOT NULL) AND (@ColumnName IS NOT NULL) 
BEGIN SET @ColumnName = ( 
SELECT MIN(QUOTENAME(COLUMN_NAME)) 
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2) AND 
TABLE_NAME = PARSENAME(@TableName, 1) AND QUOTENAME(COLUMN_NAME) > @ColumnName ) 
IF @ColumnName IS NOT NULL 
BEGIN 
INSERT INTO #Results 
EXEC ( 'SELECT  DISTINCT ''' + @TableName + '.' + @ColumnName + ''', ''IS NULL'' FROM ' + @TableName + ' (NOLOCK) ' + ' WHERE ' + @ColumnName + ' IS NULL ')--LIKE ' + @SearchStr2 ) 
--PRINT ( 'SELECT  DISTINCT ''' + @TableName + '.' + @ColumnName + ''', ''IS NOT NULL'' FROM ' + @TableName + ' (NOLOCK) ' + ' WHERE  ' + @ColumnName + ' IS NOT NULL ')--LIKE ' + @SearchStr2 ) 
END 
END 
END 
    SELECT ColumnName, ColumnValue 
    FROM #Results 
END
GO
-- to execute

exec [SearchAllTables2] 'Master'

Solution 4

This is way too useful not to be put into a handy little system proc - especially if you've just inherited a legacy database, and are wondering what columns you can drop or ignore.

/*
Show the count of not-null values in a table
*/
create proc sp_aaShowAllNullColumns @tableName varchar(255) as
begin
set nocount on

declare @sql nvarchar(4000)
declare @cols nvarchar(4000)
declare @tcols table( colbit nvarchar(255) )

insert @tcols
select 'count(' + name + ') as ' + name + ', ' as colbit
from    sys.columns
where   object_id = object_id(@tableName)
and is_nullable = 1 

select @cols = coalesce( @cols, ', ', '' ) + colbit from @tcols
select @cols = substring( @cols, 1, (len(@cols) - 1) )
select @cols = isnull( @cols, '' )

select @sql = 'select count(*) as Rows' + @cols + ' from ' + @tableName
exec sp_executeSql @sql

end
go

exec sys.sp_MS_marksystemobject 'sp_aaShowAllNullColumns'
go

use Bookshop
go
exec sp_aaShowAllNullColumns 'Books'
go

Solution 5

Here's a bash script that, for all non-empty tables in an SQLite database (or for all the specified tables in such a database), identifies the all-NULL columns. The same technique can be used in the programming language of your choice assuming it can talk to an SQLite database.

#!/bin/bash

function help {
    cat <<EOF
Syntax: $0 databasefile [table ...]

If no tables are specified, then for each non-empty user table in the
specified SQLite database row, this script will emit the column names
of those columns in which all the values are NULL.  If any tables are
specified, only the specified tables are scanned.

The script is written to make it easy to modify the criteria and the output.

Thanks to SQL, two passes are required per table, and if no tables are
specified, an additional invocation of sqlite3 is required.

Column names are written in the form: tablename.columnname

Requirements:
  sqlite3 on the \$PATH

Options: 
 -v | --verbose :: emit additional information

EOF
}

while [ "$1" ]
do case "$1" in
      -h | --help | "" ) help
           exit
           ;;
      -v | --verbose ) VERBOSE=1
           shift
           ;;
      * ) break
      ;;
  esac
done

function verbose { if [ "$VERBOSE" ] ; then echo "$@" >&2 ; fi ; }

db="$1"
shift

if [ ! -s "$db" ] ; then echo "$0 : $db not found" ; exit ; fi

# To prevent loading ~/.sqliterc specify -init ""

# Global: db
function nullcolumns {
  local table="$1"
  local count column field nulls
  ( read count
    if [ -n "$count" ] ; then
        verbose "Row count for $table: $count"
    if [ "$count" -gt 0 ] ; then
          while read column ; do
          echo "SELECT '$column', * FROM
                  (SELECT COUNT(*) FROM $table WHERE '$column' IS NULL);"
          done |
          sqlite3 -readonly "$db" | while IFS="|" read field nulls ; do
            verbose $table.$field ... $nulls
            if [ "$nulls" -eq $count ] ; then echo "$table.$field" ; fi
          done
        else cat > /dev/null
        fi
    else cat > /dev/null
    fi ) < <(sqlite3 -readonly "$db" "select count(*) from '$table'; 
                                      select name from pragma_table_info( '$table' )")
}

if [ $# = 0 ] ; then
    sqlite3 -readonly "$db" .tables | while read table ; do
    nullcolumns "$table"
    done
else
    for table ; do
    nullcolumns "$table"
    done
fi
Share:
26,015
jrara
Author by

jrara

Updated on November 07, 2021

Comments

  • jrara
    jrara over 2 years

    I have several tables in a database. I would like to find which columns (in which tables) don't have any values (all NULL in a column). I the example below, the result should be

    TestTable1 --> Var2
    TestTable2 --> Variable1
    

    I don't have any idea how to create this kind of query. Your help is most appreciated!

    --create first table
    create table dbo.TestTable1 (
    sur_id int identity(1,1) not null primary key,
    var1 int null,
    var2 int null
    )
    go
    
    --insert some values
    insert into dbo.TestTable1 (var1) 
        select 1 union all select 2 union all select 3
    
    --create second table
    create table dbo.TestTable2 (
    sur_id int identity(1,1) not null primary key,
    variable1 int null,
    variable2 int null
    )
    
    --and insert some values
    insert into dbo.TestTable2 (variable2) 
        select 1 union all select 2 union all select 3
    
  • Martin Smith
    Martin Smith about 13 years
    Why not COUNT(col)? That only counts NOT NULL values.
  • Andomar
    Andomar about 13 years
    @Martin: Good point, I guess I'm more comfortable with an explicit case. Edited in answer.
  • cmartin
    cmartin almost 8 years
    Cool - I did have one error happen if the table had 'text' as the column datatype and got around that by converting it. Case When system_type_id = 35 THEN 'count(CAST(' + name + ' as VARCHAR(MAX))) as ' + name + ', ' ELSE 'count(' + name + ') as ' + name + ', ' END
  • adolf garlic
    adolf garlic over 7 years
    if you hit the error "The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information." Simply create a table with two columns and change the dynamic sql to include INSERT INTO TEMPYTABLE (COL1,COL2) at the beginning and remove the UNION ALL at the end
  • StuartN
    StuartN over 7 years
    If you only want to know about all-null columns, you can remove "distinct" and it will run very much faster.
  • Philip Rego
    Philip Rego almost 7 years
    Good job. This is the only answer that works, and you don't need to install anything. 2 step process but saved me a load of time. It ran overnight on a like 5 million rows 1k columns.
  • Admin
    Admin over 2 years
    As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
  • Gert Arnold
    Gert Arnold over 2 years
    Please don't post code-only answers. Future readers will be grateful to see explained why it answers the question instead of having to infer it from the code. Also, can yo explain how this complements the other answers?