Generate a hash for a set of rows in sql server

27,760

Solution 1

You can use the CHECKSUM_AGG aggregate. it is made for that purpose.

Solution 2

select HashBytes('md5',convert(varbinary(max),(SELECT * FROM MyChildTable WHERE ParentId = 2 FOR XML AUTO)))

but HashBytes is limited to 8000 bytes only... you can make a function to get de Md5 for every 8000 bytes....

Solution 3

For single row hashes:

select HASHBYTES('md5', Name + Description + AnotherColumn)
FROM MyChildTable WHERE ParentId = 2

for table checksum:

select sum(checksum(Name + Description + AnotherColumn)*1.0)
FROM MyChildTable WHERE ParentId = 2

Solution 4

Another approach:

-- compute a single hash value for all rows of a table
begin

    set nocount on;

    -- init hash variable
    declare @tblhash varchar(40);
    set @tblhash = 'start';

    -- compute a single hash value
    select @tblhash = sys.fn_varbintohexsubstring(0, hashbytes('sha1',(convert(varbinary(max),@tblhash+
    (select sys.fn_varbintohexsubstring(0,hashbytes('sha1',(convert(varbinary(max),
    -- replace 'select *' if you want only specific columns to be included in the hash calculation
    -- [target table] is the name of the table to calc the hash from
    -- [row_id] is the primary key column within the target table
    -- modify those in the next lines to suit your needs:
    (select * from [target_table] obj2 where obj2.[row_id]=obj1.[row_id] for xml raw)
    ))),1,0))
    ))),1,0)
    from [target_table] obj1;

    set nocount off;

    -- return result
    select @tblhash as hashvalue;

end;
Share:
27,760
krisdyson
Author by

krisdyson

Passionate about .NET, .NET Core, Azure and the Web.

Updated on July 21, 2022

Comments

  • krisdyson
    krisdyson almost 2 years

    Is there any way in SQL Server 2012 to generate a hash of a set of rows and columns?

    I want to generate a hash, store it on the parent record. The when an update comes in, I'll compare the incoming hash with the parent record hash and I'll know whether the data has changed.

    So something like this would be nice:

    SELECT GENERATEHASH(CONCATENATE(Name, Description, AnotherColumn))
    FROM MyChildTable WHERE ParentId = 2 -- subset of data belong to parent record 2
    

    "CONCATENATE" would be an aggregate function which would not only concat the columns, but also, the rows inside the resultset. Like MAX, but returning everything as a string concatenation.

    Hopefully this helps you see what I mean anyway!

    The fundamental problem I'm trying to solve is that my client's system perform imports of vast amounts of hierarchical data. If I can avoid processing through the use of hashes, then I would think this will save a lot of time. At the moment, the SP is running 300% slower when having to process duplicate data.

    Many thanks