Numeric hashing function in SQL Server?


SQL Server has several built-in functions for calculating various hashes.

It is not clear what you mean by "the data I am dealing with at the moment has to be numeric." A hash can be calculated from any kind of source data; the size of the result (the number of bits) depends on the chosen hash function.

Technically, you can define your key as binary(n) with whatever number of bytes you are comfortable with; 4 and 8 bytes (int and bigint) are just special cases.
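The "special cases" point can be sketched in Python, standing in for whatever the remote system uses. `digest_to_signed_int` is a hypothetical helper that keeps the first 4 or 8 bytes of a digest and reads them as a signed big-endian integer. This assumes the SQL side reads binary data the same way when converting to int or bigint (big-endian two's complement, so `0xFFFFFFFF` becomes -1). Taking an explicit prefix there, e.g. `SUBSTRING(HASHBYTES(...), 1, 8)`, avoids depending on how CAST truncates a longer binary value.

```python
import hashlib

def digest_to_signed_int(digest: bytes, n_bytes: int) -> int:
    """Hypothetical helper: read the first n_bytes of a digest as a signed,
    big-endian integer (4 bytes fit an int, 8 bytes fit a bigint)."""
    if not 1 <= n_bytes <= len(digest):
        raise ValueError("n_bytes must be between 1 and the digest length")
    return int.from_bytes(digest[:n_bytes], byteorder="big", signed=True)

# Any source data works; only the kept prefix decides the key's size.
digest = hashlib.sha256(b"example").digest()   # 32 bytes
as_int = digest_to_signed_int(digest, 4)       # range of an int
as_bigint = digest_to_signed_int(digest, 8)    # range of a bigint
print(as_int, as_bigint)
```

Keeping more of the digest (binary(16), binary(32)) lowers the collision risk; int and bigint keep only 4 and 8 bytes of it.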


Here is a list of SQL Server hash functions that I know about.

  1. BINARY_CHECKSUM, returns int.

Returns the binary checksum value computed over a row of a table or over a list of expressions.

It may be the simplest option for you, since you can easily specify which columns to include in the calculation:

SELECT BINARY_CHECKSUM(Col1, Col2, Col3) FROM MyTable;

The drawbacks of this function are: it returns int, which means a relatively high chance of collisions; the algorithm it implements is not documented, and it may differ between versions of SQL Server. If your remote system needs to calculate the hash as well, you have to use a well-known standard function; see HASHBYTES below.
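The collision concern can be put into numbers with the birthday bound, p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n distinct inputs and a b-bit hash. This Python sketch assumes an ideal, uniformly distributed hash, which BINARY_CHECKSUM does not promise, so treat the result as a best case. `collision_probability` is a hypothetical helper name.

```python
import math

def collision_probability(n_rows: int, bits: int) -> float:
    """Birthday-bound estimate that at least two of n_rows distinct inputs
    share the same value under an ideal hash with the given bit width."""
    pairs = n_rows * (n_rows - 1) / 2
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** bits)

for bits in (32, 64):
    print(f"{bits}-bit: {collision_probability(100_000, bits):.3g}")
```

For 100,000 rows this gives roughly 0.69 for a 32-bit value (int) and about 2.7e-10 for a 64-bit value (bigint).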

  2. CHECKSUM, very similar to BINARY_CHECKSUM and also returns int. The main difference that I saw in the docs is that CHECKSUM obeys collation rules, such as case sensitivity, while BINARY_CHECKSUM always uses the binary values of the columns.

For example, the strings "McCavity" and "Mccavity" have different BINARY_CHECKSUM values. In contrast, on a case-insensitive server, CHECKSUM returns the same value for both strings. You should avoid comparing CHECKSUM values with BINARY_CHECKSUM values.

  3. HASHBYTES. Implements the given hashing algorithm (MD2 | MD4 | MD5 | SHA | SHA1 | SHA2_256 | SHA2_512) and returns varbinary. Starting with SQL Server 2016, all algorithms other than SHA2_256 and SHA2_512 are deprecated.

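SELECT 
    -- Each column is converted to bytes and the bytes are concatenated.
    -- With default settings, a NULL in any column makes the whole hash NULL.
    HASHBYTES('SHA2_512', 
        CAST(Col1 AS varbinary(8000)) + 
        CAST(Col2 AS varbinary(8000)) + 
        CAST(Col3 AS varbinary(8000))) 
FROM MyTable;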
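If the remote system has to compute the same key, it must hash exactly the same bytes. Below is a Python sketch of that side under stated assumptions: Col1 and Col3 are int and Col2 is nvarchar (the real column types are not given), the algorithm is SHA2_512 as above, and the numeric key is the first 8 bytes read as a signed big-endian bigint, i.e. what `CAST(SUBSTRING(HASHBYTES(...), 1, 8) AS bigint)` would return on the SQL side. `int_bytes`, `nvarchar_bytes` and `row_key` are hypothetical helper names.

```python
import hashlib

def int_bytes(value: int) -> bytes:
    # Assumed int column: CAST(int AS varbinary) gives 4 bytes,
    # big-endian two's complement.
    return value.to_bytes(4, byteorder="big", signed=True)

def nvarchar_bytes(value: str) -> bytes:
    # Assumed nvarchar column: CAST(nvarchar AS varbinary) gives the
    # UTF-16LE code units of the string.
    return value.encode("utf-16-le")

def row_key(col1: int, col2: str, col3: int) -> int:
    """Rebuild the concatenated bytes, hash them with SHA-512 (SHA2_512),
    and keep the first 8 bytes as a signed bigint-sized key."""
    data = int_bytes(col1) + nvarchar_bytes(col2) + int_bytes(col3)
    digest = hashlib.sha512(data).digest()  # 64 bytes
    return int.from_bytes(digest[:8], byteorder="big", signed=True)

print(row_key(42, "McCavity", 7))
```

Plain concatenation is only unambiguous here because just one column has variable length; with two or more (e.g. "ab" + "c" versus "a" + "bc"), add a separator or a length prefix on both sides.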
Author: Lock

Updated on June 17, 2022

Comments

  • Lock

    Is there such thing as a hashing function that produces numbers as its output?

    Basically, I need to create a key column in my SQL Server database that is deterministic (the result is repeatable) and is based on 3 columns in the database. This column will be used as the key for that piece of data when it goes into a remote system (and I will use this key to match the data back up when it is created in the foreign system).

    For similar things, I have been using an SHA5 hashing algorithm to create my keys, however the data I am dealing with at the moment has to be numeric.

    Any ideas? The result has to be repeatable, so it has to be based on the input columns.