How do I calculate the equivalent of SQL Server's HASHBYTES('SHA1', [ColumnName]) in C#?


Solution 1

You're likely getting bitten by character encoding differences:

http://weblogs.sqlteam.com/mladenp/archive/2009/04/28/Comparing-SQL-Server-HASHBYTES-function-and-.Net-hashing.aspx

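You could try getting the bytes via Encoding.ASCII.GetBytes(url) or Encoding.Unicode.GetBytes(url) and see which one matches what your database hashes. HASHBYTES hashes the bytes exactly as SQL Server stores them: an nvarchar value as UTF-16LE (which is what Encoding.Unicode produces), a varchar value in the code page of the column's collation.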
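A minimal sketch of that trial-and-error check, assuming you can copy the stored hash out of the database for comparison: hash the same string under a few candidate encodings and print each result in the 0x... form SQL Server Management Studio displays, then see which line matches. The class `EncodingProbe` and its methods are hypothetical names, not part of any library.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: hashes one string under several candidate encodings
// so the output can be compared by eye with the value stored in the database.
public static class EncodingProbe
{
    // SHA-1 of the string's bytes in the given encoding.
    public static byte[] Sha1Of(string text, Encoding encoding)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            return sha1.ComputeHash(encoding.GetBytes(text));
        }
    }

    // Formats bytes the way SQL Server displays varbinary, e.g. 0xAE66...
    public static string ToSqlHex(byte[] bytes)
    {
        return "0x" + BitConverter.ToString(bytes).Replace("-", "");
    }

    // Prints one line per candidate encoding.
    public static void PrintCandidates(string text)
    {
        var candidates = new (string Name, Encoding Enc)[]
        {
            ("ASCII", Encoding.ASCII),
            ("UTF-8", Encoding.UTF8),
            ("UTF-16LE (Encoding.Unicode)", Encoding.Unicode),
        };

        foreach (var (name, enc) in candidates)
        {
            Console.WriteLine($"{name,-28} {ToSqlHex(Sha1Of(text, enc))}");
        }
    }
}
```

For example, `EncodingProbe.PrintCandidates("http://www.whatismyip.org/");` prints three hashes; whichever one equals the database value tells you which encoding the column uses.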

Solution 2

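Below are two methods: one hashes a string, the other hashes bytes. HashBytes returns the resulting hash as Base64, but you can return the raw bytes instead if you prefer. Note that HashString encodes the text as UTF-8, which will not match HASHBYTES on an nvarchar column; use Encoding.Unicode for that.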

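using System;
using System.Security.Cryptography;
using System.Text;

public static class Hashing
{
    public static string HashString(string cleartext)
    {
        // UTF-8 bytes; use Encoding.Unicode instead to match an nvarchar column
        byte[] clearBytes = Encoding.UTF8.GetBytes(cleartext);
        return HashBytes(clearBytes);
    }

    public static string HashBytes(byte[] clearBytes)
    {
        // 'using' disposes the hasher, which replaces the explicit Clear() call
        using (SHA1 hasher = SHA1.Create())
        {
            byte[] hashBytes = hasher.ComputeHash(clearBytes);
            // Base64 text; SQL Server displays the same bytes as 0x-prefixed hex
            return Convert.ToBase64String(hashBytes);
        }
    }
}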
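Because HashBytes returns Base64 while SQL Server shows a varbinary value as hex, the two strings never look alike even when the underlying bytes are identical. A small sketch, assuming you want to compare against a value copied from the database: decode the Base64 back to bytes and re-encode it as hex. `HashFormat.Base64ToSqlHex` is a hypothetical name.

```csharp
using System;

public static class HashFormat
{
    // Converts a Base64 hash string back into SQL Server's
    // 0x-prefixed uppercase hex notation.
    public static string Base64ToSqlHex(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        return "0x" + BitConverter.ToString(bytes).Replace("-", "");
    }
}
```

The conversion is lossless in both directions, so comparing in either notation is fine as long as both sides use the same one.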

Solution 3

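The code below is equivalent to SQL Server's HASHBYTES('SHA1', ...) when the input is an nvarchar value, because Encoding.Unicode produces the same UTF-16LE bytes SQL Server stores for nvarchar: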

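using System.Security.Cryptography;
using System.Text;

// input: the string to hash (the nvarchar value)
// SHA1.Create() replaces SHA1Managed, which is marked obsolete in .NET 6+
using (SHA1 sha1 = SHA1.Create()) {
    // Encoding.Unicode = UTF-16LE, the bytes SQL Server stores for nvarchar
    var hash = sha1.ComputeHash(Encoding.Unicode.GetBytes(input));
    var sb = new StringBuilder(hash.Length * 2);

    foreach (byte b in hash) {
        // can be "x2" if you want lowercase
        sb.Append(b.ToString("X2"));
    }

    // same hex digits SQL Server displays, without the 0x prefix
    string output = sb.ToString();
}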
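To check the snippet above against a value copied from the database (such as the 0xAE66... value in the question), one option is to parse the SQL literal into bytes and compare byte arrays rather than strings, which sidesteps case and 0x-prefix differences. A sketch, assuming the column is nvarchar; `SqlHashCheck` and its methods are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class SqlHashCheck
{
    // Same bytes HASHBYTES('SHA1', @s) produces when @s is nvarchar (UTF-16LE).
    public static byte[] Sha1Nvarchar(string input)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            return sha1.ComputeHash(Encoding.Unicode.GetBytes(input));
        }
    }

    // Parses a literal such as "0xAE66CA69..." (prefix optional, any case) into bytes.
    public static byte[] ParseSqlHex(string literal)
    {
        string hex = literal.StartsWith("0x", StringComparison.OrdinalIgnoreCase)
            ? literal.Substring(2)
            : literal;
        if (hex.Length % 2 != 0)
            throw new FormatException("Hex literal must have an even number of digits.");

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }

    // True when the C# hash equals the value copied from SQL Server.
    public static bool MatchesDatabase(string input, string sqlHexLiteral)
    {
        return Sha1Nvarchar(input).SequenceEqual(ParseSqlHex(sqlHexLiteral));
    }
}
```

For example, `SqlHashCheck.MatchesDatabase("http://www.whatismyip.org/", "0xAE66CA69A157186A511ED462153D7CA65F0C1BF7")`; if it returns false, the column is probably varchar and needs a code-page encoding instead of Encoding.Unicode.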
Author: Gilad Gat

Updated on July 11, 2022

Comments

  • Gilad Gat (almost 2 years ago)

    In my database I have a computed column that contains a SHA1 hash of a column called URLString which holds URLs (e.g. "http://xxxx.com/index.html").

    I often need to query the table to find a specific URL based on the URLString column. The table contains hundreds of thousands of rows, and these queries take several seconds (using SQL Azure). Since URLs can be quite long (above 450 bytes), I cannot create an index on this column.

    To speed things up I want to calculate the equivalent of SQL Server hashbytes('SHA1',[URLString]) from C# and query based on this value.

    I tried the code below, but the value I get is different from the one calculated by the database.

    var urlString = Encoding.ASCII.GetBytes(url.URLString); //UTF8 also fails
    var sha1 = new SHA1CryptoServiceProvider();
    byte[] hash = sha1.ComputeHash(urlString);
    

    Am I missing something trivial here?
    I'm open to other ideas that can solve the same problem (as long as they are supported by SQL Azure).

    Example: in the database the automatically calculated SHA1 value of URL http://www.whatismyip.org/ is 0xAE66CA69A157186A511ED462153D7CA65F0C1BF7.

  • RobIII (over 11 years ago)
    I would avoid "cleartext" and "clearbytes" terminology as this implies there's an opposite involved (e.g. "encrypted bytes" / "encrypted string"). There's no encryption going on here; only plain ol' hashing. And we all know encryption != hashing
  • Sten Petrov (over 11 years ago)
    This was extracted from production code, where 'clearbytes' makes sense. It's also not a contradiction: the output is not named 'encryptedText' or 'encryptedBytes', and the input names ('clear text') describe the contents appropriately.
  • Gilad Gat (over 11 years ago)
    This is essentially the same solution I posted, except with SHA1.Create() instead of my new SHA1CryptoServiceProvider(). I tried it and got the same result as before (still different from the database).
  • JerKimball (over 11 years ago)
    No problem - yeah, hashing algorithms are VERY sensitive to encoding.
  • xanatos (about 9 years ago)
    In general you can run SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'Yourtable' to check the collation used for each column of a table. If the collation name contains _CP1_ (like SQL_Latin1_General_CP1_CS_AS), the code page is Windows-1252; if it contains _CP###_, then ### is the code page number (SQL_Latin1_General_CP437_CS_AS uses 437); otherwise you have to look it up :-)
  • mr R (over 5 years ago)
    varchar = the collation's code page (UTF-8 only with a _UTF8 collation; this is where accented characters differ), nvarchar = Unicode (UTF-16LE)
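Putting the collation comments together: for a varchar column, HASHBYTES sees the bytes in the collation's code page, so the C# side has to encode with that same code page. A sketch, assuming you have already found the code page (from the _CP###_ part of the collation name, or with SQL Server's COLLATIONPROPERTY(collation_name, 'CodePage')), and assuming .NET Core 3.0 or later, where CodePagesEncodingProvider ships with the framework (older targets need the System.Text.Encoding.CodePages package). `ColumnHasher` is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ColumnHasher
{
    // Assumption: .NET Core 3.0+; registration makes legacy code pages
    // such as 1252 or 437 available to Encoding.GetEncoding.
    static ColumnHasher()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Picks the bytes SQL Server hashes for the column:
    // nvarchar -> UTF-16LE; varchar -> the collation's code page.
    public static Encoding EncodingFor(bool isNvarchar, int codePage)
    {
        return isNvarchar ? Encoding.Unicode : Encoding.GetEncoding(codePage);
    }

    // SHA-1 of the value encoded the way the column stores it.
    public static byte[] Sha1For(string value, bool isNvarchar, int codePage)
    {
        Encoding enc = EncodingFor(isNvarchar, codePage);
        using (SHA1 sha1 = SHA1.Create())
        {
            return sha1.ComputeHash(enc.GetBytes(value));
        }
    }
}
```

For plain-ASCII URLs a Windows-1252 varchar column and Encoding.ASCII produce the same bytes; the code page only starts to matter once accented or other non-ASCII characters appear.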