Binary hashing - what is it?

16,247

Solution 1

Look at this page which describes CRC32 -good old Wikipedia

This is possibly the simplest hashing algorithm (certainly not the best!), but it should give you a general idea of how a hash works.

All the other hash algorithms do basically the same thing, but with algorithms that are either harder to reverse (sha256 etc.) or which give a more even distribution of results and less likelihood of a collision (perlhash etc).

Which is best depends on what you want the hash for:

  • Proving a file was not tampered with --> sha256/512.
  • Storing a password or other value you want to keep secret --> sha256/512
  • Getting a numeric key for an array or database record from a string --> perlhash or similar.
  • Quickly obfuscating or masking a account numbers --> crc32

Here is an excellent article describing the hash function used by the perl programming language bob burtle's hash

Solution 2

There are many binary hashing algorithms "md5", "sha256", "sha512", "haval160", etc..

Here is a description of the MD5 algorithm. This pseudo-code and a complete c implementation of it can be found at http://en.wikipedia.org/wiki/MD5. At a glance it seems that A, B, C, and D are used to create F and g in this procedure. Before this procedure the input is broken up into chunks of 512-bit blocks. Then, further into sixteen 32-bit words.

The MD5 hash is calculated according to this algorithm. All values are in little-endian.

//Note: All variables are unsigned 32 bit and wrap modulo 2^32 when calculating
var int[64] s, K

//s specifies the per-round shift amounts
s[ 0..15] := { 7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22}
s[16..31] := { 5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20}
s[32..47] := { 4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23}
s[48..63] := { 6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21}

//Use binary integer part of the sines of integers (Radians) as constants:
for i from 0 to 63
    K[i] := floor(abs(sin(i + 1)) × (2 pow 32))
end for
//(Or just use the following table):
K[ 0.. 3] := { 0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee }
K[ 4.. 7] := { 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501 }
K[ 8..11] := { 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be }
K[12..15] := { 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821 }
K[16..19] := { 0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa }
K[20..23] := { 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8 }
K[24..27] := { 0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed }
K[28..31] := { 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a }
K[32..35] := { 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c }
K[36..39] := { 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70 }
K[40..43] := { 0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05 }
K[44..47] := { 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665 }
K[48..51] := { 0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039 }
K[52..55] := { 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1 }
K[56..59] := { 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1 }
K[60..63] := { 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391 }

//Initialize variables:
var int a0 := 0x67452301   //A
var int b0 := 0xefcdab89   //B
var int c0 := 0x98badcfe   //C
var int d0 := 0x10325476   //D

//Pre-processing: adding a single 1 bit
append "1" bit to message    
/* Notice: the input bytes are considered as bits strings,
  where the first bit is the most significant bit of the byte.[41]


//Pre-processing: padding with zeros
append "0" bit until message length in bit ≡ 448 (mod 512)
append length mod (2 pow 64) to message


//Process the message in successive 512-bit chunks:
for each 512-bit chunk of message
    break chunk into sixteen 32-bit words M[j], 0 ≤ j ≤ 15
//Initialize hash value for this chunk:
    var int A := a0
    var int B := b0
    var int C := c0
    var int D := d0
//Main loop:
    for i from 0 to 63
        if 0 ≤ i ≤ 15 then
            F := (B and C) or ((not B) and D)
            g := i
        else if 16 ≤ i ≤ 31
            F := (D and B) or ((not D) and C)
            g := (5×i + 1) mod 16
        else if 32 ≤ i ≤ 47
            F := B xor C xor D
            g := (3×i + 5) mod 16
        else if 48 ≤ i ≤ 63
            F := C xor (B or (not D))
            g := (7×i) mod 16
        dTemp := D
        D := C
        C := B
        B := B + leftrotate((A + F + K[i] + M[g]), s[i])
        A := dTemp
    end for
//Add this chunk's hash to result so far:
    a0 := a0 + A
    b0 := b0 + B
    c0 := c0 + C
    d0 := d0 + D
end for

var char digest[16] := a0 append b0 append c0 append d0 //(Output is in little-endian)

//leftrotate function definition
leftrotate (x, c)
    return (x << c) binary or (x >> (32-c));

Solution 3

You are correct. Wikipedia's picture pretty much describes it: https://en.wikipedia.org/wiki/Merkle_tree

It depends on the implementation how you split your original message. Obviously, if your message is relatively small, it is useless to split it into millions of blocks, likewise, if your message is very big, it's awkward to split it in blocks of a byte each.

Do not forget that you do need to communicate your splitting to all those using it. otherwise the hashes don't match

Share:
16,247
rusty009
Author by

rusty009

Updated on July 20, 2022

Comments

  • rusty009
    rusty009 almost 2 years

    I am trying to understand what Binary hashing is. My understanding, is that you split your message into four parts, D1-D4, you then has each of those parts individually and get H1-H4. You then hash H1+H2 and H3+H4 to create H5 and H6. You then hash H5 and H6 to generate your final hash value, H. Is this correct? If not please tell me where I'm going wrong, thank you!

  • Gustavo Rodrigues
    Gustavo Rodrigues over 9 years
    Note: don't store passwords by those ways, it can be tolerable but isn't the best way.