What is the fastest hash function for pointers?

20,257

Solution 1

After letting this question lay for a while, I'll post my best hash function for pointers so far:

template<typename Tval>
struct MyTemplatePointerHash1 {
    size_t operator()(const Tval* val) const {
        static const size_t shift = (size_t)log2(1 + sizeof(Tval));
        return (size_t)(val) >> shift;
    }
};

It's high performing for various block sizes.
If someone has a better function, I'll change the accepted answer.

Solution 2

The correct answer from a theoretical point of view is: "Use std::hash which is likely specialized to be as good as it gets, and if that is not applicable, use a good hash function rather than a fast one. The speed of the hash function does not matter so much as its quality".

The practical answer is: "Use std::hash, which is piss-poor, but nevertheless performs surprisingly well."

TL;DR
After having become intrigued, I ran about 30 hours of benchmarks over the weekend. Among other things, I tried to get an average case vs. worst case and tried to coerce std::unordered_map into worst-case behavior by giving it deliberately bad hints on bucket counts in respect of the set size inserted.

I compared poor hashes (std::hash<T*>) to well-known general purpose hashes of overall good quality (djb2, sdbm) as well as variations of these that account for the very short input length, and to hashes which are explicitly thought to be used in hashtables (murmur2 and murmur3), and piss-poor hashes that are actually worse than not hashing at all since they throw away entropy.
Since the lowest 2-3 bits are always zero on pointers due to alignment, I deemed it worthwhile to test a simple right-shift as "hash", so only the non-zero information would be used, in case the hashtable e.g. used only the lowermost N bits. Turned out that for reasonable shifts (I also tried unreasonable shifts!) this actually performs quite well.

Findings

Some of my findings were well-known and unsurprising, others are very surprising:

  • It is hard to predict what is a "good" hash. Writing good hash functions is hard. Not surprising, well-known fact, and once again proven.
  • No single hash significantly outperforms all others in every scenario. No single hash even significantly outperforms all others 80% of the time. The first result was expected, the second is nevertheless surprising.
  • It is really hard to press std::unordered_map into behaving badly. Even when given deliberately bad hints to bucket counts which will force several re-hashes, the overall performance is not much worse. Only the very worst hash functions that throw away most of the entropy in an almost ridiculous way are able to significantly impact performance by more than 10-20% (such as right_shift_12, which practically results in only 12 distinct hash values for 50,000 inputs! It is not surprising that the hash map runs around 100 times slower in this case -- we're basically doing random access lookup on a linked list.).
  • Some "funny" results are surely due to implementation details. My implementation (GCC) uses a slighly-larger-than-2^N prime bucket count, and inserts values with indentical hashes head-first into linked lists.
  • The specialization of std::hash<T*> is outright pathetic for GCC (a simple reinterpret_cast). Funnily, a functor which does the identical thing consistently performs faster at insertions and slower at random access. The difference is small (a dozen milliseconds on a 8-10 second test run), but it is not noise, it's consistenly present -- probably related to instruction reordering or pipelining. It's stunning how the exact same code (which is also a no-op) consistenly performs differently in two different scenarios.
  • Pathetic hashes do not perform significantly worse than "good" hashes or hashes explicitly designed for hash tables. Indeed, half of the time, they are the best performers, or among the top 3.
  • The "best" hash functions rarely if ever result in the best overall performance.
  • The hashes posted as answers in this SO question are generally OK. They are good average, but not superior to std::hash. Usually they'll land in the top 3-4.
  • Poor hashes are somewhat vulnerable to the order of insertion (performing worse on random insertion and random lookup following random insertion) whereas "good" hashes are more resilient to the impact of the order of insertion (little or no difference), but overall performance is still slightly slower.

Test Setup

Test were done not just any 4-byte or 8-byte (or whatever) aligned values, but for actual addresses obtained from allocating the complete set of elements on the heap, and storing the addresses as provided by the allocator in a std::vector (the objects were then deleted, they're not needed).
Addresses were inserted into a newly allocated std::unordered_map for every test in the order stored in the vector, once in the original order ("sequential") and once after applying a std::random_shuffle on the vector.

Tests were performed for sets of 50,000 and 1,000,000 objects of size 4, 16, 64, 256, and 1024 (results for 64 omitted here for brevity, they're as you'd expect somewhere in the middle between 16 and 256 -- StackOverflow only allows 30k characters being posted).
Test suite was performed 3 times, results varying by 3 or 4 milliseconds here and there, but overall identical. The results posted here are the last run.

The order of insertions in the "random" test as well as the access pattern (in every test) is pseudorandom, but exactly identical for every hash function in a test run.

The timings under hash benchmarks are for summing up 4,000,000,000 hash values in an integer variable.

The column insert is the time in milliseconds for 50 iterations of creating a std::unordered_map, inserting 50,000 and 1,000,000 elements respectively, and destroying the map.

The column access is the time in milliseconds to do 100,000,000 lookups of a pseudorandom element in the 'vector' followed by looking up that address in the unordered_map.
This timing includes on the average one cache misse for accessing a random element in the vector, at least for the large dataset (small dataset fits completely into L2).

All timings on a 2.66GHz Intel Core2, Windows 7, gcc 4.8.1/MinGW-w64_32. Timer granularity @ 1ms.

Source Code

Source code is available on Ideone, again because of Stackoverflow's 30k character limit.

Note: Running the complete test suite takes well over 2 hours on a desktop PC, so be prepared to take a walk if you want to reproduce the results.

Test Results

Benchmarking hash funcs...
std::hash                 2576      
reinterpret_cast          2561      
djb2                      13970     
djb2_mod                  13969     
sdbm                      10246     
yet_another_lc            13966     
murmur2                   11373     
murmur3                   15129     
simple_xorshift           7829      
double_xorshift           13567     
right_shift_2             5806      
right_shift_3             5866      
right_shift_4             5705      
right_shift_5             5691      
right_shift_8             5795      
right_shift_12            5728      
MyTemplatePointerHash1    5652      
BasileStarynkevitch       4315      

--------------------------------
sizeof(T)       = 4
sizeof(T*)      = 4
insertion order = sequential

dataset size    =    50000 elements
name                      insert     access    
std::hash                 421        6988       
reinterpret_cast          408        7083       
djb2                      451        8875       
djb2_mod                  450        8815       
sdbm                      455        8673       
yet_another_lc            443        8292       
murmur2                   478        9006       
murmur3                   490        9213       
simple_xorshift           460        8591       
double_xorshift           477        8839       
right_shift_2             416        7144       
right_shift_3             422        7145       
right_shift_4             414        6811       
right_shift_5             425        8006       
right_shift_8             540        11787      
right_shift_12            1501       49604      
MyTemplatePointerHash1    410        7138       
BasileStarynkevitch       445        8014       

--------------------------------
sizeof(T)       = 4
sizeof(T*)      = 4
insertion order = random

dataset size    =    50000 elements
name                      insert     access    
std::hash                 443        7570       
reinterpret_cast          436        7658       
djb2                      473        8791       
djb2_mod                  472        8766       
sdbm                      472        8817       
yet_another_lc            458        8419       
murmur2                   479        9005       
murmur3                   491        9205       
simple_xorshift           464        8591       
double_xorshift           476        8821       
right_shift_2             441        7724       
right_shift_3             440        7716       
right_shift_4             450        8061       
right_shift_5             463        8653       
right_shift_8             649        16320      
right_shift_12            3052       114185     
MyTemplatePointerHash1    438        7718       
BasileStarynkevitch       453        8140       

--------------------------------
sizeof(T)       = 4
sizeof(T*)      = 4
insertion order = sequential

dataset size    =  1000000 elements
name                      insert     access    
std::hash                 8945       32801      
reinterpret_cast          8796       33251      
djb2                      11139      54855      
djb2_mod                  11041      54831      
sdbm                      11459      36849      
yet_another_lc            14258      57350      
murmur2                   16300      39024      
murmur3                   16572      39221      
simple_xorshift           14930      38509      
double_xorshift           16192      38762      
right_shift_2             8843       33325      
right_shift_3             8791       32979      
right_shift_4             8818       32510      
right_shift_5             8775       30436      
right_shift_8             10505      35960      
right_shift_12            30481      91350      
MyTemplatePointerHash1    8800       33287      
BasileStarynkevitch       12885      37829      

--------------------------------
sizeof(T)       = 4
sizeof(T*)      = 4
insertion order = random

dataset size    =  1000000 elements
name                      insert     access    
std::hash                 12183      33424      
reinterpret_cast          12125      34000      
djb2                      22693      51255      
djb2_mod                  22722      51266      
sdbm                      15160      37221      
yet_another_lc            24125      51850      
murmur2                   16273      39020      
murmur3                   16587      39270      
simple_xorshift           16031      38628      
double_xorshift           16233      38757      
right_shift_2             11181      33896      
right_shift_3             10785      33660      
right_shift_4             10615      33204      
right_shift_5             10357      38216      
right_shift_8             15445      100348     
right_shift_12            73773      1044919    
MyTemplatePointerHash1    11091      33883      
BasileStarynkevitch       15701      38092      

--------------------------------
sizeof(T)       = 64
sizeof(T*)      = 4
insertion order = sequential

dataset size    =    50000 elements
name                      insert     access    
std::hash                 415        8243       
reinterpret_cast          422        8321       
djb2                      445        8730       
djb2_mod                  449        8696       
sdbm                      460        9439       
yet_another_lc            455        9003       
murmur2                   475        9109       
murmur3                   482        9313       
simple_xorshift           463        8694       
double_xorshift           465        8900       
right_shift_2             416        8402       
right_shift_3             418        8405       
right_shift_4             423        8366       
right_shift_5             421        8347       
right_shift_8             453        9195       
right_shift_12            666        18008      
MyTemplatePointerHash1    433        8191       
BasileStarynkevitch       466        8443       

--------------------------------
sizeof(T)       = 64
sizeof(T*)      = 4
insertion order = random

dataset size    =    50000 elements
name                      insert     access    
std::hash                 450        8135       
reinterpret_cast          457        8208       
djb2                      470        8736       
djb2_mod                  476        8698       
sdbm                      483        9420       
yet_another_lc            476        8953       
murmur2                   481        9089       
murmur3                   486        9283       
simple_xorshift           466        8663       
double_xorshift           468        8865       
right_shift_2             456        8301       
right_shift_3             456        8302       
right_shift_4             453        8337       
right_shift_5             457        8340       
right_shift_8             505        10379      
right_shift_12            1099       34923      
MyTemplatePointerHash1    464        8226       
BasileStarynkevitch       466        8372       

--------------------------------
sizeof(T)       = 64
sizeof(T*)      = 4
insertion order = sequential

dataset size    =  1000000 elements
name                      insert     access    
std::hash                 9548       35362      
reinterpret_cast          9635       35869      
djb2                      10668      37339      
djb2_mod                  10763      37311      
sdbm                      11126      37145      
yet_another_lc            11597      39944      
murmur2                   16296      39029      
murmur3                   16432      39280      
simple_xorshift           16066      38645      
double_xorshift           16108      38778      
right_shift_2             8966       35953      
right_shift_3             8916       35949      
right_shift_4             8973       35504      
right_shift_5             8941       34997      
right_shift_8             9356       31233      
right_shift_12            13831      45799      
MyTemplatePointerHash1    8839       31798      
BasileStarynkevitch       15349      38223      

--------------------------------
sizeof(T)       = 64
sizeof(T*)      = 4
insertion order = random

dataset size    =  1000000 elements
name                      insert     access    
std::hash                 14756      36237      
reinterpret_cast          14763      36918      
djb2                      15406      38771      
djb2_mod                  15551      38765      
sdbm                      14886      37078      
yet_another_lc            15700      40290      
murmur2                   16309      39024      
murmur3                   16432      39381      
simple_xorshift           16177      38625      
double_xorshift           16073      38750      
right_shift_2             14732      36961      
right_shift_3             14170      36965      
right_shift_4             13687      37295      
right_shift_5             11978      35135      
right_shift_8             11498      46930      
right_shift_12            25845      268052     
MyTemplatePointerHash1    10150      32046      
BasileStarynkevitch       15981      38143      

--------------------------------
sizeof(T)       = 256
sizeof(T*)      = 4
insertion order = sequential

dataset size    =    50000 elements
name                      insert     access    
std::hash                 432        7957       
reinterpret_cast          429        8036       
djb2                      462        8970       
djb2_mod                  453        8884       
sdbm                      460        9110       
yet_another_lc            466        9015       
murmur2                   495        9147       
murmur3                   494        9300       
simple_xorshift           479        8792       
double_xorshift           477        8948       
right_shift_2             430        8120       
right_shift_3             429        8132       
right_shift_4             432        8196       
right_shift_5             437        8324       
right_shift_8             425        8050       
right_shift_12            519        11291      
MyTemplatePointerHash1    425        8069       
BasileStarynkevitch       468        8496       

--------------------------------
sizeof(T)       = 256
sizeof(T*)      = 4
insertion order = random

dataset size    =    50000 elements
name                      insert     access    
std::hash                 462        7956       
reinterpret_cast          456        8046       
djb2                      490        9002       
djb2_mod                  483        8905       
sdbm                      482        9116       
yet_another_lc            492        8982       
murmur2                   492        9120       
murmur3                   492        9276       
simple_xorshift           477        8761       
double_xorshift           477        8903       
right_shift_2             458        8116       
right_shift_3             459        8124       
right_shift_4             462        8281       
right_shift_5             463        8370       
right_shift_8             458        8069       
right_shift_12            662        16244      
MyTemplatePointerHash1    459        8091       
BasileStarynkevitch       472        8476       

--------------------------------
sizeof(T)       = 256
sizeof(T*)      = 4
insertion order = sequential

dataset size    =  1000000 elements
name                      insert     access    
std::hash                 9756       34368      
reinterpret_cast          9718       34897      
djb2                      10935      36894      
djb2_mod                  10820      36788      
sdbm                      11084      37857      
yet_another_lc            11125      37996      
murmur2                   16522      39078      
murmur3                   16461      39314      
simple_xorshift           15982      38722      
double_xorshift           16151      38868      
right_shift_2             9611       34997      
right_shift_3             9571       35006      
right_shift_4             9135       34750      
right_shift_5             8978       32878      
right_shift_8             8688       30276      
right_shift_12            10591      35827      
MyTemplatePointerHash1    8721       30265      
BasileStarynkevitch       15524      38315      

--------------------------------
sizeof(T)       = 256
sizeof(T*)      = 4
insertion order = random

dataset size    =  1000000 elements
name                      insert     access    
std::hash                 14169      36078      
reinterpret_cast          14096      36637      
djb2                      15373      37492      
djb2_mod                  15279      37438      
sdbm                      15531      38247      
yet_another_lc            15924      38779      
murmur2                   16524      39109      
murmur3                   16422      39280      
simple_xorshift           16119      38735      
double_xorshift           16136      38875      
right_shift_2             14319      36692      
right_shift_3             14311      36776      
right_shift_4             13932      35682      
right_shift_5             12736      34530      
right_shift_8             9221       30663      
right_shift_12            15506      98465      
MyTemplatePointerHash1    9268       30697      
BasileStarynkevitch       15952      38349      

--------------------------------
sizeof(T)       = 1024
sizeof(T*)      = 4
insertion order = sequential

dataset size    =    50000 elements
name                      insert     access    
std::hash                 421        7863       
reinterpret_cast          419        7953       
djb2                      457        8983       
djb2_mod                  455        8927       
sdbm                      445        8609       
yet_another_lc            446        8892       
murmur2                   492        9090       
murmur3                   507        9294       
simple_xorshift           467        8687       
double_xorshift           472        8872       
right_shift_2             432        8009       
right_shift_3             432        8014       
right_shift_4             435        7998       
right_shift_5             442        8099       
right_shift_8             432        7914       
right_shift_12            462        8911       
MyTemplatePointerHash1    426        7744       
BasileStarynkevitch       467        8417       

--------------------------------
sizeof(T)       = 1024
sizeof(T*)      = 4
insertion order = random

dataset size    =    50000 elements
name                      insert     access    
std::hash                 452        7948       
reinterpret_cast          456        8018       
djb2                      489        9037       
djb2_mod                  490        8992       
sdbm                      477        8795       
yet_another_lc            491        9179       
murmur2                   502        9078       
murmur3                   507        9273       
simple_xorshift           473        8671       
double_xorshift           480        8873       
right_shift_2             470        8105       
right_shift_3             470        8100       
right_shift_4             476        8333       
right_shift_5             468        8065       
right_shift_8             469        8094       
right_shift_12            524        10216      
MyTemplatePointerHash1    451        7826       
BasileStarynkevitch       472        8419       

--------------------------------
sizeof(T)       = 1024
sizeof(T*)      = 4
insertion order = sequential

dataset size    =  1000000 elements
name                      insert     access    
std::hash                 10910      38432      
reinterpret_cast          10892      38994      
djb2                      10499      38985      
djb2_mod                  10507      38983      
sdbm                      11318      37450      
yet_another_lc            11740      38260      
murmur2                   16960      39544      
murmur3                   16816      39647      
simple_xorshift           16096      39021      
double_xorshift           16419      39183      
right_shift_2             10219      38909      
right_shift_3             10012      39036      
right_shift_4             10642      40284      
right_shift_5             10116      38678      
right_shift_8             9083       32914      
right_shift_12            9376       31586      
MyTemplatePointerHash1    8777       30129      
BasileStarynkevitch       16036      38913      

--------------------------------
sizeof(T)       = 1024
sizeof(T*)      = 4
insertion order = random

dataset size    =  1000000 elements
name                      insert     access    
std::hash                 16304      38695      
reinterpret_cast          16241      39641      
djb2                      16377      39533      
djb2_mod                  16428      39526      
sdbm                      15402      37811      
yet_another_lc            16180      38730      
murmur2                   16679      39355      
murmur3                   16792      39586      
simple_xorshift           16196      38965      
double_xorshift           16371      39127      
right_shift_2             16445      39263      
right_shift_3             16598      39421      
right_shift_4             16378      39839      
right_shift_5             15517      38442      
right_shift_8             11589      33547      
right_shift_12            11992      49041      
MyTemplatePointerHash1    9052       30222      
BasileStarynkevitch       16163      38829      

Solution 3

The result returned by the hash function has type size_t, but it gets converted into a 'bucket index' by the container, identifying the correct bucket to locate the object.

I think this conversion is not specified in the standard : but I'd expect this is usually a Modulo N operation, where N is the number of buckets - and that N is typically a power of two, as doubling the bucket count is a good way of increasing the size when there's too many hits. The Modulo N operation would mean that - for pointers - the naive hash function only uses a fraction of the buckets.

The real problem is that a 'good' hash algorithm for containers has to be based on a knowledge of the bucket size, and the values you're hashing. For example, if the objects you were storing in the table were all of size 1024 bytes, it's possible that the low-order 10 bits of each pointer could be the same.

struct MyOneKStruct x[100];  //bottom 10 bits of &x[n] are always the same

So, a 'best' hash for any application is likely to require a lot of trial and error and measurement, and knowledge of the distribution of the values that you're hashing.

However, rather than simply shifting the pointer down N bits, I would try something like XORing the top 'word' into the bottom one. Much like @BasileStarynkevich's answer.

The proposal about adding hash tables makes interesting reading. My emphasis in the para below: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html

It is impossible to write a fully general hash function that's valid for all types. (You can't just convert an object to raw memory and hash the bytes; among other reasons, that idea fails because of padding.) Because of that, and also because a good hash function is only good in the context of a specific usage pattern, it's essential to allow users to provide their own hash functions.

Solution 4

Obviously the answer is system and processor dependent (in particular, because of the page size and of the word size). I am proposing

  struct MyVoidPointerHash {
      size_t operator()(const void* val) const {
         uintptr_t ad = (uintptr_t) val;
         return (size_t) ((13*ad) ^ (ad >> 15));
      }
  };

The insight is that on many systems the page size is often 4Kbytes (i.e. 212) so the right shift >>15 will put significant address parts in the lower bits. The 13* is mostly for fun (but 13 is prime) and to shuffle more the bits. The exclusive or ^ is mixing bits and is really fast. So the lower bits of the hash is a mixture of many bits (both high and low) of the pointer.

I don't claim having put a lot of "science" in such hash functions. But they happen to often work quite well. YMMV. I would guess that you should avoid deactivating ASLR !

Solution 5

Can't beat your solution on performance racetrack (neither for char, nor for 1024-sized struct), but in sense of correctness there are some improvements:

#include <iostream>
#include <new>
#include <algorithm>
#include <unordered_set>
#include <chrono>

#include <cstdlib>
#include <cstdint>
#include <cstddef>
#include <cmath>

namespace
{

template< std::size_t argument, std::size_t base = 2, bool = (argument < base) >
constexpr std::size_t log = 1 + log< argument / base, base >;

template< std::size_t argument, std::size_t base >
constexpr std::size_t log< argument, base, true > = 0;

}

struct pointer_hash
{

    template< typename type >
    constexpr
    std::size_t
    operator () (type * p) const noexcept
    {
        return static_cast< std::size_t >(reinterpret_cast< std::uintptr_t >(p) >> log< std::max(sizeof(type), alignof(type)) >);
    }

};

template< typename type = std::max_align_t, std::size_t i = 0 >
struct alignas(alignof(type) << i) S
{

};

int
main()
{
    constexpr std::size_t _16M = (1 << 24);
    S<> * p = new S<>[_16M]{};
    auto latch = std::chrono::high_resolution_clock::now();
    {
        std::unordered_set< S<> *, pointer_hash > s;
        for (auto * pp = p; pp < p + _16M; ++pp) {
            s.insert(pp);
        }
    }
    std::cout << std::chrono::duration_cast< std::chrono::milliseconds >(std::chrono::high_resolution_clock::now() - latch).count() << "ms" << std::endl;
    delete [] p;
    return EXIT_SUCCESS;
}
Share:
20,257
egur
Author by

egur

Author of Intel QuickSync Decoder. Skills: Intel processor expert Expert overclocker Veteran C++ developer SW architect Video processing algorithm inventor Employed by Intel Corp. Currently enabling &amp; supporting OEMs that use Intel processors. Tech lead in Overclocking and own Intel Software Guard Extensions (Intel® SGX) OEM enabling.

Updated on April 21, 2020

Comments

  • egur
    egur about 4 years

    Hash table based containers are very fast associative array (e.g. unordered_map, unordered_set).

    Their performance is highly dependent on that hash function used to create an index for each entry. As hash tables grow, elements are rehashed again and again.

    Pointers are simple type, basically a 4/8 byte value that uniquely identify an object. The problem is that using an address as a result of the hash function is not efficient due to several LSB being zero.

    Example:

    struct MyVoidPointerHash {
        size_t operator()(const void* val) const {
            return (size_t)val;
        }
    };
    

    A faster implementation is to lose a few bits:

    struct MyVoidPointerHash2 {
        size_t operator()(const void* val) const {
            return ((size_t)val) >> 3; // 3 on 64 bit, 1 on 32 bit
        }
    };
    

    The latter produced 10-20% performance increase on a large application that uses hash sets and maps with tens of thousands of elements that are frequently built and cleared.

    Can someone offer a better scheme for hashing pointers?

    The function needs to be:

    1. Fast! and must inline well.
    2. Offer a reasonable distribution, rare collisions are allowed.

    Update - benchmark results

    I ran two sets of tests, one for int* and for a class pointer that has a size of 4KB. The results are very interesting.

    I used std::unordered_set for all test with data size being 16MB that was allocated in a single new call. The first algorithm ran twice to make sure sure caches are as hot as possible and the CPU is running at full speed.

    Setup: VS2013 (x64), i7-2600, Windows 8.1 x64.

    • VS2013 default hash function
    • Hash1: return (size_t)(val);
    • Hash2: return '(size_t)(val) >> 3;
    • Hash3(@BasileStarynkevitch): uintptr_t ad = (uintptr_t)val; return (size_t)((13 * ad) ^ (ad >> 15));
    • Hash4(@Roddy): uintptr_t ad = (uintptr_t)val; return (size_t)(ad ^ (ad >> 16));
    • Hash5(@egur):

    Code:

    template<typename Tval>
    struct MyTemplatePointerHash1 {
        size_t operator()(const Tval* val) const {
            static const size_t shift = (size_t)log2(1 + sizeof(Tval));
            return (size_t)(val) >> shift;
        }
    };
    

    Test 1 - int*:

    • VS2013 default took 1292ms
    • Hash1 took 742ms
    • Hash2 took 343ms
    • Hash3 took 1008ms
    • Hash4 took 629ms
    • Hash5 took 350ms

    Test 1 - 4K_class*:

    • VS2013 default took 0.423ms
    • Hash1 took 23.889ms
    • Hash2 took 6.331ms
    • Hash3 took 0.366ms
    • Hash4 took 0.390ms
    • Hash5 took 0.290ms

    Update2:

    Winner so far is the templated hash (Hash5) function. Best level of performance for speed for various block sizes.

    Update 3: Added default hash function for baseline. Turns out it's far from optimal.