Fastest hash for non-cryptographic uses?

123,735

Solution 1

CRC32 is pretty fast and there's a function for it: http://www.php.net/manual/en/function.crc32.php

But you should be aware that CRC32 will have more collisions than MD5 or even SHA-1 hashes, simply because of the reduced length (32 bits compared to 128 bits respectively 160 bits). But if you just want to check whether a stored string is corrupted, you'll be fine with CRC32.

Solution 2

fcn     time  generated hash
crc32:  0.03163  798740135
md5:    0.0731   0dbab6d0c841278d33be207f14eeab8b
sha1:   0.07331  417a9e5c9ac7c52e32727cfd25da99eca9339a80
xor:    0.65218  119
xor2:   0.29301  134217728
add:    0.57841  1105

And the code used to generate this is:

 $loops = 100000;
 $str = "ana are mere";

 echo "<pre>";

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $x = crc32($str);
 }
 $tse = microtime(true);
 echo "\ncrc32: \t" . round($tse-$tss, 5) . " \t" . $x;

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $x = md5($str);
 }
 $tse = microtime(true);
 echo "\nmd5: \t".round($tse-$tss, 5) . " \t" . $x;

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $x = sha1($str);
 }
 $tse = microtime(true);
 echo "\nsha1: \t".round($tse-$tss, 5) . " \t" . $x;

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $l = strlen($str);
  $x = 0x77;
  for($j=0;$j<$l;$j++){
   $x = $x xor ord($str[$j]);
  }
 }
 $tse = microtime(true);
 echo "\nxor: \t".round($tse-$tss, 5) . " \t" . $x;

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $l = strlen($str);
  $x = 0x08;
  for($j=0;$j<$l;$j++){
   $x = ($x<<2) xor $str[$j];
  }
 }
 $tse = microtime(true);
 echo "\nxor2: \t".round($tse-$tss, 5) . " \t" . $x;

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $l = strlen($str);
  $x = 0;
  for($j=0;$j<$l;$j++){
   $x = $x + ord($str[$j]);
  }
 }
 $tse = microtime(true);
 echo "\nadd: \t".round($tse-$tss, 5) . " \t" . $x;

Solution 3

There's a speed comparison on the xxHash repository. This is what it shows, on January 12, 2021.

Hash Name Width Bandwidth (GB/s) Small Data Velocity Quality Comment
XXH3 (SSE2) 64 31.5 GB/s 133.1 10
XXH128 (SSE2) 128 29.6 GB/s 118.1 10
RAM sequential read N/A 28.0 GB/s N/A N/A for reference
City64 64 22.0 GB/s 76.6 10
T1ha2 64 22.0 GB/s 99.0 9 Slightly worse [collisions]
City128 128 21.7 GB/s 57.7 10
XXH64 64 19.4 GB/s 71.0 10
SpookyHash 64 19.3 GB/s 53.2 10
Mum 64 18.0 GB/s 67.0 9 Slightly worse [collisions]
XXH32 32 9.7 GB/s 71.9 10
City32 32 9.1 GB/s 66.0 10
Murmur3 32 3.9 GB/s 56.1 10
SipHash 64 3.0 GB/s 43.2 10
FNV64 64 1.2 GB/s 62.7 5 Poor avalanche properties
Blake2 256 1.1 GB/s 5.1 10 Cryptographic
SHA1 160 0.8 GB/s 5.6 10 Cryptographic but broken
MD5 128 0.6 GB/s 7.8 10 Cryptographic but broken

It seems xxHash is by far the fastest one, while many others beat older hashes, like CRC32, MD5 and SHA.

Solution 4

Ranked list where each loop shares the same thing to crypt as all the others.

<?php

set_time_limit(720);

$begin = startTime();
$scores = array();


foreach(hash_algos() as $algo) {
    $scores[$algo] = 0;
}

for($i=0;$i<10000;$i++) {
    $number = rand()*100000000000000;
    $string = randomString(500);

    foreach(hash_algos() as $algo) {
        $start = startTime();

        hash($algo, $number); //Number
        hash($algo, $string); //String

        $end = endTime($start);

        $scores[$algo] += $end;
    }   
}


asort($scores);

$i=1;
foreach($scores as $alg => $time) {
    print $i.' - '.$alg.' '.$time.'<br />';
    $i++;
}

echo "Entire page took ".endTime($begin).' seconds<br />';

echo "<br /><br /><h2>Hashes Compared</h2>";

foreach($scores as $alg => $time) {
    print $i.' - '.$alg.' '.hash($alg,$string).'<br />';
    $i++;
}

function startTime() {
   $mtime = microtime(); 
   $mtime = explode(" ",$mtime); 
   $mtime = $mtime[1] + $mtime[0]; 
   return $mtime;   
}

function endTime($starttime) {
   $mtime = microtime(); 
   $mtime = explode(" ",$mtime); 
   $mtime = $mtime[1] + $mtime[0]; 
   $endtime = $mtime; 
   return $totaltime = ($endtime - $starttime); 
}

function randomString($length) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyz';
    $string = '';    
    for ($p = 0; $p < $length; $p++) {
        $string .= $characters[mt_rand(0, strlen($characters) - 1)];
    }
    return $string;
}

?>

And the output

1 - crc32b 0.111036300659
2 - crc32 0.112048864365
3 - md4 0.120795726776
4 - md5 0.138875722885
5 - sha1 0.146368741989
6 - adler32 0.15501332283
7 - tiger192,3 0.177447080612
8 - tiger160,3 0.179498195648
9 - tiger128,3 0.184012889862
10 - ripemd128 0.184052705765
11 - ripemd256 0.185411214828
12 - salsa20 0.198500156403
13 - salsa10 0.204956293106
14 - haval160,3 0.206098556519
15 - haval256,3 0.206891775131
16 - haval224,3 0.206954240799
17 - ripemd160 0.207638263702
18 - tiger192,4 0.208125829697
19 - tiger160,4 0.208438634872
20 - tiger128,4 0.209359407425
21 - haval128,3 0.210256814957
22 - sha256 0.212738037109
23 - ripemd320 0.215386390686
24 - haval192,3 0.215610980988
25 - sha224 0.218329429626
26 - haval192,4 0.256464719772
27 - haval160,4 0.256565093994
28 - haval128,4 0.257113456726
29 - haval224,4 0.258928537369
30 - haval256,4 0.259262084961
31 - haval192,5 0.288433790207
32 - haval160,5 0.290239810944
33 - haval256,5 0.291721343994
34 - haval224,5 0.294484138489
35 - haval128,5 0.300224781036
36 - sha384 0.352449893951
37 - sha512 0.354603528976
38 - gost 0.392376661301
39 - whirlpool 0.629067659378
40 - snefru256 0.829529047012
41 - snefru 0.833986997604
42 - md2 1.80192279816
Entire page took 22.755341053 seconds


Hashes Compared

1 - crc32b 761331d7
2 - crc32 7e8c6d34
3 - md4 1bc8785de173e77ef28a24bd525beb68
4 - md5 9f9cfa3b5b339773b8d6dd77bbe931dd
5 - sha1 ca2bd798e47eab85655f0ce03fa46b2e6e20a31f
6 - adler32 f5f2aefc
7 - tiger192,3 d11b7615af06779259b29446948389c31d896dee25edfc50
8 - tiger160,3 d11b7615af06779259b29446948389c31d896dee
9 - tiger128,3 d11b7615af06779259b29446948389c3
10 - ripemd128 5f221a4574a072bc71518d150ae907c8
11 - ripemd256 bc89cd79f4e70b73fbb4faaf47a3caf263baa07e72dd435a0f62afe840f5c71c
12 - salsa20 91d9b963e172988a8fc2c5ff1a8d67073b2c5a09573cb03e901615dc1ea5162640f607e0d7134c981eedb761934cd8200fe90642a4608eacb82143e6e7b822c4
13 - salsa10 320b8cb8498d590ca2ec552008f1e55486116257a1e933d10d35c85a967f4a89c52158f755f775cd0b147ec64cde8934bae1e13bea81b8a4a55ac2c08efff4ce
14 - haval160,3 27ad6dd290161b883e614015b574b109233c7c0e
15 - haval256,3 03706dd2be7b1888bf9f3b151145b009859a720e3fe921a575e11be801c54c9a
16 - haval224,3 16706dd2c77b1888c29f3b151745b009879a720e4fe921a576e11be8
17 - ripemd160 f419c7c997a10aaf2d83a5fa03c58350d9f9d2e4
18 - tiger192,4 112f486d3a9000f822c050a204d284d52473f267b1247dbd
19 - tiger160,4 112f486d3a9000f822c050a204d284d52473f267
20 - tiger128,4 112f486d3a9000f822c050a204d284d5
21 - haval128,3 9d9155d430218e4dcdde1c62962ecca3
22 - sha256 6027f87b4dd4c732758aa52049257f9e9db7244f78c132d36d47f9033b5c3b09
23 - ripemd320 9ac00db553b51662826267daced37abfccca6433844f67d8f8cfd243cf78bbbf86839daf0961b61d
24 - haval192,3 7d706dd2d37c1888eaa53b154948b009e09c720effed21a5
25 - sha224 b6395266d8c7e40edde77969359e6a5d725f322e2ea4bd73d3d25768
26 - haval192,4 d87cd76e4c8006d401d7068dce5dec3d02dfa037d196ea14
27 - haval160,4 f2ddd76e156d0cd40eec0b8d09c8f23d0f47a437
28 - haval128,4 f066e6312b91e7ef69f26b2adbeba875
29 - haval224,4 1b7cd76ea97c06d439d6068d7d56ec3d73dba0373895ea14e465bc0e
30 - haval256,4 157cd76e8b7c06d432d6068d7556ec3d66dba0371c95ea14e165bc0ec31b9d37
31 - haval192,5 05f9ea219ae1b98ba33bac6b37ccfe2f248511046c80c2f0
32 - haval160,5 e054ec218637bc8b4bf1b26b2fb40230e0161904
33 - haval256,5 48f6ea210ee1b98be835ac6b7dc4fe2f39841104a37cc2f06ceb2bf58ab4fe78
34 - haval224,5 57f6ea2111e1b98bf735ac6b92c4fe2f43841104ab7cc2f076eb2bf5
35 - haval128,5 ccb8e0ac1fd12640ecd8976ab6402aa8
36 - sha384 bcf0eeaa1479bf6bef7ece0f5d7111c3aeee177aa7990926c633891464534cd8a6c69d905c36e882b3350ef40816ed02
37 - sha512 8def9a1e6e31423ef73c94251d7553f6fe3ed262c44e852bdb43e3e2a2b76254b4da5ef25aefb32aae260bb386cd133045adfa2024b067c2990b60d6f014e039
38 - gost ef6cb990b754b1d6a428f6bb5c113ee22cc9533558d203161441933d86e3b6f8
39 - whirlpool 54eb1d0667b6fdf97c01e005ac1febfacf8704da55c70f10f812b34cd9d45528b60d20f08765ced0ab3086d2bde312259aebf15d105318ae76995c4cf9a1e981
40 - snefru256 20849cbeda5ddec5043c09d36b2de4ba0ea9296b6c9efaa7c7257f30f351aea4
41 - snefru 20849cbeda5ddec5043c09d36b2de4ba0ea9296b6c9efaa7c7257f30f351aea4
42 - md2 d4864c8c95786480d1cf821f690753dc

Solution 5

2019 update: This answer is the most up to date. Libraries to support murmur are largely available for all languages.

The current recommendation is to use the Murmur Hash Family (see specifically the murmur2 or murmur3 variants).

Murmur hashes were designed for fast hashing with minimal collisions (much faster than CRC, MDx and SHAx). It's perfect to look for duplicates and very appropriate for HashTable indexes.

In fact it's used by many of the modern databases (Redis, ElastisSearch, Cassandra) to compute all sort of hashes for various purposes. This specific algorithm was the root source of many performance improvements in the current decade.

It's also used in implementations of Bloom Filters. You should be aware that if you're searching for "fast hashes", you're probably facing a typical problem that is solved by Bloom filters. ;-)

Note: murmur is a general purpose hash, meaning NON cryptographic. It doesn't prevent to find the source "text" that generated a hash. It's NOT appropriate to hash passwords.

Some more details: MurmurHash - what is it?

Share:
123,735

Related videos on Youtube

John
Author by

John

Glad I can post here!

Updated on August 26, 2021

Comments

  • John
    John almost 3 years

    I'm essentially preparing phrases to be put into the database, they may be malformed so I want to store a short hash of them instead (I will be simply comparing if they exist or not, so hash is ideal).

    I assume MD5 is fairly slow on 100,000+ requests so I wanted to know what would be the best method to hash the phrases, maybe rolling out my own hash function or using hash('md4', '...' would be faster in the end?

    I know MySQL has MD5(), so that would complement a bit of speed on the query end, but maybe there's further a faster hashing function in MySQL I don't know about that would work with PHP..

    • NullUserException
      NullUserException almost 14 years
      What's stopping you from benchmarking the hashes?
    • John
      John almost 14 years
      NullUserException: You're right, I'll try them with random length phrases. Just wanted insight on what would be the norm if any to handle this sort of thing.
    • Amber
      Amber almost 14 years
      MD5 isn't really that slow...
    • Admin
      Admin almost 14 years
      +1 to Amber: I've always been under the impression that MD5, although cryptologically unsafe, is among the fastest of the hash algorithms.
    • bluszcz
      bluszcz over 11 years
      Not related directly, but 'hash' functions seems to be not optimized: I made small benchmark dev.bluszcz.net/blog/dont-trust-the-comments-md5-vs-hash-in-‌​php
    • rogerdpack
      rogerdpack over 10 years
    • michael
      michael over 6 years
      This is a very good question to ask, and the comments implying it isn't, or is unimportant, and/or should be obvious and/or intuitive -- are disappointing & frustrating. (And also not at all unexpected.)
    • Shlomi Hassid
      Shlomi Hassid about 6 years
      @Amber you're so right. MD5 is the fastest for a "real life" php use... This performance test randomize string length and content for each iteration. This way we get a better idea about the actual performance. This will also avoid caching. php hashing checksum performance
    • SurpriseDog
      SurpriseDog about 3 years
      @NullUserException We're all here from Google hoping you guys have already done it for us. ;-)
    • Glenn Slayden
      Glenn Slayden over 2 years
      Technically, the literally correct answer to this question is not mentioned in any of the answers. For the record, the fastest hash for any use is simply x → 0. Yes, returning the same value (i.e., zero) for every input will result in "a lot" of hash collisions, but is nevertheless a perfectly valid—most certainly the fastest—hash function, which thus, by definition, must never affect overall correctness. The problem, of course, is that it defeats the intent of hashing by pushing all the actual work onto the collision handling mechanism which usually isn't designed for such loads.
    • Kröw
      Kröw over 2 years
      @GlennSlayden That's not a hash function.
  • John
    John almost 14 years
    Wow, only required datatype is an unsigned integer, this will be SIGNIFICANLY faster than other hashing.
  • John
    John almost 14 years
    Ah, Thank you for this insight actually, just fortifies my use of CRC32 being fastest.
  • Thomas Pornin
    Thomas Pornin almost 14 years
    @John: or not. CRC32 turns out to be slower than MD4, and not much faster than MD5, on ARM processors. Besides, CRC32 uses an unsigned 32-bit integer type, which is exactly all that MD5 needs...
  • Camilo Martin
    Camilo Martin over 13 years
    There's a point to consider. MD5 takes 128 bits instead of 32. This means that database storage takes 4 times more space and hence 4 times slower to look up for comparing hashes (I think). What I'm concerned with (for my uses) is how fast it will be to query the database later when it's full of hashes.
  • Thomas Pornin
    Thomas Pornin over 13 years
    If you do not use a wide enough output then you will get random collisions, which will be bad since the goal is to query a database to know whether a given "phrase" is already known; collisions here turn into false positives. With 32 bits, you will begin to see collisions as soon as you have 60000 or so phrases. This is true for all hash functions, cryptographic or not. That being said, you can always take the output of a hash function and truncate it to any length you see fit, within the limitations explained above.
  • Peter Ajtai
    Peter Ajtai almost 13 years
    @John - You can retrieve the hashing algorithms using: hash_algos(). The following hash benchmarking code was in the PHP comments ==> codepad.viper-7.com/5Wdhw6
  • MM.
    MM. about 11 years
    There's a minimal off-by-one error at the end. strlen($characters) should be strlen($characters) - 1 :)
  • rogerdpack
    rogerdpack over 10 years
    if you have the benefit/luxury of a newer Intel cpu, there is a crc32c assembly command that is...probably really fast (though isn't the traditional crc32 value). See also xxhash code.google.com/p/xxhash
  • Mohd Abdul Mujib
    Mohd Abdul Mujib over 9 years
    @ThomasPornin If we go by the truncating way, wouldn't it again face the collision problem, I mean the only reason the md5 is supposed to not get easy collision is the extra no of characters its having when compared to CRC32, right?
  • nxasdf
    nxasdf over 9 years
    If MD5 is faster than a generic CRC32 function then something is very wrong.
  • Ahmed
    Ahmed about 9 years
    MD5 is now considered insecure. It's way more insecure than SHA1. Read MD5 wiki page.
  • My1
    My1 over 8 years
    question is "how quick is this thing"?
  • Maxim Masiutin
    Maxim Masiutin about 7 years
    Thank you for your code. I have improved it a little bit. I don't think that we should compare functions like md5() that process the whole string and loops that do byte by byte like you made with xor. In PHP, these loops are very slow and are even slower than the md5 itself. We should compare one hases with another, all implemented as functions.
  • Quamis
    Quamis about 7 years
    @MaximMasiutin the xor examples were there to compare "native" hashing with some sort of weak custom hashing function, just to demonstate that PHP's loop per byte is slow, and point out that it should be avoided, just as you noticed
  • keune
    keune about 7 years
    There is an open request here to add murmurhash to php, which you can vote on.
  • Sam Tolton
    Sam Tolton over 6 years
    Just a quick note - I tried this with a much longer string (~5000 chars) and CRC32 was slower than MD5 and SHA1 on my machine (i7-6650U, 16GB). CRC32 - 1.7s , MD5 - 1.4s, SHA1 - 1.5s. Always test for yourself.
  • Quamis
    Quamis over 6 years
    please note the response by @AalexGabi below. Hashing speed will vary with string length, and also note that newer processors have support for hardware-accelerated hashes (not sure if PHP actually uses them, and also not sure what processors have this). As you said, always test for yourself, in your own environment.
  • Shlomi Hassid
    Shlomi Hassid about 6 years
    @Quamis the test is nice but may be misleading - as @samTolton noted the results are different and md5 is faster. A better test will be to randomize the strings content and length too. this way we get a better idea about the actual real world performance. This will also avoid caching. Take a look: php hashing checksum performance
  • Quamis
    Quamis about 6 years
    Of course the speed will vary according to the used hardware, input size, php version, RAM speed, cache size, OS type & version, system load, etc. This wasn't meant to be a comprehensive test, I thought I've already made it clear that one should test for himself
  • hanshenrik
    hanshenrik over 5 years
    sodium_crypto_generichash(), which is BLAKE2b, a hash function more secure than MD5 but faster than SHA256. (Link has benchmarks, etc.) - blake2b sure is, but a USERLAND PHP implementation of blake2b is going to be a lot slower than the C-implemented sha256 for PHP ... i wish PHP could adobt blake2b in the hash_algos() suite..
  • Scott Arciszewski
    Scott Arciszewski over 5 years
    The pure PHP implementation wasn't suggested here.
  • bryc
    bryc almost 5 years
    Software CRC32 should be significantly slower than simple XOR or addition. It's also significantly slower than typical string hashes like FNV. PHP must use a built in method where it uses SSE-optimized C++ and the crc32 instruction on modern CPUs. The benchmark here just doesn't make sense.
  • David J.
    David J. about 4 years
    Re: "You want a way to identify unique strings while cleaning up 'malformed' strings": would you elaborate please?
  • Anachronist
    Anachronist about 4 years
    It's hard to remember what I was thinking over six years ago... I might have been alluding to the fact that you don't get collisions with urlencode or base64_encode, so the results would be as unique as the original strings.
  • Aurast
    Aurast over 3 years
    If you are concerned about how much space the hash requires in the database, it's perfectly valid to only use the first X bits of a hash. Not necessarily recommending it, but you could use MD5 and only use the first four bytes.
  • Bruno Cassol
    Bruno Cassol over 3 years
    I improved this code to fetch all algorithms available in PHP, benchmark them, sort by speed and print sample hash of each. See this repository please: github.com/brunocassol/php_hashes