Generate a Hash from string in Javascript

868,639

Solution 1

String.prototype.hashCode = function() {
  var hash = 0, i, chr;
  if (this.length === 0) return hash;
  for (i = 0; i < this.length; i++) {
    chr   = this.charCodeAt(i);
    hash  = ((hash << 5) - hash) + chr;
    hash |= 0; // Convert to 32bit integer
  }
  return hash;
};

Source: http://werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method/

Solution 2

Many of the answers here are the same String.hashCode hash function taken from Java. It dates back to 1981 from Gosling Emacs, is extremely weak, and makes zero sense performance-wise in modern JavaScript. In fact, implementations could be significantly faster by using ES6 Math.imul, but no one took notice. We can do much better than this, at essentially identical performance.

Here's one I did—cyrb53, a simple but high quality 53-bit hash. It's quite fast, provides very good* hash distribution, and because it outputs 53 bits, has significantly lower collision rates compared to any 32-bit hash. Also, you can ignore SA's CC license as it's public domain on my GitHub.

const cyrb53 = function(str, seed = 0) {
    let h1 = 0xdeadbeef ^ seed, h2 = 0x41c6ce57 ^ seed;
    for (let i = 0, ch; i < str.length; i++) {
        ch = str.charCodeAt(i);
        h1 = Math.imul(h1 ^ ch, 2654435761);
        h2 = Math.imul(h2 ^ ch, 1597334677);
    }
    h1 = Math.imul(h1 ^ (h1>>>16), 2246822507) ^ Math.imul(h2 ^ (h2>>>13), 3266489909);
    h2 = Math.imul(h2 ^ (h2>>>16), 2246822507) ^ Math.imul(h1 ^ (h1>>>13), 3266489909);
    return 4294967296 * (2097151 & h2) + (h1>>>0);
};

*It is roughly similar to the well-known MurmurHash/xxHash algorithms. It uses a combination of multiplication and Xorshift to generate the hash, but not as thorough. As a result it's faster than either would be in JavaScript and significantly simpler to implement, but may not pass all tests in SMHasher. This is not a cryptographic hash function, so don't use this for security purposes.

Like any proper hash, it has an avalanche effect, which basically means small changes in the input have big changes in the output making the resulting hash appear more 'random':

"501c2ba782c97901" = cyrb53("a")
"459eda5bc254d2bf" = cyrb53("b")
"fbce64cc3b748385" = cyrb53("revenge")
"fb1d85148d13f93a" = cyrb53("revenue")

You can optionally supply a seed (unsigned integer, 32-bit max) for alternate streams of the same input:

"76fee5e6598ccd5c" = cyrb53("revenue", 1)
"1f672e2831253862" = cyrb53("revenue", 2)
"2b10de31708e6ab7" = cyrb53("revenue", 3)

Technically, it is a 64-bit hash, that is, two uncorrelated 32-bit hashes computed in parallel, but JavaScript is limited to 53-bit integers. If convenient, the full 64-bit output can be used by altering the return statement with a hex string or array.

return [h2>>>0, h1>>>0];
// or
return (h2>>>0).toString(16).padStart(8,0)+(h1>>>0).toString(16).padStart(8,0);
// or 
return 4294967296n * BigInt(h2) + BigInt(h1);

Be aware that constructing hex strings drastically slows down batch processing. The array is much more efficient, but obviously requires two checks instead of one. I also included BigInt, which should be slightly faster than String, but still much slower than Array or Number.


Just for fun, here's TinySimpleHash, the smallest hash I could come up with that's still decent. It's a 32-bit hash in 89 chars with better quality randomness than even FNV or DJB2:

TSH=s=>{for(var i=0,h=9;i<s.length;)h=Math.imul(h^s.charCodeAt(i++),9**9);return h^h>>>9}

Solution 3

EDIT

based on my jsperf tests, the accepted answer is actually faster: http://jsperf.com/hashcodelordvlad

ORIGINAL

if anyone is interested, here is an improved ( faster ) version, which will fail on older browsers who lack the reduce array function.

hashCode = function(s){
  return s.split("").reduce(function(a,b){a=((a<<5)-a)+b.charCodeAt(0);return a&a},0);              
}

one-liner arrow function version :

hashCode = s => s.split('').reduce((a,b)=>{a=((a<<5)-a)+b.charCodeAt(0);return a&a},0)

Solution 4

Note: Even with the best 32-bit hash, collisions will occur sooner or later.

The hash collision probablility can be calculated as 1 - e ^ (-k(k-1) / 2N, aproximated as k^2 / 2N (see here). This may be higher than intuition suggests:
Assuming a 32-bit hash and k=10,000 items, a collision will occur with a probablility of 1.2%. For 77,163 samples the probability becomes 50%! (calculator).
I suggest a workaround at the bottom.

In an answer to this question Which hashing algorithm is best for uniqueness and speed?, Ian Boyd posted a good in depth analysis. In short (as I interpret it), he comes to the conclusion that Murmur is best, followed by FNV-1a.
Java’s String.hashCode() algorithm that esmiralha proposed seems to be a variant of DJB2.

  • FNV-1a has a a better distribution than DJB2, but is slower
  • DJB2 is faster than FNV-1a, but tends to yield more collisions
  • MurmurHash3 is better and faster than DJB2 and FNV-1a (but the optimized implementation requires more lines of code than FNV and DJB2)

Some benchmarks with large input strings here: http://jsperf.com/32-bit-hash
When short input strings are hashed, murmur's performance drops, relative to DJ2B and FNV-1a: http://jsperf.com/32-bit-hash/3

So in general I would recommend murmur3.
See here for a JavaScript implementation: https://github.com/garycourt/murmurhash-js

If input strings are short and performance is more important than distribution quality, use DJB2 (as proposed by the accepted answer by esmiralha).

If quality and small code size are more important than speed, I use this implementation of FNV-1a (based on this code).

/**
 * Calculate a 32 bit FNV-1a hash
 * Found here: https://gist.github.com/vaiorabbit/5657561
 * Ref.: http://isthe.com/chongo/tech/comp/fnv/
 *
 * @param {string} str the input value
 * @param {boolean} [asString=false] set to true to return the hash value as 
 *     8-digit hex string instead of an integer
 * @param {integer} [seed] optionally pass the hash of the previous chunk
 * @returns {integer | string}
 */
function hashFnv32a(str, asString, seed) {
    /*jshint bitwise:false */
    var i, l,
        hval = (seed === undefined) ? 0x811c9dc5 : seed;

    for (i = 0, l = str.length; i < l; i++) {
        hval ^= str.charCodeAt(i);
        hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
    }
    if( asString ){
        // Convert to 8 digit hex string
        return ("0000000" + (hval >>> 0).toString(16)).substr(-8);
    }
    return hval >>> 0;
}

Improve Collision Probability

As explained here, we can extend the hash bit size using this trick:

function hash64(str) {
    var h1 = hash32(str);  // returns 32 bit (as 8 byte hex string)
    return h1 + hash32(h1 + str);  // 64 bit (as 16 byte hex string)
}

Use it with care and don't expect too much though.

Solution 5

Based on accepted answer in ES6. Smaller, maintainable and works in modern browsers.

function hashCode(str) {
  return str.split('').reduce((prevHash, currVal) =>
    (((prevHash << 5) - prevHash) + currVal.charCodeAt(0))|0, 0);
}

// Test
console.log("hashCode(\"Hello!\"): ", hashCode('Hello!'));

EDIT (2019-11-04):

one-liner arrow function version :

const hashCode = s => s.split('').reduce((a,b) => (((a << 5) - a) + b.charCodeAt(0))|0, 0)

// test
console.log(hashCode('Hello!'))
Share:
868,639
Freesnöw
Author by

Freesnöw

Updated on May 13, 2022

Comments

  • Freesnöw
    Freesnöw about 2 years

    I need to convert strings to some form of hash. Is this possible in JavaScript?

    I'm not utilizing a server-side language so I can't do it that way.

    • rlemon
      rlemon almost 13 years
      You want to look up something like Javascript md5.
    • henrikstroem
      henrikstroem almost 11 years
      MD5 is not secure, so don't look for that one.
    • Brad Koch
      Brad Koch about 10 years
      @henrikstroem Depends on what you are hashing; there's nothing wrong with using md5 to make a hash for non-security purposes.
    • dav
      dav over 9 years
      Here is a js md5 function github.com/blueimp/JavaScript-MD5 and here is the demo blueimp.github.io/JavaScript-MD5 works great for me.
    • Paul Ferrett
      Paul Ferrett over 9 years
      @BradKoch Depends on what you are doing; there's nothing wrong for using md5 for security purposes. There are certainly better methods for hashing passwords, but md5 is just fine for doing things like signing a URL.
    • domen
      domen over 8 years
      I find it funny that while MD5 is criticised in comments here, almost all answers recommend much worse hash algorithms and get lots of upvotes.
    • Orwellophile
      Orwellophile over 7 years
      An implementation of Jenkins's one-at-a-time hash window.hashJoaat=function(b){for(var a=0,c=b.length;c--;)a+=b.charCodeAt(c),a+=a<<10,a^=a>>6;a+=a‌​<<3;a^=a>>11;return(‌​(a+(a<<15)&429496729‌​5)>>>0).toString(16)‌​};
    • iPherian
      iPherian about 7 years
      @henrikstroem Most hash tables use far less secure algorithms. Because they need different properties from the algorithm.
    • henrikstroem
      henrikstroem about 7 years
      @iPherian Yes, that's true - good point. The reason for my comment is that the question didn't put the usage in context, and there is usually no reason for using MD5 over better algorithms.
    • James M. Lay
      James M. Lay over 6 years
      Using MD5 to verify that a download came intact is not magically going to email your passwords to all your co-workers.
    • Martin Capodici
      Martin Capodici over 5 years
      @henrikstroem I came here looking for a hash to pick a colour
    • dantechguy
      dantechguy over 3 years
      For a built-in real hash solution see Kaiido's answer
    • Enrique René
      Enrique René over 3 years
      I came here to create unique values for key property of ReactJS elements. Some objects don't provide an unique ID (or key) property when inside an array. ReactJS docs suggest avoid make use of index loop because of some unstability of this method. I think a hash, even non-secure one, is useful in this case.
    • Aron
      Aron almost 3 years
      @domen MD5 isn't bad because it is insecure. It is bad because it is both insecure, AND slow. There is no scenario where MD5 could not be replaced with something better suited for the purpose.
    • Freesnöw
      Freesnöw almost 3 years
      @Aron could you give some examples? You're replying to a comment that's half a decade old.
    • Andrew
      Andrew almost 3 years
      Note that if you do not need to hash a string specifically but just need a hash, you can use: window.crypto.getRandomValues(new Uint32Array(1))[0], or, for 8 hex character strings: window.crypto.getRandomValues(new Uint32Array(1))[0].toString(16)
    • mtraceur
      mtraceur almost 3 years
      @JamesM.Lay But it could "magically" email your passwords to all your coworkers, indirectly, if MD5 is broken badly enough that someone could craft a malicious replacement for your download that still hashes to the same MD5 hash, for example by appending some malicious data followed by padding to a PDF, which your PDF reader mishandles in a way that enables a sufficiently flexible remote code execution exploit or whatever.
    • James M. Lay
      James M. Lay over 2 years
      @mtraceur I can't believe I said that three years ago. I agree fully with your point, MD5 should not be used to verify downloads in an untrusted context. I think the point I'd try to drive home now is different - use MD5 for its speed, not for its security. Using MD5 to invalidate a cache or implement a hashmap of sorts, that's probably not going to result in malicious emails.
    • James M. Lay
      James M. Lay over 2 years
      @Aron You're probably right. But - MD5 is advantageous in that it is the fastest well-adopted algorithm. You can find it on basically any platform.
  • corsiKa
    corsiKa almost 13 years
    This is the same one used in Java. The hash << 5 - hash is the same as hash * 31 + char but a LOT faster. It's nice because it's so fast, and 31 is a small prime. Win win there.
  • Dan
    Dan almost 13 years
    Very lightweight to add on the page too. I like this. Just wondering what resistance to reversing the hashes this has? sequences of numbers for example result in predictable hashes sometimes.
  • Jelle De Loecker
    Jelle De Loecker about 12 years
    I did a few tests on jsperf (jsperf.com/hashing-strings) and the bitwise function is actually more slow than the number based function.
  • Noel Abrahams
    Noel Abrahams almost 12 years
    Thanks, but you have leaked 2 variables into the global namespace.
  • Stphane
    Stphane over 11 years
    This is nice ... I need to generate a hash value for images url to implement some lazy-loading functionality ... This would be perfect if it did generate unsigned values only :] Any idea to counter this issue ?
  • Peter Aron Zentai
    Peter Aron Zentai over 11 years
    number based is unusable for larger strings.
  • hdgarrood
    hdgarrood over 11 years
    hash = hash & hash -- the link seems to be saying that hashCode needs to return a 32 bit integer. surely in that case a better approach would be hash = hash & 0xffffffff? Math.pow(2, 33) bitwise and itself in Chrome returns 0 for me
  • None
    None about 11 years
    @hdgarrood -- Math.pow( 2, 33 ) & <anything> === 0 because the & op does an implicit cast to a 32-bit int and the last 32 bits of that are 0, so the 1 at bit 34 gets thrown away. Anything using the & op will be exactly the same, including & 0xffffffff
  • hdgarrood
    hdgarrood about 11 years
    @cwolves I understand! Thank you. Am I right in thinking that hash |= 0 should convert hash to a 32-bit int by keeping the least significant 32 bits?
  • TachyonVortex
    TachyonVortex almost 11 years
    @hdgarrood Correct. An example: hash = 7264728162427 (which is 69B738AA07B in hex). hash | 0 will chop off everything above 32 bits (in this case, 69B) to leave 738AA07B, which is 1938464891 in decimal. So you'll find that (hash | 0) === 1938464891.
  • TachyonVortex
    TachyonVortex almost 11 years
    @PeterAronZentai Why is it "unusable"? The output produced by the number-based code (hash * 31) + char is identical to the output produced by the shift-based code ((hash<<5)-hash)+char, even for very long strings (I've tested it with strings containing over a million characters), so it's not "unusable" in terms of accuracy. The complexity is O(n) for both the number-based and shift-based versions, so it's not "unusable" in terms of complexity.
  • Prosto Trader
    Prosto Trader over 10 years
    is there a way to get hash wich is positive number only?
  • Dave Cameron
    Dave Cameron over 10 years
    @lordvlad May I use this in an open source project? I think I copied it earlier, as it uses return a|=0 rather than return a&a.
  • lordvlad
    lordvlad over 10 years
    Yes, please! Always a pleasure to be of use :) just let me know what it is
  • Dave Cameron
    Dave Cameron over 10 years
    @lordvlad I've used it in some modifications I am making to the acra-storage project. It is a backend for android crash reporting.
  • BrunoLM
    BrunoLM over 10 years
    Faster comparing to...?
  • SGr
    SGr over 10 years
    @ProstoTrader Add >>>0 at the end. This function is 10 times faster than the accepted answer on both Firefox and Chrome for me.
  • lordvlad
    lordvlad over 10 years
    weird. i just tested it and it turned out to be waaay slower than the accepted answer. jsperf.com/hashcodelordvlad
  • mikemaccana
    mikemaccana over 10 years
    Good guy @lordvlad, actually testing his own answer, and then reporting when it was slower.
  • Don McCurdy
    Don McCurdy about 10 years
    Can anyone comment on the uniqueness (or not) of the output? Specifically, if I only use this hash for strings with length less than n, what is the largest n for which I can't possibly have a collision?
  • majidarif
    majidarif almost 10 years
    How would this be if I needed something that is unsigned?
  • rattray
    rattray almost 10 years
    Is there any reason this needs (or should) be on the String prototype? Would it be any less effective/efficient to just have eg; var hashCode = function hashCode (str) {etc...}? And then use as hashCode("mystring")?
  • ericsoco
    ericsoco almost 10 years
    @rattray I assume because it was adapted from werxltd.com/wp/2010/05/13/….
  • ericsoco
    ericsoco almost 10 years
    @skerit your Number-based perf test isn't correctly implementing the algorithm as described here: docs.oracle.com/javase/7/docs/api/java/lang/…. Here's an update: jsperf.com/hashing-strings/38 Number-based is significantly slower than bitwise.
  • svub
    svub over 9 years
    agree with @PeterAronZentai: implemented the method due to it's great performance but for long strings you'll simply get "Infinity"
  • Manuel Meurer
    Manuel Meurer over 9 years
    Why do you do ("0000000" + (hval >>> 0).toString(16)).substr(-8);? Isn't that the same as (hval >>> 0).toString(16)?
  • mar10
    mar10 over 9 years
    this add leading '0's so that the resulting hash is always 8 chars long. Easier to read and recognize in outputs, but that's my personal opinion
  • Manuel Meurer
    Manuel Meurer over 9 years
    Ah ok, I get it. For small hval, (hval >>> 0).toString(16) might be less than 8 characters, so you pad it with zeros. I was just confused because (hval >>> 0).toString(16) always resulted in a exactly 8 character string for me.
  • thdoan
    thdoan over 9 years
    Here's a performance comparison of three common string-to-integer hash functions: jsperf.com/string-hashing-methods
  • thdoan
    thdoan over 9 years
    I modified hashCode() to make it a little faster: jsperf.com/string-hashing-methods (str2hash)
  • lordvlad
    lordvlad over 9 years
    I just realized: It makes perfect sense that the accepted answer is faster, because my version has to turn the string into an array first, allocating new memory and copying every character...
  • Rafael Xavier
    Rafael Xavier about 9 years
    ES5 version of your answer is [].reduce.call( str, function( hash, i ) { var chr = i.charCodeAt(0); hash = ((hash << 5) - hash) + chr; return hash | 0; }, 0 );.
  • djabraham
    djabraham about 9 years
    Sweet! I was struggling to find out how to XOR (^) in JS because it was returning negative numbers. This thing you did with hval did the trick. Now I have an algorithm that returns the same hash in C# as in JS. I will post them both below. Thanks!
  • jcollum
    jcollum about 9 years
    @t0r0X well now I use a module called shortid: npmjs.com/package/shortid
  • Musakkhir Sayyed
    Musakkhir Sayyed about 9 years
    how to optimize this code to run faster in every browser. String.prototype.hashCode = function(){ var hash = 5381; if (this.length === 0) return hash; for (var i = 0; i < this.length; i++) { var character = this.charCodeAt(i); hash = ((hash<<5)+hash)^character; // Convert to 32bit integer } return hash; }
  • user40171
    user40171 almost 9 years
    I wonder if removing the line if (this.length == 0) return hash; will have a major impact on performance. In terms of clean code, in my opinion, it is just noise.
  • Adaptabi
    Adaptabi over 8 years
    since performance is a concern, why not O(n) but from the end to the start (making the for to reverse from the last char to the first char) - since there will be less comparisons i<len but `i--'
  • Scott Means
    Scott Means over 8 years
    FWIW I don't think the hash |= 0 expression is doing what's expected. I have an application where I need to generate matching hashes in both JavaScript and PHP, and so I ported this function to PHP. The hashes from the two functions didn't match, but when I explicitly kept the lower 32 bits (hash &= 0xffff) everything matched up.
  • Deekshith
    Deekshith over 8 years
    Check this for the same function implemented in ES6: stackoverflow.com/a/34842797/3439460
  • Dizzy
    Dizzy over 8 years
    [].reduce.call(str, (p, c, i, a) => (p << 5) - p + a.charCodeAt(i), 0);
  • OZZIE
    OZZIE about 8 years
    Stupid question maybe, but how do you use this function? :D
  • Israel
    Israel about 8 years
    @OZZIE var hashCode = yourString.hashCode();
  • BeetleJuice
    BeetleJuice almost 8 years
    Thanks for sharing I added str += "" before hashing to avoid exception str.split is not a function thrown when non-strings were passed as parameters
  • Benji XVI
    Benji XVI almost 8 years
    @ScottMeans You wouldn't expect |= 0 always to have the same result in JS and PHP – in JS it implicitly converts the first operand to a 32-bit integer – not the case in PHP. i.e. The 32-bit integer conversion is a quirk of the JS | operator.
  • cyberwombat
    cyberwombat almost 8 years
    How are you using the username with shortid? It just seems to generate ids but I don't see how you are using to generate a hash from a string
  • GavinoGrifoni
    GavinoGrifoni over 7 years
    I love this answer because it produces a sooo much better distributed hash: other functions proposed here will make consequent hash values. Eg `hash("example1") - hash("example2") == 1", while this one is much more unpredictable.
  • pensan
    pensan over 7 years
    create port, thank you. To ensure an always positive numbers (because I don't want negative ones) I changed return hash; to return (hash >>> 0);
  • Scott Means
    Scott Means over 7 years
    @BenjiXVI In retrospect, I don't know why I would have expected it to coerce the value to 32 bits in PHP. Not sure why the author didn't just &= in the first place.
  • koolaang
    koolaang over 7 years
    great answer, but what is the purpose of << 0 ?
  • mjs
    mjs over 7 years
    @koolaang it's the left shit operator, developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…
  • wdh
    wdh over 7 years
    @momomo Did you mean left shift?
  • RiZKiT
    RiZKiT over 7 years
    @pensan using hash >>> 0 may result in an error, that the value is too large for an Int32.
  • RiZKiT
    RiZKiT over 7 years
    @SGr using hash >>> 0 may result in an error, that the value is too large for an Int32.
  • David
    David over 7 years
    At which length it is recommended to use murmur3? I'm planning to decide which hash to use depending on the string length, so it would be fast for short strings and secure for large strings, but I'm not sure if there would be collisions between the two algorithms, is this a good idea?
  • jcollum
    jcollum over 7 years
    This answer has 3 downvotes. For the life of me I can't imagine why. No one has said anything... :-/
  • AndyO
    AndyO about 7 years
    But much, much slower than any of these: https://jsperf.com/hashing-strings
  • AndyO
    AndyO about 7 years
    I also just noticed that the fastest "retro" solution is actually smaller as well if you remove the line feeds so that it's only 3 lines long.
  • Deekshith
    Deekshith about 7 years
    @AndyO I dont see this code included in the jsperf link.
  • AndyO
    AndyO about 7 years
    Dude, really? a) It takes only a few minutes to setup your own benchmark with Benchmark.js (which is what I did) b) Do you think I can't see the test case you created here? Which is "94% slower"? jsperf.com/hashing-strings/56
  • Deekshith
    Deekshith about 7 years
    Pardon me, I got you wrong. I thought you posted the link which has this benchmark and I just pointed out that I was not able to see that. Thanks for reminding me of the test I created an year ago. Yes this was not the fastest back then but I was going with an assumption that ES6 was new and browser implementations of these new features is not optimized. But here we go even after an year, there is still no significant improvement in the performance. This debunks my assumption (?). I am now guessing that the functional primitives like reduce are inherently slower than a simple for loop.
  • jpfx1342
    jpfx1342 about 7 years
    @momomo I think he was asking why it was a left shift of zero bits.
  • mjs
    mjs about 7 years
    @jpfx1342 I realized that later. A quick test without suggests different output without so yes, needed. I believe it converts the stuff in parentheses to 32 bits integer. The 32-bit integer conversion is a quirk of JS.
  • djabraham
    djabraham almost 7 years
    @mathiasrw It's possible for Unicode characters to exceed 8 bits in memory, so I assume the 0xFF simply masks off anything outside that range. See more about charCodeAt() here: developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…
  • Neil Girardi
    Neil Girardi almost 7 years
    Can anyone tell me what the big O for this is? Please include how to pronounce it when reading it out loud (for example, "oh of en log en") Thank you so much!
  • Jorge Fuentes González
    Jorge Fuentes González almost 7 years
    asm.js uses ... | 0 instead of ... << 0. Maybe a quick test can show which performs better. EDIT: Tested. There's no noticeable difference.
  • Denis Giffeler
    Denis Giffeler almost 7 years
    I agree. The conversion to hex could be done a little different... var promise = crypto.subtle.digest({name: "SHA-256"}, Uint8Array.from(data)); promise.then(function(result){ console.log(Array.prototype.map.call(new Uint8Array(result), x => x.toString(16).padStart(2, '0')).join('')); });
  • Gyum Fox
    Gyum Fox over 6 years
    The entropy between two consecutive values is very low! "a" is 97 and b is "98"
  • Dids
    Dids over 6 years
    Any way to have this produce only positive but still unique results?
  • Codebeat
    Codebeat over 6 years
    Cannot get this to work in C and Pascal, any ideas how to achieve the same results?
  • duhaime
    duhaime over 6 years
    2147483647 = 2**32/2-1
  • Sukima
    Sukima over 6 years
    @BeetleJuice The more appropriate question is if you have a function that is designed to take a string then why is your program sending it a non-string in the first place? Perhaps the error is a sign that the caller is doing strange things. Food for thought.
  • Sukima
    Sukima over 6 years
    @deekshith The accepted answer uses hash |= 0 to convert to an 32 bit int. This implementation does not. Is this a bug?
  • Deekshith
    Deekshith over 6 years
    @Sukima Thanks for pointing out. No idea how I missed it. Fixed it just now :)
  • abhimanyuaryan
    abhimanyuaryan almost 6 years
    how did you come up with this formula? hash = ((hash << 5) - hash) + chr;. I saw this post which is using something different github.com/darkskyapp/string-hash/blob/master/index.js @esmiralha
  • abhimanyuaryan
    abhimanyuaryan almost 6 years
    @lordvlad where do you get this formula from? (a<<5)-a)+b.charCodeAt(0)
  • lordvlad
    lordvlad almost 6 years
    @AbhimanyuAryan same as the accepted answer, from werxltd.com/wp/2010/05/13/…
  • lapo
    lapo almost 6 years
    Wow, this is so much better than the usual *31 one for short (or similar) inputs. :)
  • Maykonn
    Maykonn almost 6 years
    What means 2147483647?
  • bryc
    bryc almost 6 years
    A cryptographic hash function for strings is a bit overkill.. crypto isn't exactly performant.
  • robinnnnn
    robinnnnn over 5 years
    What is the significance of bitshifting to the left 5 times, as opposed a different number?
  • bryc
    bryc over 5 years
    @robinnnnn Bitshifting to the left 5 times is equivalent to multiplying by 32. It was chosen because it probably resulted in the least amount of collisions during testing. You could get away even with shifting to the left just once, which is multiplying by two. There's nothing special about it. It's not much better than 4 or 6. This isn't even particularly a good hash function. FNV is significantly better, and MurmurHash is even better than FNV.
  • bryc
    bryc over 5 years
    Using a secure cryptographic hash can be extremely slow. Avoiding collisions is a product of the bit width, not security. 128 bit non-cryptographic hash or even 64 bits should be more than enough for most purposes. MurmurHash3_x86_128 is quite fast and has a very low chance of collisions.
  • Nijraj Gelani
    Nijraj Gelani over 5 years
    @Maykonn (2^32 - 1)
  • bryc
    bryc over 5 years
    In response to "FNV-1a has a a better distribution than DJB2, but is slower" - I think it should be said that FNV1a can be extremely fast when implemented using the ES6 Math.imul function. That alone makes it top benchmarks, and ultimately a better choice than DJB2 in the long run.
  • bryc
    bryc over 5 years
    If ES6 is available (all modern engines support it), Math.imul can be used for the multiplication step, which greatly improves performance. Only issue is, it won't work in IE11 without a shim.
  • turtlepower
    turtlepower over 5 years
    Is this a one-way hash? Or it could be reversed?
  • lordvlad
    lordvlad over 5 years
  • bryc
    bryc over 5 years
    The source code of that lib isn't even readable.. just 50k of minified code.
  • Oleg Abrazhaev
    Oleg Abrazhaev over 5 years
    @bryc that's how vendor code supposed to look like :) and for sources you can check github.com/puleos/object-hash/blob/master/index.js
  • BachT
    BachT over 5 years
    Failed for I.E 11: Object doesn't support property or method 'imul'.
  • bryc
    bryc over 5 years
    @BachT You can use a polyfill or a full ES6 shim. But IE11 is tragically frozen in 2009 with no updates.
  • bryc
    bryc over 5 years
    Minified code is 35.4 KB while the full source is 14.2 KB? That makes no sense.
  • Oleg Abrazhaev
    Oleg Abrazhaev over 5 years
    @bryc have you considered this line? var crypto = require('crypto');. I think it adds this dependency code from the vendor in the minified version during a build.
  • Bob Smith
    Bob Smith about 5 years
    The setup and re-setup of the implicit variables and anonymous function is what kills the performance. Played with the JS Perf a minute to see if reduce could be viable, and it does not come close. So, I crunched verbose to a three line version which is on par with verbose performance. jsperf.com FWIW @AbhimanyuAryan This is a 32 bit hash based on a math series s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] (source java.net)
  • Crypto Batman
    Crypto Batman about 5 years
    I'm getting a negative number as a result from this when using JSON.stringify on an object. How can I adjust this to make sure that I always get a positive number for the hash?
  • Kossi D. T. S.
    Kossi D. T. S. over 4 years
    @bryc How do you pick those big numbers, which aren't necessarily prime? How do they effect the efficiency and the performance of the result?
  • bryc
    bryc over 4 years
    @KossiD.T.S. I didn't pick those multipliers; they were borrowed elsewhere (L'Ecuyer, Knuth, MurmurHash). Usually these numbers are found by clever people via probabilistic testing (e.g. simulated annealing, genetic programming), searching for the best statistical result, tuned for their use case. They don't really affect efficiency, just quality of the hash result. For example, making them even numbers typically breaks everything, but millions of combinations of odd numbers can give decent results. I'm sure even better constants could be found. I never tested this with SMHasher.
  • bryc
    bryc over 4 years
    @KossiD.T.S. Which explains why I used 9**9 as a multiplier in the TSH hash at the bottom of the answer. It's just a large odd number, 387420489. More scrambling tends to happen, the larger the multiplier is (but 32-bit is the limit of bitwise operations in JS).
  • Admin
    Admin over 4 years
    Remove if (this.length === 0) return hash; because 0 < 0 == false so for will skip and it return 0 anyway
  • bryc
    bryc over 4 years
    @jcollum that's why i almost never answer stale questions.. hit and runs go unnoticed. even after you fix up the answer, no one comes along to balance it out.
  • Polv
    Polv over 4 years
    If you really need to hash Objects, I wrote any-serialize to serialize ANY Object with sort keys, then cyrb53 to generate base36 hash.
  • stefanmaric
    stefanmaric over 4 years
    Came here just to say that it is indeed way faster using Math.imul: https://jsperf.com/fowler-noll-vo-hash-function-implementati‌​ons
  • Luc
    Luc about 4 years
    Reliable quality random without having to rely on people running tests, built-in (no custom implementation needed), seedable, and I only needed a few hundred numbers for generating a game map, this seemed perfect. But it turns out there is absolutely no way to do it synchronously. Having to provide some async callback thingy every time you call your seeded random engine makes the code super unreadable and look ridiculous. I don't get who came up with this crappy crypto.subtle interface, so I in the end I had to go with xmur3+sfc32 from this answer: stackoverflow.com/a/47593316/1201863
  • Luc
    Luc about 4 years
    How is this different from the answer by Kaiido two years before yours?
  • Константин Ван
    Константин Ван about 4 years
    @Luc It is not, apparently.
  • Shiva Avula
    Shiva Avula about 4 years
    Perfecto. Thanks for sharing !
  • huug
    huug about 4 years
    I upvoted this answer for the one-liner even if it's a little slower than the accepted answer... thanks!
  • Basj
    Basj about 4 years
    Warning! this is a bad hashing method, it does not really "hash": hashCode("hello") (99162322) has the same first digits than hashCode("hellp") (99162323); this is typically what we don't want from a hash function...
  • tibbus
    tibbus about 4 years
    That's awesome, exactly what I was looking for, to generate a small hash based on a larger string. I needed to scroll so much for this solution, but was worth it!!!
  • raisahab
    raisahab almost 4 years
    how to use it on a string is my question
  • mjs
    mjs almost 4 years
    Replace this with the string.
  • mjs
    mjs almost 4 years
    @raisahab I added the method for those that do not want or know how to prototype to String. Just call the method hashCode("your string")
  • raisahab
    raisahab almost 4 years
    Thanks, i figured it out. I am a C++ developer, so learning on the fly. Thanks once again. @mmm
  • TechWisdom
    TechWisdom over 3 years
    The link in the question is broken
  • aomarks
    aomarks over 3 years
    @bryc Great answer! Could you please offer an open-source license for your cyrb53 function (e.g. MIT)?
  • bryc
    bryc over 3 years
    @aomarks sorry, i don't vibe with middle age royal decrees; it's public domain, take that as you will. maybe if MIT offered me free tuition to endorse them ;)
  • aomarks
    aomarks over 3 years
    @bryc Public domain is great! Would you be willing to explicitly grant a public-domain equivalent license for cyrb53? Suggested examples: opensource.org/licenses/0BSD opensource.org/licenses/unlicense The (frustrating) reason for this request is that in some jurisdictions, declaring that something is in the public domain does not legally make it so. So, the closest available tool is to cover it by a license that grants the same usage freedoms as the public domain. See en.wikipedia.org/wiki/Public-domain-equivalent_license. Thanks for considering!
  • Tomáš Zato
    Tomáš Zato over 3 years
    I have no idea why "modern" jsdevs think reduce, map and friends are faster. Do you not realize that these method each create a copy of the arrray?
  • aomarks
    aomarks over 3 years
    Hey @byrc! Could you comment on the operations that happen after the loop, and before the 53-bit int construction (h1 = Math.imul(h1 ^ (h1>>>16)... and the next line). I'm curious what the purpose of those operations is, and what would be the downside of omitting those lines and directly constructing the 53-bit int from h1 and h2?
  • bryc
    bryc over 3 years
    @aomarks Those two lines represent the 'final mixing function', and is required to achieve the Avalanche effect. If you remove it, it becomes a statistically weaker hash. The hash output will be less uniform (less random) in some cases, which has the consequence of slightly more collisions. Red highlighted parts in image show defects occurring when final mix is removed.
  • aomarks
    aomarks over 3 years
    Thanks @byrc! Also, what is the tool you used to do that analysis and generate that image? Looks useful.
  • bryc
    bryc over 3 years
    @aomarks it's my own tool, a simple test for collisions and distribution. it's super messy but here is an older public version. also see SMHasher, but is harder to use and needs cmake etc. Haven't yet tamed that one ;)
  • Jon Neal
    Jon Neal over 3 years
    This is awesome, @byrc! Mind some code golfing? If we count down from the string length we’ll save a variable, and if we pre-increment (operators beforehand) we’ll save a copy operation. Tiny theoretical speed improvements 🤓 and 2 characters shorter 😎. TSH=s=>{for(var i=s.length,h=9;i;)h=Math.imul(h^s.charCodeAt(--i),9**9);retu‌​rn h^h>>>9}
  • bryc
    bryc over 3 years
    @JonNeal nice! i didn't think of that, although it has to process in reverse. code golfing is fun; I'm trying to implement minimalistic LZ compression in JS here; github.com/bryc/code/tree/master/jspack#lzjb-2005. Currently only have done LZJB but it's pretty short for what it is, both compress/decompress in under 512 chars.
  • FooBar
    FooBar over 3 years
    How precise is this? I'm using it to difference audio files from eachother
  • Odys
    Odys over 3 years
    @bryc can't the 9**9, in the last version, be precalculated? seems static
  • bryc
    bryc over 3 years
    @Odys it can, but the point for that one is to be as short as possible, 9**9 is 4 chars and is the largest odd number achievable with ** notation. 387420489 is 9 chars. I believe modern JS engines optimize that away. At that point, you might as well use a statistically better constant tuned for your use case. Cheers.
  • Chris
    Chris over 3 years
    One way to obtain non-negative but unique results would be to add 2^31. + Math.pow(2, 31)
  • tleb
    tleb over 3 years
    Do you actually believe that this version is more maintainable?
  • bruh
    bruh over 3 years
    @stefanmaric jsperf link is broken, mind sharing a different link?
  • taseenb
    taseenb about 3 years
    Can you explain why you used 31?
  • amirhe
    amirhe about 3 years
    cause it must be (hash << 5) - hash) which means (2^5 x hash - hash) = 32 x hash - hash = 31 x hash
  • stefanmaric
    stefanmaric about 3 years
    @bruh so JSPerf is in general out of service. I don't have the specific test suite I used at JSPerf but created a new one here that should be similar: jsbench.me/vbknu74f3e
  • bryc
    bryc about 3 years
    Here's the benchmark I didn't show for my previous comment: jsbench.me/8mjqpwyesk :)
  • muzuiget
    muzuiget about 3 years
    It looks like we can move | 0 to the return statement, because Math.imul() will do the convert in each loop.
  • cYee
    cYee about 3 years
    @Basj why is the warning? 99162322 and 99162323 are still different, isn't it?
  • cYee
    cYee about 3 years
    Got it, I see what do you mean there. Thanks!
  • cape_bsas
    cape_bsas about 3 years
    Just tested: Accepted esmiralha answer x10.000 runs: ~380ms. lordVlad's one-liner x10.000 runs: ~160ms. Dizzy's one-liner x10.000 runs: ~1300ms.
  • Whinger
    Whinger almost 3 years
    FWIW, you don't need to |0 if you've used n<<5, because the bitwise op converts to int: you only need to do it at the end. Some simple testing suggests this makes it much faster.
  • Andrew
    Andrew almost 3 years
    str must be a string, seed must be a number, and the output returned from the function is a number. But what are the specs in regards to the str and seed parameters in terms of how short/long or low/high they can be? And can seed be a floating point? Etc.
  • bryc
    bryc almost 3 years
    @Andrew Updated answer to clarify on seed. It's intended to be a 32-bit unsigned integer. So the values can be 0 to 2^32-1, and with no decimal points. So no floats; JS will just strip fractional part upon the first XOR operation anyway. There's no limit on the length of str. Also, it can easily be used with Array instead of String, but this question is for strings.
  • ashleedawg
    ashleedawg over 2 years
    Here's the same code, but even [slightly] more compressed (and as a regular function): function hash(s){return s.split("").reduce((a,b)=>(a=(a<<5)-a+b.charCodeAt(0))&a,0)}
  • MiraTech
    MiraTech over 2 years
    Which hashing algorithm is being applied for this hash?
  • bryc
    bryc over 2 years
    @MiraTech Java's built-in String.hashCode. Originally created in 1981 for Gosmacs.
  • GGizmos
    GGizmos over 2 years
    Not sure why this happens, but if I use a seed of 12, the function sometimes returns a 13 digit hex (not including '0x') at the front , and sometimes a 14. I am converting "number" to Hex string using ethers.js. Maybe it has something to do with length of the string to be hashed?
  • bryc
    bryc over 2 years
    @GGizmos Not sure what you mean. What's the exact string/seed values that causes cyrb53 to return 0x13/0x14? This might instead be an ethers.js issue. See if Math.random()*2**53 also causes 0x13/0x14.
  • GGizmos
    GGizmos over 2 years
    @bryc: using a seed of 12, and then outputting the string resulting from ethers.utils.hexValue(result), I get the value 0x1e5ec69e41378b (16 chars) from an input value of "a", and the value 0x978a7061b8325 when the input value is "ab"
  • GGizmos
    GGizmos over 2 years
    @bryc: actually, a seed of 0 produces a similar result -- with different length of hash of "a" versus "ab". Anyway to modify to always produce a result of the same length?
  • bryc
    bryc over 2 years
    @GGizmos Try the alt toString(16) return line in the answer which produces a fixed 64-bit hex string; perhaps you can bypass the ethers utility function which I still think is the problem here. I would still need a repro case (EXACT input arguments to reproduce the return of 0x13/0x14) to verify if its an issue with cyrb53. In general, it is extremely unlikely for cyrb53 to return such a small number.
  • Bojan Krkic
    Bojan Krkic over 2 years
    Note that this might return negative numbers. Slap a Math.abs on if you want only positive numbers. Note that this will increase chances of collision, but as long as you aren't using it for security...
  • James Stewart
    James Stewart over 2 years
    What is the purpose of this part: return a & a? Shouldn't that just return a regardless?
  • Dan Froberg
    Dan Froberg about 2 years
    Dont use + or ^ because it has to be - to make a complement, otherwise information is shifted out. When I changed a char the hash number did not change, so I had to think. With bitwise rotation no information is discarded. I ended up with; h = ((h << 5) | (h >>> 27)) ^ s.charCodeAt(i); The bitwise xor is faster than minus that now can be replaced. The rotation cost an extra right shift of the most significant bits to patch over the zeros that is left shifted in: java2s.com/example/nodejs/number/…
  • Suchipi
    Suchipi about 2 years
    StackOverflow indicates this code is licensed CC BY-SA 4.0, which limits its use to stuff licensed under a similar license (sharealike). Would you be willing to relicense it under a more permissive license, like the MIT license?
  • I0_ol
    I0_ol about 2 years
    I'm confused about the line result += ('00000000' + view.getUint32(i).toString(16)).slice(-8); What is the point of adding 8 zeroes 00000000 and then slicing them off at the end? .slice(-8)? When I remove those parts I get the exact same result (in my very limited amount of testing). What am I missing?
  • Kaiido
    Kaiido about 2 years
    @I0_ol this is the equivalent of x.toString(16).padStart(2, '0') from Denis's comment. .slice(-8) will remove the ones before that aren't used.
  • I0_ol
    I0_ol about 2 years
    @Kaiido What I'm saying is that I get the same result without adding 0's or slicing them off. Basically this result += ('00000000' + view.getUint32(i).toString(16)).slice(-8); gives the same result as result += view.getUint32(i).toString(16);
  • Kaiido
    Kaiido about 2 years
    @I0_ol because you always got a value higher than 0xFF0000 Simply try with view = new DataView(new Uint32Array([0]).buffer). You'll get "0" without the padding and "00000000" with.
  • Peter Gerdes
    Peter Gerdes about 2 years
    @bryc The numbers for these kinds of functions aren't choosen by guessing and checking via things like simulated annealing. Rather, they are based on a mathematical analysis of the underlying group/ring structure. They look really weird, but it's usually conditions like: you want prime or relatively prime values, or values that are near/far powers of small primes or whose differences...you get the idea. Basically, you do some group theory and the way these numbers relate defines your subgroup structure.
  • bryc
    bryc about 2 years
    @PeterGerdes Would you have some sources/reference material on how these 'constants' are chosen? MurmurHash author found his constants through a simulated annealing algorithm, which is why I mentioned it; unfortunately few authors explain the source of their constants. I personally haven't found prime multipliers to outperform quality-wise any random large odd number. But I'd be interested to know if the algorithm can be improved with different properly chosen constants.
  • Peter Gerdes
    Peter Gerdes about 2 years
    Ohh I meant the primes as an example (u usually have to look at the underlying group thy) ...and sorry I meant to say "often are choosen" but I took too long editing my comment. Let me find you an example. Here's the note that 31 in java's hashcode is picked based on being a prime sitepoint.com/how-to-implement-javas-hashcode-correctly and here's a remark about the constant in another hash function being determined by a mathematicial formula stackoverflow.com/questions/24922452/….
  • Peter Gerdes
    Peter Gerdes about 2 years
    Ohh here's a better explanation of where the number 2654435761 comes from in the Knuth multiplicative hash comes from (and the computation of it comes from picking a prime but the number isn't literally equal to the prime) stackoverflow.com/questions/11871245/knuth-multiplicative-ha‌​sh
  • bryc
    bryc about 2 years
    @PeterGerdes Thanks. But all that really tells me is that primes are supposed to be magic. What's missing is the data to back up such choices. You mention Graph Theory but I'm not about to study 4yr Uni math just to see its application to this niche. But I somehow doubt a 32-bit fractional approximation of Phi is magically better than millions of other large odd multipliers. Anyway, I keep a table of multipliers I've found around, as comparing/analyzing them could be interesting.
  • Vlad Churakov
    Vlad Churakov about 2 years
    Bitwise operators depend on the platform arch. So, on 64bit ones, the result is 64bits, on 32 is 32. In the given answer, hash |= 0; doesn't aways round it down to 32 bit int, in the majority of the cases (as of now, you get 64 bit int). If you need 32 bit int only, please change it to hash &= 0xFFFF;