Generate a Hash from string in Javascript

javascript hash

868,639

Solution 1

String.prototype.hashCode = function() {
  var hash = 0, i, chr;
  if (this.length === 0) return hash;
  for (i = 0; i < this.length; i++) {
    chr   = this.charCodeAt(i);
    hash  = ((hash << 5) - hash) + chr;
    hash |= 0; // Convert to 32bit integer
  }
  return hash;
};

Source: http://werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method/

Solution 2

Many of the answers here are the same String.hashCode hash function taken from Java. It dates back to 1981 from Gosling Emacs, is extremely weak, and makes zero sense performance-wise in modern JavaScript. In fact, implementations could be significantly faster by using ES6 Math.imul, but no one took notice. We can do much better than this, at essentially identical performance.

Here's one I did—cyrb53, a simple but high quality 53-bit hash. It's quite fast, provides very good* hash distribution, and because it outputs 53 bits, has significantly lower collision rates compared to any 32-bit hash. Also, you can ignore SA's CC license as it's public domain on my GitHub.

const cyrb53 = function(str, seed = 0) {
    let h1 = 0xdeadbeef ^ seed, h2 = 0x41c6ce57 ^ seed;
    for (let i = 0, ch; i < str.length; i++) {
        ch = str.charCodeAt(i);
        h1 = Math.imul(h1 ^ ch, 2654435761);
        h2 = Math.imul(h2 ^ ch, 1597334677);
    }
    h1 = Math.imul(h1 ^ (h1>>>16), 2246822507) ^ Math.imul(h2 ^ (h2>>>13), 3266489909);
    h2 = Math.imul(h2 ^ (h2>>>16), 2246822507) ^ Math.imul(h1 ^ (h1>>>13), 3266489909);
    return 4294967296 * (2097151 & h2) + (h1>>>0);
};

*It is roughly similar to the well-known MurmurHash/xxHash algorithms. It uses a combination of multiplication and Xorshift to generate the hash, but not as thorough. As a result it's faster than either would be in JavaScript and significantly simpler to implement, but may not pass all tests in SMHasher. This is not a cryptographic hash function, so don't use this for security purposes.

Like any proper hash, it has an avalanche effect, which basically means small changes in the input have big changes in the output making the resulting hash appear more 'random':

"501c2ba782c97901" = cyrb53("a")
"459eda5bc254d2bf" = cyrb53("b")
"fbce64cc3b748385" = cyrb53("revenge")
"fb1d85148d13f93a" = cyrb53("revenue")

You can optionally supply a seed (unsigned integer, 32-bit max) for alternate streams of the same input:

"76fee5e6598ccd5c" = cyrb53("revenue", 1)
"1f672e2831253862" = cyrb53("revenue", 2)
"2b10de31708e6ab7" = cyrb53("revenue", 3)

Technically, it is a 64-bit hash, that is, two uncorrelated 32-bit hashes computed in parallel, but JavaScript is limited to 53-bit integers. If convenient, the full 64-bit output can be used by altering the return statement with a hex string or array.

return [h2>>>0, h1>>>0];
// or
return (h2>>>0).toString(16).padStart(8,0)+(h1>>>0).toString(16).padStart(8,0);
// or 
return 4294967296n * BigInt(h2) + BigInt(h1);

Be aware that constructing hex strings drastically slows down batch processing. The array is much more efficient, but obviously requires two checks instead of one. I also included BigInt, which should be slightly faster than String, but still much slower than Array or Number.

Just for fun, here's TinySimpleHash, the smallest hash I could come up with that's still decent. It's a 32-bit hash in 89 chars with better quality randomness than even FNV or DJB2:

TSH=s=>{for(var i=0,h=9;i<s.length;)h=Math.imul(h^s.charCodeAt(i++),9**9);return h^h>>>9}

Solution 3

EDIT

based on my jsperf tests, the accepted answer is actually faster: http://jsperf.com/hashcodelordvlad

ORIGINAL

if anyone is interested, here is an improved ( faster ) version, which will fail on older browsers who lack the reduce array function.

hashCode = function(s){
  return s.split("").reduce(function(a,b){a=((a<<5)-a)+b.charCodeAt(0);return a&a},0);              
}

one-liner arrow function version :

hashCode = s => s.split('').reduce((a,b)=>{a=((a<<5)-a)+b.charCodeAt(0);return a&a},0)

Solution 4

Note: Even with the best 32-bit hash, collisions will occur sooner or later.

The hash collision probablility can be calculated as $1 - e^{\frac{-k(k-1)}{2N}}$ , aproximated as $\frac{k^2}{2N}$ (see here). This may be higher than intuition suggests:
Assuming a 32-bit hash and k=10,000 items, a collision will occur with a probablility of 1.2%. For 77,163 samples the probability becomes 50%! (calculator).
I suggest a workaround at the bottom.

In an answer to this question Which hashing algorithm is best for uniqueness and speed?, Ian Boyd posted a good in depth analysis. In short (as I interpret it), he comes to the conclusion that Murmur is best, followed by FNV-1a.
Java’s String.hashCode() algorithm that esmiralha proposed seems to be a variant of DJB2.

FNV-1a has a a better distribution than DJB2, but is slower
DJB2 is faster than FNV-1a, but tends to yield more collisions
MurmurHash3 is better and faster than DJB2 and FNV-1a (but the optimized implementation requires more lines of code than FNV and DJB2)

Some benchmarks with large input strings here: http://jsperf.com/32-bit-hash
When short input strings are hashed, murmur's performance drops, relative to DJ2B and FNV-1a: http://jsperf.com/32-bit-hash/3

So in general I would recommend murmur3.
See here for a JavaScript implementation: https://github.com/garycourt/murmurhash-js

If input strings are short and performance is more important than distribution quality, use DJB2 (as proposed by the accepted answer by esmiralha).

If quality and small code size are more important than speed, I use this implementation of FNV-1a (based on this code).

/**
 * Calculate a 32 bit FNV-1a hash
 * Found here: https://gist.github.com/vaiorabbit/5657561
 * Ref.: http://isthe.com/chongo/tech/comp/fnv/
 *
 * @param {string} str the input value
 * @param {boolean} [asString=false] set to true to return the hash value as 
 *     8-digit hex string instead of an integer
 * @param {integer} [seed] optionally pass the hash of the previous chunk
 * @returns {integer | string}
 */
function hashFnv32a(str, asString, seed) {
    /*jshint bitwise:false */
    var i, l,
        hval = (seed === undefined) ? 0x811c9dc5 : seed;

    for (i = 0, l = str.length; i < l; i++) {
        hval ^= str.charCodeAt(i);
        hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
    }
    if( asString ){
        // Convert to 8 digit hex string
        return ("0000000" + (hval >>> 0).toString(16)).substr(-8);
    }
    return hval >>> 0;
}

Improve Collision Probability

As explained here, we can extend the hash bit size using this trick:

function hash64(str) {
    var h1 = hash32(str);  // returns 32 bit (as 8 byte hex string)
    return h1 + hash32(h1 + str);  // 64 bit (as 16 byte hex string)
}

Use it with care and don't expect too much though.

Solution 5

Based on accepted answer in ES6. Smaller, maintainable and works in modern browsers.

function hashCode(str) {
  return str.split('').reduce((prevHash, currVal) =>
    (((prevHash << 5) - prevHash) + currVal.charCodeAt(0))|0, 0);
}

// Test
console.log("hashCode(\"Hello!\"): ", hashCode('Hello!'));

EDIT (2019-11-04):

one-liner arrow function version :

const hashCode = s => s.split('').reduce((a,b) => (((a << 5) - a) + b.charCodeAt(0))|0, 0)

// test
console.log(hashCode('Hello!'))

View more solutions

868,639

Author by

Freesnöw

Updated on May 13, 2022

Comments

Freesnöw about 2 years

I need to convert strings to some form of hash. Is this possible in JavaScript?

I'm not utilizing a server-side language so I can't do it that way.
- rlemon almost 13 years
  
  You want to look up something like Javascript md5.
- henrikstroem almost 11 years
  
  MD5 is not secure, so don't look for that one.
- Brad Koch about 10 years
  
  @henrikstroem Depends on what you are hashing; there's nothing wrong with using md5 to make a hash for non-security purposes.
- dav over 9 years
  
  Here is a js md5 function github.com/blueimp/JavaScript-MD5 and here is the demo blueimp.github.io/JavaScript-MD5 works great for me.
- Paul Ferrett over 9 years
  
  @BradKoch Depends on what you are doing; there's nothing wrong for using md5 for security purposes. There are certainly better methods for hashing passwords, but md5 is just fine for doing things like signing a URL.
- domen over 8 years
  
  I find it funny that while MD5 is criticised in comments here, almost all answers recommend much worse hash algorithms and get lots of upvotes.
- Orwellophile over 7 years
  
  An implementation of Jenkins's one-at-a-time hash window.hashJoaat=function(b){for(var a=0,c=b.length;c--;)a+=b.charCodeAt(c),a+=a<<10,a^=a>>6;a+=a‌<<3;a^=a>>11;return(‌(a+(a<<15)&429496729‌5)>>>0).toString(16)‌};
- iPherian about 7 years
  
  @henrikstroem Most hash tables use far less secure algorithms. Because they need different properties from the algorithm.
- henrikstroem about 7 years
  
  @iPherian Yes, that's true - good point. The reason for my comment is that the question didn't put the usage in context, and there is usually no reason for using MD5 over better algorithms.
- James M. Lay over 6 years
  
  Using MD5 to verify that a download came intact is not magically going to email your passwords to all your co-workers.
- Martin Capodici over 5 years
  
  @henrikstroem I came here looking for a hash to pick a colour
- dantechguy over 3 years
  
  For a built-in real hash solution see Kaiido's answer
- Enrique René over 3 years
  
  I came here to create unique values for key property of ReactJS elements. Some objects don't provide an unique ID (or key) property when inside an array. ReactJS docs suggest avoid make use of index loop because of some unstability of this method. I think a hash, even non-secure one, is useful in this case.
- Aron almost 3 years
  
  @domen MD5 isn't bad because it is insecure. It is bad because it is both insecure, AND slow. There is no scenario where MD5 could not be replaced with something better suited for the purpose.
- Freesnöw almost 3 years
  
  @Aron could you give some examples? You're replying to a comment that's half a decade old.
- Andrew almost 3 years
  
  Note that if you do not need to hash a string specifically but just need a hash, you can use: window.crypto.getRandomValues(new Uint32Array(1))[0], or, for 8 hex character strings: window.crypto.getRandomValues(new Uint32Array(1))[0].toString(16)
- mtraceur almost 3 years
  
  @JamesM.Lay But it could "magically" email your passwords to all your coworkers, indirectly, if MD5 is broken badly enough that someone could craft a malicious replacement for your download that still hashes to the same MD5 hash, for example by appending some malicious data followed by padding to a PDF, which your PDF reader mishandles in a way that enables a sufficiently flexible remote code execution exploit or whatever.
- James M. Lay over 2 years
  
  @mtraceur I can't believe I said that three years ago. I agree fully with your point, MD5 should not be used to verify downloads in an untrusted context. I think the point I'd try to drive home now is different - use MD5 for its speed, not for its security. Using MD5 to invalidate a cache or implement a hashmap of sorts, that's probably not going to result in malicious emails.
- James M. Lay over 2 years
  
  @Aron You're probably right. But - MD5 is advantageous in that it is the fastest well-adopted algorithm. You can find it on basically any platform.
corsiKa almost 13 years

This is the same one used in Java. The hash << 5 - hash is the same as hash * 31 + char but a LOT faster. It's nice because it's so fast, and 31 is a small prime. Win win there.
Dan almost 13 years

Very lightweight to add on the page too. I like this. Just wondering what resistance to reversing the hashes this has? sequences of numbers for example result in predictable hashes sometimes.
Jelle De Loecker about 12 years

I did a few tests on jsperf (jsperf.com/hashing-strings) and the bitwise function is actually more slow than the number based function.
Noel Abrahams almost 12 years

Thanks, but you have leaked 2 variables into the global namespace.
Stphane over 11 years

This is nice ... I need to generate a hash value for images url to implement some lazy-loading functionality ... This would be perfect if it did generate unsigned values only :] Any idea to counter this issue ?
Peter Aron Zentai over 11 years

number based is unusable for larger strings.
hdgarrood over 11 years

hash = hash & hash -- the link seems to be saying that hashCode needs to return a 32 bit integer. surely in that case a better approach would be hash = hash & 0xffffffff? Math.pow(2, 33) bitwise and itself in Chrome returns 0 for me
None about 11 years

@hdgarrood -- Math.pow( 2, 33 ) & <anything> === 0 because the & op does an implicit cast to a 32-bit int and the last 32 bits of that are 0, so the 1 at bit 34 gets thrown away. Anything using the & op will be exactly the same, including & 0xffffffff
hdgarrood about 11 years

@cwolves I understand! Thank you. Am I right in thinking that hash |= 0 should convert hash to a 32-bit int by keeping the least significant 32 bits?
TachyonVortex almost 11 years

@hdgarrood Correct. An example: hash = 7264728162427 (which is 69B738AA07B in hex). hash | 0 will chop off everything above 32 bits (in this case, 69B) to leave 738AA07B, which is 1938464891 in decimal. So you'll find that (hash | 0) === 1938464891.
TachyonVortex almost 11 years

@PeterAronZentai Why is it "unusable"? The output produced by the number-based code (hash * 31) + char is identical to the output produced by the shift-based code ((hash<<5)-hash)+char, even for very long strings (I've tested it with strings containing over a million characters), so it's not "unusable" in terms of accuracy. The complexity is O(n) for both the number-based and shift-based versions, so it's not "unusable" in terms of complexity.
Prosto Trader over 10 years

is there a way to get hash wich is positive number only?
Dave Cameron over 10 years

@lordvlad May I use this in an open source project? I think I copied it earlier, as it uses return a|=0 rather than return a&a.
lordvlad over 10 years

Yes, please! Always a pleasure to be of use :) just let me know what it is
Dave Cameron over 10 years

@lordvlad I've used it in some modifications I am making to the acra-storage project. It is a backend for android crash reporting.
BrunoLM over 10 years

Faster comparing to...?
SGr over 10 years

@ProstoTrader Add >>>0 at the end. This function is 10 times faster than the accepted answer on both Firefox and Chrome for me.
lordvlad over 10 years

weird. i just tested it and it turned out to be waaay slower than the accepted answer. jsperf.com/hashcodelordvlad
mikemaccana over 10 years

Good guy @lordvlad, actually testing his own answer, and then reporting when it was slower.
Don McCurdy about 10 years

Can anyone comment on the uniqueness (or not) of the output? Specifically, if I only use this hash for strings with length less than n, what is the largest n for which I can't possibly have a collision?
majidarif almost 10 years

How would this be if I needed something that is unsigned?
rattray almost 10 years

Is there any reason this needs (or should) be on the String prototype? Would it be any less effective/efficient to just have eg; var hashCode = function hashCode (str) {etc...}? And then use as hashCode("mystring")?
ericsoco almost 10 years

@rattray I assume because it was adapted from werxltd.com/wp/2010/05/13/….
ericsoco almost 10 years

@skerit your Number-based perf test isn't correctly implementing the algorithm as described here: docs.oracle.com/javase/7/docs/api/java/lang/…. Here's an update: jsperf.com/hashing-strings/38 Number-based is significantly slower than bitwise.
svub over 9 years

agree with @PeterAronZentai: implemented the method due to it's great performance but for long strings you'll simply get "Infinity"
Manuel Meurer over 9 years

Why do you do ("0000000" + (hval >>> 0).toString(16)).substr(-8);? Isn't that the same as (hval >>> 0).toString(16)?
mar10 over 9 years

this add leading '0's so that the resulting hash is always 8 chars long. Easier to read and recognize in outputs, but that's my personal opinion
Manuel Meurer over 9 years

Ah ok, I get it. For small hval, (hval >>> 0).toString(16) might be less than 8 characters, so you pad it with zeros. I was just confused because (hval >>> 0).toString(16) always resulted in a exactly 8 character string for me.
thdoan over 9 years

Here's a performance comparison of three common string-to-integer hash functions: jsperf.com/string-hashing-methods
thdoan over 9 years

I modified hashCode() to make it a little faster: jsperf.com/string-hashing-methods (str2hash)
lordvlad over 9 years

I just realized: It makes perfect sense that the accepted answer is faster, because my version has to turn the string into an array first, allocating new memory and copying every character...
Rafael Xavier about 9 years

ES5 version of your answer is [].reduce.call( str, function( hash, i ) { var chr = i.charCodeAt(0); hash = ((hash << 5) - hash) + chr; return hash | 0; }, 0 );.
djabraham about 9 years

Sweet! I was struggling to find out how to XOR (^) in JS because it was returning negative numbers. This thing you did with hval did the trick. Now I have an algorithm that returns the same hash in C# as in JS. I will post them both below. Thanks!
jcollum about 9 years

@t0r0X well now I use a module called shortid: npmjs.com/package/shortid
Musakkhir Sayyed about 9 years

how to optimize this code to run faster in every browser. String.prototype.hashCode = function(){ var hash = 5381; if (this.length === 0) return hash; for (var i = 0; i < this.length; i++) { var character = this.charCodeAt(i); hash = ((hash<<5)+hash)^character; // Convert to 32bit integer } return hash; }
user40171 almost 9 years

I wonder if removing the line if (this.length == 0) return hash; will have a major impact on performance. In terms of clean code, in my opinion, it is just noise.
Adaptabi over 8 years

since performance is a concern, why not O(n) but from the end to the start (making the for to reverse from the last char to the first char) - since there will be less comparisons i<len but `i--'
Scott Means over 8 years

FWIW I don't think the hash |= 0 expression is doing what's expected. I have an application where I need to generate matching hashes in both JavaScript and PHP, and so I ported this function to PHP. The hashes from the two functions didn't match, but when I explicitly kept the lower 32 bits (hash &= 0xffff) everything matched up.
Deekshith over 8 years

Check this for the same function implemented in ES6: stackoverflow.com/a/34842797/3439460
Dizzy over 8 years

[].reduce.call(str, (p, c, i, a) => (p << 5) - p + a.charCodeAt(i), 0);
OZZIE about 8 years

Stupid question maybe, but how do you use this function? :D
Israel about 8 years

@OZZIE var hashCode = yourString.hashCode();
BeetleJuice almost 8 years

Thanks for sharing I added str += "" before hashing to avoid exception str.split is not a function thrown when non-strings were passed as parameters
Benji XVI almost 8 years

@ScottMeans You wouldn't expect |= 0 always to have the same result in JS and PHP – in JS it implicitly converts the first operand to a 32-bit integer – not the case in PHP. i.e. The 32-bit integer conversion is a quirk of the JS | operator.
cyberwombat almost 8 years

How are you using the username with shortid? It just seems to generate ids but I don't see how you are using to generate a hash from a string
GavinoGrifoni over 7 years

I love this answer because it produces a sooo much better distributed hash: other functions proposed here will make consequent hash values. Eg `hash("example1") - hash("example2") == 1", while this one is much more unpredictable.
pensan over 7 years

create port, thank you. To ensure an always positive numbers (because I don't want negative ones) I changed return hash; to return (hash >>> 0);
Scott Means over 7 years

@BenjiXVI In retrospect, I don't know why I would have expected it to coerce the value to 32 bits in PHP. Not sure why the author didn't just &= in the first place.
koolaang over 7 years

great answer, but what is the purpose of << 0 ?
mjs over 7 years

@koolaang it's the left shit operator, developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…
wdh over 7 years

@momomo Did you mean left shift?
RiZKiT over 7 years

@pensan using hash >>> 0 may result in an error, that the value is too large for an Int32.
RiZKiT over 7 years

@SGr using hash >>> 0 may result in an error, that the value is too large for an Int32.
David over 7 years

At which length it is recommended to use murmur3? I'm planning to decide which hash to use depending on the string length, so it would be fast for short strings and secure for large strings, but I'm not sure if there would be collisions between the two algorithms, is this a good idea?
jcollum over 7 years

This answer has 3 downvotes. For the life of me I can't imagine why. No one has said anything... :-/
AndyO about 7 years

But much, much slower than any of these: https://jsperf.com/hashing-strings
AndyO about 7 years

I also just noticed that the fastest "retro" solution is actually smaller as well if you remove the line feeds so that it's only 3 lines long.
Deekshith about 7 years

@AndyO I dont see this code included in the jsperf link.
AndyO about 7 years

Dude, really? a) It takes only a few minutes to setup your own benchmark with Benchmark.js (which is what I did) b) Do you think I can't see the test case you created here? Which is "94% slower"? jsperf.com/hashing-strings/56
Deekshith about 7 years

Pardon me, I got you wrong. I thought you posted the link which has this benchmark and I just pointed out that I was not able to see that. Thanks for reminding me of the test I created an year ago. Yes this was not the fastest back then but I was going with an assumption that ES6 was new and browser implementations of these new features is not optimized. But here we go even after an year, there is still no significant improvement in the performance. This debunks my assumption (?). I am now guessing that the functional primitives like reduce are inherently slower than a simple for loop.
jpfx1342 about 7 years

@momomo I think he was asking why it was a left shift of zero bits.
mjs about 7 years

@jpfx1342 I realized that later. A quick test without suggests different output without so yes, needed. I believe it converts the stuff in parentheses to 32 bits integer. The 32-bit integer conversion is a quirk of JS.
djabraham almost 7 years

@mathiasrw It's possible for Unicode characters to exceed 8 bits in memory, so I assume the 0xFF simply masks off anything outside that range. See more about charCodeAt() here: developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…
Neil Girardi almost 7 years

Can anyone tell me what the big O for this is? Please include how to pronounce it when reading it out loud (for example, "oh of en log en") Thank you so much!
Jorge Fuentes González almost 7 years

asm.js uses ... | 0 instead of ... << 0. Maybe a quick test can show which performs better. EDIT: Tested. There's no noticeable difference.
Denis Giffeler almost 7 years

I agree. The conversion to hex could be done a little different... var promise = crypto.subtle.digest({name: "SHA-256"}, Uint8Array.from(data)); promise.then(function(result){ console.log(Array.prototype.map.call(new Uint8Array(result), x => x.toString(16).padStart(2, '0')).join('')); });
Gyum Fox over 6 years

The entropy between two consecutive values is very low! "a" is 97 and b is "98"
Dids over 6 years

Any way to have this produce only positive but still unique results?
Codebeat over 6 years

Cannot get this to work in C and Pascal, any ideas how to achieve the same results?
duhaime over 6 years

2147483647 = 2**32/2-1
Sukima over 6 years

@BeetleJuice The more appropriate question is if you have a function that is designed to take a string then why is your program sending it a non-string in the first place? Perhaps the error is a sign that the caller is doing strange things. Food for thought.
Sukima over 6 years

@deekshith The accepted answer uses hash |= 0 to convert to an 32 bit int. This implementation does not. Is this a bug?
Deekshith over 6 years

@Sukima Thanks for pointing out. No idea how I missed it. Fixed it just now :)
abhimanyuaryan almost 6 years

how did you come up with this formula? hash = ((hash << 5) - hash) + chr;. I saw this post which is using something different github.com/darkskyapp/string-hash/blob/master/index.js @esmiralha
abhimanyuaryan almost 6 years

@lordvlad where do you get this formula from? (a<<5)-a)+b.charCodeAt(0)
lordvlad almost 6 years

@AbhimanyuAryan same as the accepted answer, from werxltd.com/wp/2010/05/13/…
lapo almost 6 years

Wow, this is so much better than the usual *31 one for short (or similar) inputs. :)
Maykonn almost 6 years

What means 2147483647?
bryc almost 6 years

A cryptographic hash function for strings is a bit overkill.. crypto isn't exactly performant.
robinnnnn over 5 years

What is the significance of bitshifting to the left 5 times, as opposed a different number?
bryc over 5 years

@robinnnnn Bitshifting to the left 5 times is equivalent to multiplying by 32. It was chosen because it probably resulted in the least amount of collisions during testing. You could get away even with shifting to the left just once, which is multiplying by two. There's nothing special about it. It's not much better than 4 or 6. This isn't even particularly a good hash function. FNV is significantly better, and MurmurHash is even better than FNV.
bryc over 5 years

Using a secure cryptographic hash can be extremely slow. Avoiding collisions is a product of the bit width, not security. 128 bit non-cryptographic hash or even 64 bits should be more than enough for most purposes. MurmurHash3_x86_128 is quite fast and has a very low chance of collisions.
Nijraj Gelani over 5 years

@Maykonn (2^32 - 1)
bryc over 5 years

In response to "FNV-1a has a a better distribution than DJB2, but is slower" - I think it should be said that FNV1a can be extremely fast when implemented using the ES6 Math.imul function. That alone makes it top benchmarks, and ultimately a better choice than DJB2 in the long run.
bryc over 5 years

If ES6 is available (all modern engines support it), Math.imul can be used for the multiplication step, which greatly improves performance. Only issue is, it won't work in IE11 without a shim.
turtlepower over 5 years

Is this a one-way hash? Or it could be reversed?
lordvlad over 5 years

One way. stackoverflow.com/questions/9406775/…
bryc over 5 years

The source code of that lib isn't even readable.. just 50k of minified code.
Oleg Abrazhaev over 5 years

@bryc that's how vendor code supposed to look like :) and for sources you can check github.com/puleos/object-hash/blob/master/index.js
BachT over 5 years

Failed for I.E 11: Object doesn't support property or method 'imul'.
bryc over 5 years

@BachT You can use a polyfill or a full ES6 shim. But IE11 is tragically frozen in 2009 with no updates.
bryc over 5 years

Minified code is 35.4 KB while the full source is 14.2 KB? That makes no sense.
Oleg Abrazhaev over 5 years

@bryc have you considered this line? var crypto = require('crypto');. I think it adds this dependency code from the vendor in the minified version during a build.
Bob Smith about 5 years

The setup and re-setup of the implicit variables and anonymous function is what kills the performance. Played with the JS Perf a minute to see if reduce could be viable, and it does not come close. So, I crunched verbose to a three line version which is on par with verbose performance. jsperf.com FWIW @AbhimanyuAryan This is a 32 bit hash based on a math series s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] (source java.net)
Crypto Batman about 5 years

I'm getting a negative number as a result from this when using JSON.stringify on an object. How can I adjust this to make sure that I always get a positive number for the hash?
Kossi D. T. S. over 4 years

@bryc How do you pick those big numbers, which aren't necessarily prime? How do they effect the efficiency and the performance of the result?
bryc over 4 years

@KossiD.T.S. I didn't pick those multipliers; they were borrowed elsewhere (L'Ecuyer, Knuth, MurmurHash). Usually these numbers are found by clever people via probabilistic testing (e.g. simulated annealing, genetic programming), searching for the best statistical result, tuned for their use case. They don't really affect efficiency, just quality of the hash result. For example, making them even numbers typically breaks everything, but millions of combinations of odd numbers can give decent results. I'm sure even better constants could be found. I never tested this with SMHasher.
bryc over 4 years

@KossiD.T.S. Which explains why I used 9**9 as a multiplier in the TSH hash at the bottom of the answer. It's just a large odd number, 387420489. More scrambling tends to happen, the larger the multiplier is (but 32-bit is the limit of bitwise operations in JS).
Admin over 4 years

Remove if (this.length === 0) return hash; because 0 < 0 == false so for will skip and it return 0 anyway
bryc over 4 years

@jcollum that's why i almost never answer stale questions.. hit and runs go unnoticed. even after you fix up the answer, no one comes along to balance it out.
Polv over 4 years

If you really need to hash Objects, I wrote any-serialize to serialize ANY Object with sort keys, then cyrb53 to generate base36 hash.
stefanmaric over 4 years

Came here just to say that it is indeed way faster using Math.imul: https://jsperf.com/fowler-noll-vo-hash-function-implementati‌ons
Luc about 4 years

Reliable quality random without having to rely on people running tests, built-in (no custom implementation needed), seedable, and I only needed a few hundred numbers for generating a game map, this seemed perfect. But it turns out there is absolutely no way to do it synchronously. Having to provide some async callback thingy every time you call your seeded random engine makes the code super unreadable and look ridiculous. I don't get who came up with this crappy crypto.subtle interface, so I in the end I had to go with xmur3+sfc32 from this answer: stackoverflow.com/a/47593316/1201863
Luc about 4 years

How is this different from the answer by Kaiido two years before yours?
Константин Ван about 4 years

@Luc It is not, apparently.
Shiva Avula about 4 years

Perfecto. Thanks for sharing !
huug about 4 years

I upvoted this answer for the one-liner even if it's a little slower than the accepted answer... thanks!
Basj about 4 years

Warning! this is a bad hashing method, it does not really "hash": hashCode("hello") (99162322) has the same first digits than hashCode("hellp") (99162323); this is typically what we don't want from a hash function...
tibbus about 4 years

That's awesome, exactly what I was looking for, to generate a small hash based on a larger string. I needed to scroll so much for this solution, but was worth it!!!
raisahab almost 4 years

how to use it on a string is my question
mjs almost 4 years

Replace this with the string.
mjs almost 4 years

@raisahab I added the method for those that do not want or know how to prototype to String. Just call the method hashCode("your string")
raisahab almost 4 years

Thanks, i figured it out. I am a C++ developer, so learning on the fly. Thanks once again. @mmm
TechWisdom over 3 years

The link in the question is broken
aomarks over 3 years

@bryc Great answer! Could you please offer an open-source license for your cyrb53 function (e.g. MIT)?
bryc over 3 years

@aomarks sorry, i don't vibe with middle age royal decrees; it's public domain, take that as you will. maybe if MIT offered me free tuition to endorse them ;)
aomarks over 3 years

@bryc Public domain is great! Would you be willing to explicitly grant a public-domain equivalent license for cyrb53? Suggested examples: opensource.org/licenses/0BSD opensource.org/licenses/unlicense The (frustrating) reason for this request is that in some jurisdictions, declaring that something is in the public domain does not legally make it so. So, the closest available tool is to cover it by a license that grants the same usage freedoms as the public domain. See en.wikipedia.org/wiki/Public-domain-equivalent_license. Thanks for considering!
Tomáš Zato over 3 years

I have no idea why "modern" jsdevs think reduce, map and friends are faster. Do you not realize that these method each create a copy of the arrray?
aomarks over 3 years

Hey @byrc! Could you comment on the operations that happen after the loop, and before the 53-bit int construction (h1 = Math.imul(h1 ^ (h1>>>16)... and the next line). I'm curious what the purpose of those operations is, and what would be the downside of omitting those lines and directly constructing the 53-bit int from h1 and h2?
bryc over 3 years

@aomarks Those two lines represent the 'final mixing function', and is required to achieve the Avalanche effect. If you remove it, it becomes a statistically weaker hash. The hash output will be less uniform (less random) in some cases, which has the consequence of slightly more collisions. Red highlighted parts in image show defects occurring when final mix is removed.
aomarks over 3 years

Thanks @byrc! Also, what is the tool you used to do that analysis and generate that image? Looks useful.
bryc over 3 years

@aomarks it's my own tool, a simple test for collisions and distribution. it's super messy but here is an older public version. also see SMHasher, but is harder to use and needs cmake etc. Haven't yet tamed that one ;)
Jon Neal over 3 years

This is awesome, @byrc! Mind some code golfing? If we count down from the string length we’ll save a variable, and if we pre-increment (operators beforehand) we’ll save a copy operation. Tiny theoretical speed improvements 🤓 and 2 characters shorter 😎. TSH=s=>{for(var i=s.length,h=9;i;)h=Math.imul(h^s.charCodeAt(--i),9**9);retu‌rn h^h>>>9}
bryc over 3 years

@JonNeal nice! i didn't think of that, although it has to process in reverse. code golfing is fun; I'm trying to implement minimalistic LZ compression in JS here; github.com/bryc/code/tree/master/jspack#lzjb-2005. Currently only have done LZJB but it's pretty short for what it is, both compress/decompress in under 512 chars.
FooBar over 3 years

How precise is this? I'm using it to difference audio files from eachother
Odys over 3 years

@bryc can't the 9**9, in the last version, be precalculated? seems static
bryc over 3 years

@Odys it can, but the point for that one is to be as short as possible, 9**9 is 4 chars and is the largest odd number achievable with ** notation. 387420489 is 9 chars. I believe modern JS engines optimize that away. At that point, you might as well use a statistically better constant tuned for your use case. Cheers.
Chris over 3 years

One way to obtain non-negative but unique results would be to add 2^31. + Math.pow(2, 31)
tleb over 3 years

Do you actually believe that this version is more maintainable?
bruh over 3 years

@stefanmaric jsperf link is broken, mind sharing a different link?
taseenb about 3 years

Can you explain why you used 31?
amirhe about 3 years

cause it must be (hash << 5) - hash) which means (2^5 x hash - hash) = 32 x hash - hash = 31 x hash
stefanmaric about 3 years

@bruh so JSPerf is in general out of service. I don't have the specific test suite I used at JSPerf but created a new one here that should be similar: jsbench.me/vbknu74f3e
bryc about 3 years

Here's the benchmark I didn't show for my previous comment: jsbench.me/8mjqpwyesk :)
muzuiget about 3 years

It looks like we can move | 0 to the return statement, because Math.imul() will do the convert in each loop.
cYee about 3 years

@Basj why is the warning? 99162322 and 99162323 are still different, isn't it?
cYee about 3 years

Got it, I see what do you mean there. Thanks!
cape_bsas about 3 years

Just tested: Accepted esmiralha answer x10.000 runs: ~380ms. lordVlad's one-liner x10.000 runs: ~160ms. Dizzy's one-liner x10.000 runs: ~1300ms.
Whinger almost 3 years

FWIW, you don't need to |0 if you've used n<<5, because the bitwise op converts to int: you only need to do it at the end. Some simple testing suggests this makes it much faster.
Andrew almost 3 years

str must be a string, seed must be a number, and the output returned from the function is a number. But what are the specs in regards to the str and seed parameters in terms of how short/long or low/high they can be? And can seed be a floating point? Etc.
bryc almost 3 years

@Andrew Updated answer to clarify on seed. It's intended to be a 32-bit unsigned integer. So the values can be 0 to 2^32-1, and with no decimal points. So no floats; JS will just strip fractional part upon the first XOR operation anyway. There's no limit on the length of str. Also, it can easily be used with Array instead of String, but this question is for strings.
ashleedawg over 2 years

Here's the same code, but even [slightly] more compressed (and as a regular function): function hash(s){return s.split("").reduce((a,b)=>(a=(a<<5)-a+b.charCodeAt(0))&a,0)}
MiraTech over 2 years

Which hashing algorithm is being applied for this hash?
bryc over 2 years

@MiraTech Java's built-in String.hashCode. Originally created in 1981 for Gosmacs.
GGizmos over 2 years

Not sure why this happens, but if I use a seed of 12, the function sometimes returns a 13 digit hex (not including '0x') at the front , and sometimes a 14. I am converting "number" to Hex string using ethers.js. Maybe it has something to do with length of the string to be hashed?
bryc over 2 years

@GGizmos Not sure what you mean. What's the exact string/seed values that causes cyrb53 to return 0x13/0x14? This might instead be an ethers.js issue. See if Math.random()*2**53 also causes 0x13/0x14.
GGizmos over 2 years

@bryc: using a seed of 12, and then outputting the string resulting from ethers.utils.hexValue(result), I get the value 0x1e5ec69e41378b (16 chars) from an input value of "a", and the value 0x978a7061b8325 when the input value is "ab"
GGizmos over 2 years

@bryc: actually, a seed of 0 produces a similar result -- with different length of hash of "a" versus "ab". Anyway to modify to always produce a result of the same length?
bryc over 2 years

@GGizmos Try the alt toString(16) return line in the answer which produces a fixed 64-bit hex string; perhaps you can bypass the ethers utility function which I still think is the problem here. I would still need a repro case (EXACT input arguments to reproduce the return of 0x13/0x14) to verify if its an issue with cyrb53. In general, it is extremely unlikely for cyrb53 to return such a small number.
Bojan Krkic over 2 years

Note that this might return negative numbers. Slap a Math.abs on if you want only positive numbers. Note that this will increase chances of collision, but as long as you aren't using it for security...
James Stewart over 2 years

What is the purpose of this part: return a & a? Shouldn't that just return a regardless?
Dan Froberg about 2 years

Dont use + or ^ because it has to be - to make a complement, otherwise information is shifted out. When I changed a char the hash number did not change, so I had to think. With bitwise rotation no information is discarded. I ended up with; h = ((h << 5) | (h >>> 27)) ^ s.charCodeAt(i); The bitwise xor is faster than minus that now can be replaced. The rotation cost an extra right shift of the most significant bits to patch over the zeros that is left shifted in: java2s.com/example/nodejs/number/…
Suchipi about 2 years

StackOverflow indicates this code is licensed CC BY-SA 4.0, which limits its use to stuff licensed under a similar license (sharealike). Would you be willing to relicense it under a more permissive license, like the MIT license?
I0_ol about 2 years

I'm confused about the line result += ('00000000' + view.getUint32(i).toString(16)).slice(-8); What is the point of adding 8 zeroes 00000000 and then slicing them off at the end? .slice(-8)? When I remove those parts I get the exact same result (in my very limited amount of testing). What am I missing?
Kaiido about 2 years

@I0_ol this is the equivalent of x.toString(16).padStart(2, '0') from Denis's comment. .slice(-8) will remove the ones before that aren't used.
I0_ol about 2 years

@Kaiido What I'm saying is that I get the same result without adding 0's or slicing them off. Basically this result += ('00000000' + view.getUint32(i).toString(16)).slice(-8); gives the same result as result += view.getUint32(i).toString(16);
Kaiido about 2 years

@I0_ol because you always got a value higher than 0xFF0000 Simply try with view = new DataView(new Uint32Array([0]).buffer). You'll get "0" without the padding and "00000000" with.
Peter Gerdes about 2 years

@bryc The numbers for these kinds of functions aren't choosen by guessing and checking via things like simulated annealing. Rather, they are based on a mathematical analysis of the underlying group/ring structure. They look really weird, but it's usually conditions like: you want prime or relatively prime values, or values that are near/far powers of small primes or whose differences...you get the idea. Basically, you do some group theory and the way these numbers relate defines your subgroup structure.
bryc about 2 years

@PeterGerdes Would you have some sources/reference material on how these 'constants' are chosen? MurmurHash author found his constants through a simulated annealing algorithm, which is why I mentioned it; unfortunately few authors explain the source of their constants. I personally haven't found prime multipliers to outperform quality-wise any random large odd number. But I'd be interested to know if the algorithm can be improved with different properly chosen constants.
Peter Gerdes about 2 years

Ohh I meant the primes as an example (u usually have to look at the underlying group thy) ...and sorry I meant to say "often are choosen" but I took too long editing my comment. Let me find you an example. Here's the note that 31 in java's hashcode is picked based on being a prime sitepoint.com/how-to-implement-javas-hashcode-correctly and here's a remark about the constant in another hash function being determined by a mathematicial formula stackoverflow.com/questions/24922452/….
Peter Gerdes about 2 years

Ohh here's a better explanation of where the number 2654435761 comes from in the Knuth multiplicative hash comes from (and the computation of it comes from picking a prime but the number isn't literally equal to the prime) stackoverflow.com/questions/11871245/knuth-multiplicative-ha‌sh
bryc about 2 years

@PeterGerdes Thanks. But all that really tells me is that primes are supposed to be magic. What's missing is the data to back up such choices. You mention Graph Theory but I'm not about to study 4yr Uni math just to see its application to this niche. But I somehow doubt a 32-bit fractional approximation of Phi is magically better than millions of other large odd multipliers. Anyway, I keep a table of multipliers I've found around, as comparing/analyzing them could be interesting.
Vlad Churakov about 2 years

Bitwise operators depend on the platform arch. So, on 64bit ones, the result is 64bits, on 32 is 32. In the given answer, hash |= 0; doesn't aways round it down to 32 bit int, in the majority of the cases (as of now, you get 64 bit int). If you need 32 bit int only, please change it to hash &= 0xFFFF;