Generate a Hash from string in Javascript
Solution 1
String.prototype.hashCode = function() {
var hash = 0, i, chr;
if (this.length === 0) return hash;
for (i = 0; i < this.length; i++) {
chr = this.charCodeAt(i);
hash = ((hash << 5) - hash) + chr;
hash |= 0; // Convert to 32bit integer
}
return hash;
};
Source: http://werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method/
Solution 2
Many of the answers here are the same String.hashCode
hash function taken from Java. It dates back to 1981 from Gosling Emacs, is extremely weak, and makes zero sense performance-wise in modern JavaScript. In fact, implementations could be significantly faster by using ES6 Math.imul
, but no one took notice. We can do much better than this, at essentially identical performance.
Here's one I did—cyrb53, a simple but high quality 53-bit hash. It's quite fast, provides very good* hash distribution, and because it outputs 53 bits, has significantly lower collision rates compared to any 32-bit hash. Also, you can ignore SA's CC license as it's public domain on my GitHub.
const cyrb53 = function(str, seed = 0) {
let h1 = 0xdeadbeef ^ seed, h2 = 0x41c6ce57 ^ seed;
for (let i = 0, ch; i < str.length; i++) {
ch = str.charCodeAt(i);
h1 = Math.imul(h1 ^ ch, 2654435761);
h2 = Math.imul(h2 ^ ch, 1597334677);
}
h1 = Math.imul(h1 ^ (h1>>>16), 2246822507) ^ Math.imul(h2 ^ (h2>>>13), 3266489909);
h2 = Math.imul(h2 ^ (h2>>>16), 2246822507) ^ Math.imul(h1 ^ (h1>>>13), 3266489909);
return 4294967296 * (2097151 & h2) + (h1>>>0);
};
*It is roughly similar to the well-known MurmurHash/xxHash algorithms. It uses a combination of multiplication and Xorshift to generate the hash, but not as thorough. As a result it's faster than either would be in JavaScript and significantly simpler to implement, but may not pass all tests in SMHasher. This is not a cryptographic hash function, so don't use this for security purposes.
Like any proper hash, it has an avalanche effect, which basically means small changes in the input have big changes in the output making the resulting hash appear more 'random':
"501c2ba782c97901" = cyrb53("a")
"459eda5bc254d2bf" = cyrb53("b")
"fbce64cc3b748385" = cyrb53("revenge")
"fb1d85148d13f93a" = cyrb53("revenue")
You can optionally supply a seed (unsigned integer, 32-bit max) for alternate streams of the same input:
"76fee5e6598ccd5c" = cyrb53("revenue", 1)
"1f672e2831253862" = cyrb53("revenue", 2)
"2b10de31708e6ab7" = cyrb53("revenue", 3)
Technically, it is a 64-bit hash, that is, two uncorrelated 32-bit hashes computed in parallel, but JavaScript is limited to 53-bit integers. If convenient, the full 64-bit output can be used by altering the return statement with a hex string or array.
return [h2>>>0, h1>>>0];
// or
return (h2>>>0).toString(16).padStart(8,0)+(h1>>>0).toString(16).padStart(8,0);
// or
return 4294967296n * BigInt(h2) + BigInt(h1);
Be aware that constructing hex strings drastically slows down batch processing. The array is much more efficient, but obviously requires two checks instead of one. I also included BigInt
, which should be slightly faster than String
, but still much slower than Array
or Number
.
Just for fun, here's TinySimpleHash, the smallest hash I could come up with that's still decent. It's a 32-bit hash in 89 chars with better quality randomness than even FNV or DJB2:
TSH=s=>{for(var i=0,h=9;i<s.length;)h=Math.imul(h^s.charCodeAt(i++),9**9);return h^h>>>9}
Solution 3
EDIT
based on my jsperf tests, the accepted answer is actually faster: http://jsperf.com/hashcodelordvlad
ORIGINAL
if anyone is interested, here is an improved ( faster ) version, which will fail on older browsers who lack the reduce
array function.
hashCode = function(s){
return s.split("").reduce(function(a,b){a=((a<<5)-a)+b.charCodeAt(0);return a&a},0);
}
one-liner arrow function version :
hashCode = s => s.split('').reduce((a,b)=>{a=((a<<5)-a)+b.charCodeAt(0);return a&a},0)
Solution 4
Note: Even with the best 32-bit hash, collisions will occur sooner or later.
The hash collision probablility can be calculated as
, aproximated as
(see here). This may be higher than intuition suggests:
Assuming a 32-bit hash and k=10,000 items, a collision will occur with a probablility of 1.2%. For 77,163 samples the probability becomes 50%! (calculator).
I suggest a workaround at the bottom.
In an answer to this question
Which hashing algorithm is best for uniqueness and speed?,
Ian Boyd posted a good in depth analysis.
In short (as I interpret it), he comes to the conclusion that Murmur is best, followed by FNV-1a.
Java’s String.hashCode() algorithm that esmiralha proposed seems to be a variant of DJB2.
- FNV-1a has a a better distribution than DJB2, but is slower
- DJB2 is faster than FNV-1a, but tends to yield more collisions
- MurmurHash3 is better and faster than DJB2 and FNV-1a (but the optimized implementation requires more lines of code than FNV and DJB2)
Some benchmarks with large input strings here: http://jsperf.com/32-bit-hash
When short input strings are hashed, murmur's performance drops, relative to DJ2B and FNV-1a: http://jsperf.com/32-bit-hash/3
So in general I would recommend murmur3.
See here for a JavaScript implementation:
https://github.com/garycourt/murmurhash-js
If input strings are short and performance is more important than distribution quality, use DJB2 (as proposed by the accepted answer by esmiralha).
If quality and small code size are more important than speed, I use this implementation of FNV-1a (based on this code).
/**
* Calculate a 32 bit FNV-1a hash
* Found here: https://gist.github.com/vaiorabbit/5657561
* Ref.: http://isthe.com/chongo/tech/comp/fnv/
*
* @param {string} str the input value
* @param {boolean} [asString=false] set to true to return the hash value as
* 8-digit hex string instead of an integer
* @param {integer} [seed] optionally pass the hash of the previous chunk
* @returns {integer | string}
*/
function hashFnv32a(str, asString, seed) {
/*jshint bitwise:false */
var i, l,
hval = (seed === undefined) ? 0x811c9dc5 : seed;
for (i = 0, l = str.length; i < l; i++) {
hval ^= str.charCodeAt(i);
hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
}
if( asString ){
// Convert to 8 digit hex string
return ("0000000" + (hval >>> 0).toString(16)).substr(-8);
}
return hval >>> 0;
}
Improve Collision Probability
As explained here, we can extend the hash bit size using this trick:
function hash64(str) {
var h1 = hash32(str); // returns 32 bit (as 8 byte hex string)
return h1 + hash32(h1 + str); // 64 bit (as 16 byte hex string)
}
Use it with care and don't expect too much though.
Solution 5
Based on accepted answer in ES6. Smaller, maintainable and works in modern browsers.
function hashCode(str) {
return str.split('').reduce((prevHash, currVal) =>
(((prevHash << 5) - prevHash) + currVal.charCodeAt(0))|0, 0);
}
// Test
console.log("hashCode(\"Hello!\"): ", hashCode('Hello!'));
EDIT (2019-11-04):
one-liner arrow function version :
const hashCode = s => s.split('').reduce((a,b) => (((a << 5) - a) + b.charCodeAt(0))|0, 0)
// test
console.log(hashCode('Hello!'))
Freesnöw
Updated on May 13, 2022Comments
-
Freesnöw about 2 years
I need to convert strings to some form of hash. Is this possible in JavaScript?
I'm not utilizing a server-side language so I can't do it that way.
-
rlemon almost 13 yearsYou want to look up something like Javascript md5.
-
henrikstroem almost 11 yearsMD5 is not secure, so don't look for that one.
-
Brad Koch about 10 years@henrikstroem Depends on what you are hashing; there's nothing wrong with using md5 to make a hash for non-security purposes.
-
dav over 9 yearsHere is a js md5 function github.com/blueimp/JavaScript-MD5 and here is the demo blueimp.github.io/JavaScript-MD5 works great for me.
-
Paul Ferrett over 9 years@BradKoch Depends on what you are doing; there's nothing wrong for using md5 for security purposes. There are certainly better methods for hashing passwords, but md5 is just fine for doing things like signing a URL.
-
domen over 8 yearsI find it funny that while MD5 is criticised in comments here, almost all answers recommend much worse hash algorithms and get lots of upvotes.
-
Orwellophile over 7 yearsAn implementation of Jenkins's one-at-a-time hash
window.hashJoaat=function(b){for(var a=0,c=b.length;c--;)a+=b.charCodeAt(c),a+=a<<10,a^=a>>6;a+=a<<3;a^=a>>11;return((a+(a<<15)&4294967295)>>>0).toString(16)};
-
iPherian about 7 years@henrikstroem Most hash tables use far less secure algorithms. Because they need different properties from the algorithm.
-
henrikstroem about 7 years@iPherian Yes, that's true - good point. The reason for my comment is that the question didn't put the usage in context, and there is usually no reason for using MD5 over better algorithms.
-
James M. Lay over 6 yearsUsing MD5 to verify that a download came intact is not magically going to email your passwords to all your co-workers.
-
Martin Capodici over 5 years@henrikstroem I came here looking for a hash to pick a colour
-
dantechguy over 3 yearsFor a built-in real hash solution see Kaiido's answer
-
Enrique René over 3 yearsI came here to create unique values for
key
property of ReactJS elements. Some objects don't provide an unique ID (or key) property when inside an array. ReactJS docs suggest avoid make use of index loop because of some unstability of this method. I think a hash, even non-secure one, is useful in this case. -
Aron almost 3 years@domen MD5 isn't bad because it is insecure. It is bad because it is both insecure, AND slow. There is no scenario where MD5 could not be replaced with something better suited for the purpose.
-
Freesnöw almost 3 years@Aron could you give some examples? You're replying to a comment that's half a decade old.
-
Andrew almost 3 yearsNote that if you do not need to hash a string specifically but just need a hash, you can use:
window.crypto.getRandomValues(new Uint32Array(1))[0]
, or, for 8 hex character strings:window.crypto.getRandomValues(new Uint32Array(1))[0].toString(16)
-
mtraceur almost 3 years@JamesM.Lay But it could "magically" email your passwords to all your coworkers, indirectly, if MD5 is broken badly enough that someone could craft a malicious replacement for your download that still hashes to the same MD5 hash, for example by appending some malicious data followed by padding to a PDF, which your PDF reader mishandles in a way that enables a sufficiently flexible remote code execution exploit or whatever.
-
James M. Lay over 2 years@mtraceur I can't believe I said that three years ago. I agree fully with your point, MD5 should not be used to verify downloads in an untrusted context. I think the point I'd try to drive home now is different - use MD5 for its speed, not for its security. Using MD5 to invalidate a cache or implement a hashmap of sorts, that's probably not going to result in malicious emails.
-
James M. Lay over 2 years@Aron You're probably right. But - MD5 is advantageous in that it is the fastest well-adopted algorithm. You can find it on basically any platform.
-
-
corsiKa almost 13 yearsThis is the same one used in Java. The
hash << 5 - hash
is the same ashash * 31 + char
but a LOT faster. It's nice because it's so fast, and 31 is a small prime. Win win there. -
Dan almost 13 yearsVery lightweight to add on the page too. I like this. Just wondering what resistance to reversing the hashes this has? sequences of numbers for example result in predictable hashes sometimes.
-
Jelle De Loecker about 12 yearsI did a few tests on jsperf (jsperf.com/hashing-strings) and the bitwise function is actually more slow than the number based function.
-
Noel Abrahams almost 12 yearsThanks, but you have leaked 2 variables into the global namespace.
-
Stphane over 11 yearsThis is nice ... I need to generate a hash value for images url to implement some lazy-loading functionality ... This would be perfect if it did generate unsigned values only
:]
Any idea to counter this issue ? -
Peter Aron Zentai over 11 yearsnumber based is unusable for larger strings.
-
hdgarrood over 11 years
hash = hash & hash
-- the link seems to be saying thathashCode
needs to return a 32 bit integer. surely in that case a better approach would behash = hash & 0xffffffff
?Math.pow(2, 33)
bitwise and itself in Chrome returns 0 for me -
None about 11 years@hdgarrood --
Math.pow( 2, 33 ) & <anything> === 0
because the&
op does an implicit cast to a 32-bit int and the last 32 bits of that are 0, so the 1 at bit 34 gets thrown away. Anything using the&
op will be exactly the same, including& 0xffffffff
-
hdgarrood about 11 years@cwolves I understand! Thank you. Am I right in thinking that
hash |= 0
should converthash
to a 32-bit int by keeping the least significant 32 bits? -
TachyonVortex almost 11 years@hdgarrood Correct. An example:
hash = 7264728162427
(which is 69B738AA07B in hex).hash | 0
will chop off everything above 32 bits (in this case, 69B) to leave 738AA07B, which is 1938464891 in decimal. So you'll find that(hash | 0) === 1938464891
. -
TachyonVortex almost 11 years@PeterAronZentai Why is it "unusable"? The output produced by the number-based code
(hash * 31) + char
is identical to the output produced by the shift-based code((hash<<5)-hash)+char
, even for very long strings (I've tested it with strings containing over a million characters), so it's not "unusable" in terms of accuracy. The complexity is O(n) for both the number-based and shift-based versions, so it's not "unusable" in terms of complexity. -
Prosto Trader over 10 yearsis there a way to get hash wich is positive number only?
-
Dave Cameron over 10 years@lordvlad May I use this in an open source project? I think I copied it earlier, as it uses return a|=0 rather than return a&a.
-
lordvlad over 10 yearsYes, please! Always a pleasure to be of use :) just let me know what it is
-
Dave Cameron over 10 years@lordvlad I've used it in some modifications I am making to the acra-storage project. It is a backend for android crash reporting.
-
BrunoLM over 10 yearsFaster comparing to...?
-
SGr over 10 years@ProstoTrader Add >>>0 at the end. This function is 10 times faster than the accepted answer on both Firefox and Chrome for me.
-
lordvlad over 10 yearsweird. i just tested it and it turned out to be waaay slower than the accepted answer. jsperf.com/hashcodelordvlad
-
mikemaccana over 10 yearsGood guy @lordvlad, actually testing his own answer, and then reporting when it was slower.
-
Don McCurdy about 10 yearsCan anyone comment on the uniqueness (or not) of the output? Specifically, if I only use this hash for strings with length less than
n
, what is the largestn
for which I can't possibly have a collision? -
majidarif almost 10 yearsHow would this be if I needed something that is unsigned?
-
rattray almost 10 yearsIs there any reason this needs (or should) be on the String prototype? Would it be any less effective/efficient to just have eg;
var hashCode = function hashCode (str) {etc...}
? And then use ashashCode("mystring")
? -
ericsoco almost 10 years@rattray I assume because it was adapted from werxltd.com/wp/2010/05/13/….
-
ericsoco almost 10 years@skerit your Number-based perf test isn't correctly implementing the algorithm as described here: docs.oracle.com/javase/7/docs/api/java/lang/…. Here's an update: jsperf.com/hashing-strings/38 Number-based is significantly slower than bitwise.
-
svub over 9 yearsagree with @PeterAronZentai: implemented the method due to it's great performance but for long strings you'll simply get "Infinity"
-
Manuel Meurer over 9 yearsWhy do you do
("0000000" + (hval >>> 0).toString(16)).substr(-8);
? Isn't that the same as(hval >>> 0).toString(16)
? -
mar10 over 9 yearsthis add leading '0's so that the resulting hash is always 8 chars long. Easier to read and recognize in outputs, but that's my personal opinion
-
Manuel Meurer over 9 yearsAh ok, I get it. For small
hval
,(hval >>> 0).toString(16)
might be less than 8 characters, so you pad it with zeros. I was just confused because(hval >>> 0).toString(16)
always resulted in a exactly 8 character string for me. -
thdoan over 9 yearsHere's a performance comparison of three common string-to-integer hash functions: jsperf.com/string-hashing-methods
-
thdoan over 9 yearsI modified hashCode() to make it a little faster: jsperf.com/string-hashing-methods (str2hash)
-
lordvlad over 9 yearsI just realized: It makes perfect sense that the accepted answer is faster, because my version has to turn the string into an array first, allocating new memory and copying every character...
-
Rafael Xavier about 9 yearsES5 version of your answer is
[].reduce.call( str, function( hash, i ) { var chr = i.charCodeAt(0); hash = ((hash << 5) - hash) + chr; return hash | 0; }, 0 );
. -
djabraham about 9 yearsSweet! I was struggling to find out how to XOR (^) in JS because it was returning negative numbers. This thing you did with hval did the trick. Now I have an algorithm that returns the same hash in C# as in JS. I will post them both below. Thanks!
-
jcollum about 9 years@t0r0X well now I use a module called shortid: npmjs.com/package/shortid
-
Musakkhir Sayyed about 9 yearshow to optimize this code to run faster in every browser.
String.prototype.hashCode = function(){ var hash = 5381; if (this.length === 0) return hash; for (var i = 0; i < this.length; i++) { var character = this.charCodeAt(i); hash = ((hash<<5)+hash)^character; // Convert to 32bit integer } return hash; }
-
user40171 almost 9 yearsI wonder if removing the line
if (this.length == 0) return hash;
will have a major impact on performance. In terms of clean code, in my opinion, it is just noise. -
Adaptabi over 8 yearssince performance is a concern, why not O(n) but from the end to the start (making the for to reverse from the last char to the first char) - since there will be less comparisons
i<len
but `i--' -
Scott Means over 8 yearsFWIW I don't think the
hash |= 0
expression is doing what's expected. I have an application where I need to generate matching hashes in both JavaScript and PHP, and so I ported this function to PHP. The hashes from the two functions didn't match, but when I explicitly kept the lower 32 bits (hash &= 0xffff
) everything matched up. -
Deekshith over 8 yearsCheck this for the same function implemented in ES6: stackoverflow.com/a/34842797/3439460
-
Dizzy over 8 years[].reduce.call(str, (p, c, i, a) => (p << 5) - p + a.charCodeAt(i), 0);
-
OZZIE about 8 yearsStupid question maybe, but how do you use this function? :D
-
Israel about 8 years@OZZIE var hashCode = yourString.hashCode();
-
BeetleJuice almost 8 yearsThanks for sharing I added
str += ""
before hashing to avoid exceptionstr.split is not a function
thrown when non-strings were passed as parameters -
Benji XVI almost 8 years@ScottMeans You wouldn't expect
|= 0
always to have the same result in JS and PHP – in JS it implicitly converts the first operand to a 32-bit integer – not the case in PHP. i.e. The 32-bit integer conversion is a quirk of the JS|
operator. -
cyberwombat almost 8 yearsHow are you using the username with shortid? It just seems to generate ids but I don't see how you are using to generate a hash from a string
-
GavinoGrifoni over 7 yearsI love this answer because it produces a sooo much better distributed hash: other functions proposed here will make consequent hash values. Eg `hash("example1") - hash("example2") == 1", while this one is much more unpredictable.
-
pensan over 7 yearscreate port, thank you. To ensure an always positive numbers (because I don't want negative ones) I changed
return hash;
toreturn (hash >>> 0);
-
Scott Means over 7 years@BenjiXVI In retrospect, I don't know why I would have expected it to coerce the value to 32 bits in PHP. Not sure why the author didn't just &= in the first place.
-
koolaang over 7 yearsgreat answer, but what is the purpose of << 0 ?
-
mjs over 7 years@koolaang it's the left shit operator, developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…
-
wdh over 7 years@momomo Did you mean left shift?
-
RiZKiT over 7 years@pensan using
hash >>> 0
may result in an error, that the value is too large for an Int32. -
RiZKiT over 7 years@SGr using
hash >>> 0
may result in an error, that the value is too large for an Int32. -
David over 7 yearsAt which length it is recommended to use murmur3? I'm planning to decide which hash to use depending on the string length, so it would be fast for short strings and secure for large strings, but I'm not sure if there would be collisions between the two algorithms, is this a good idea?
-
jcollum over 7 yearsThis answer has 3 downvotes. For the life of me I can't imagine why. No one has said anything... :-/
-
AndyO about 7 yearsBut much, much slower than any of these: https://jsperf.com/hashing-strings
-
AndyO about 7 yearsI also just noticed that the fastest "retro" solution is actually smaller as well if you remove the line feeds so that it's only 3 lines long.
-
Deekshith about 7 years@AndyO I dont see this code included in the jsperf link.
-
AndyO about 7 yearsDude, really? a) It takes only a few minutes to setup your own benchmark with Benchmark.js (which is what I did) b) Do you think I can't see the test case you created here? Which is "94% slower"? jsperf.com/hashing-strings/56
-
Deekshith about 7 yearsPardon me, I got you wrong. I thought you posted the link which has this benchmark and I just pointed out that I was not able to see that. Thanks for reminding me of the test I created an year ago. Yes this was not the fastest back then but I was going with an assumption that ES6 was new and browser implementations of these new features is not optimized. But here we go even after an year, there is still no significant improvement in the performance. This debunks my assumption (?). I am now guessing that the functional primitives like
reduce
are inherently slower than a simplefor
loop. -
jpfx1342 about 7 years@momomo I think he was asking why it was a left shift of zero bits.
-
mjs about 7 years@jpfx1342 I realized that later. A quick test without suggests different output without so yes, needed. I believe it converts the stuff in parentheses to 32 bits integer. The 32-bit integer conversion is a quirk of JS.
-
djabraham almost 7 years@mathiasrw It's possible for Unicode characters to exceed 8 bits in memory, so I assume the 0xFF simply masks off anything outside that range. See more about charCodeAt() here: developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…
-
Neil Girardi almost 7 yearsCan anyone tell me what the big O for this is? Please include how to pronounce it when reading it out loud (for example, "oh of en log en") Thank you so much!
-
Jorge Fuentes González almost 7 years
asm.js
uses... | 0
instead of... << 0
. Maybe a quick test can show which performs better. EDIT: Tested. There's no noticeable difference. -
Denis Giffeler almost 7 yearsI agree. The conversion to hex could be done a little different...
var promise = crypto.subtle.digest({name: "SHA-256"}, Uint8Array.from(data)); promise.then(function(result){ console.log(Array.prototype.map.call(new Uint8Array(result), x => x.toString(16).padStart(2, '0')).join('')); });
-
Gyum Fox over 6 yearsThe entropy between two consecutive values is very low! "a" is 97 and b is "98"
-
Dids over 6 yearsAny way to have this produce only positive but still unique results?
-
Codebeat over 6 yearsCannot get this to work in C and Pascal, any ideas how to achieve the same results?
-
duhaime over 6 years2147483647 = 2**32/2-1
-
Sukima over 6 years@BeetleJuice The more appropriate question is if you have a function that is designed to take a string then why is your program sending it a non-string in the first place? Perhaps the error is a sign that the caller is doing strange things. Food for thought.
-
Sukima over 6 years@deekshith The accepted answer uses
hash |= 0
to convert to an 32 bit int. This implementation does not. Is this a bug? -
Deekshith over 6 years@Sukima Thanks for pointing out. No idea how I missed it. Fixed it just now :)
-
abhimanyuaryan almost 6 yearshow did you come up with this formula?
hash = ((hash << 5) - hash) + chr;
. I saw this post which is using something different github.com/darkskyapp/string-hash/blob/master/index.js @esmiralha -
abhimanyuaryan almost 6 years@lordvlad where do you get this formula from? (a<<5)-a)+b.charCodeAt(0)
-
lordvlad almost 6 years@AbhimanyuAryan same as the accepted answer, from werxltd.com/wp/2010/05/13/…
-
lapo almost 6 yearsWow, this is so much better than the usual *31 one for short (or similar) inputs. :)
-
Maykonn almost 6 yearsWhat means 2147483647?
-
bryc almost 6 yearsA cryptographic hash function for strings is a bit overkill..
crypto
isn't exactly performant. -
robinnnnn over 5 yearsWhat is the significance of bitshifting to the left 5 times, as opposed a different number?
-
bryc over 5 years@robinnnnn Bitshifting to the left 5 times is equivalent to multiplying by 32. It was chosen because it probably resulted in the least amount of collisions during testing. You could get away even with shifting to the left just once, which is multiplying by two. There's nothing special about it. It's not much better than 4 or 6. This isn't even particularly a good hash function. FNV is significantly better, and MurmurHash is even better than FNV.
-
bryc over 5 yearsUsing a secure cryptographic hash can be extremely slow. Avoiding collisions is a product of the bit width, not security. 128 bit non-cryptographic hash or even 64 bits should be more than enough for most purposes. MurmurHash3_x86_128 is quite fast and has a very low chance of collisions.
-
Nijraj Gelani over 5 years@Maykonn (2^32 - 1)
-
bryc over 5 yearsIn response to "FNV-1a has a a better distribution than DJB2, but is slower" - I think it should be said that FNV1a can be extremely fast when implemented using the ES6
Math.imul
function. That alone makes it top benchmarks, and ultimately a better choice than DJB2 in the long run. -
bryc over 5 yearsIf ES6 is available (all modern engines support it),
Math.imul
can be used for the multiplication step, which greatly improves performance. Only issue is, it won't work in IE11 without a shim. -
turtlepower over 5 yearsIs this a one-way hash? Or it could be reversed?
-
lordvlad over 5 years
-
bryc over 5 yearsThe source code of that lib isn't even readable.. just 50k of minified code.
-
Oleg Abrazhaev over 5 years@bryc that's how vendor code supposed to look like :) and for sources you can check github.com/puleos/object-hash/blob/master/index.js
-
BachT over 5 yearsFailed for I.E 11: Object doesn't support property or method
'imul'
. -
bryc over 5 years
-
bryc over 5 yearsMinified code is 35.4 KB while the full source is 14.2 KB? That makes no sense.
-
Oleg Abrazhaev over 5 years@bryc have you considered this line?
var crypto = require('crypto');
. I think it adds this dependency code from the vendor in the minified version during a build. -
Bob Smith about 5 yearsThe setup and re-setup of the implicit variables and anonymous function is what kills the performance. Played with the JS Perf a minute to see if
reduce
could be viable, and it does not come close. So, I crunchedverbose
to a three line version which is on par with verbose performance. jsperf.com FWIW @AbhimanyuAryan This is a 32 bit hash based on a math series s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] (source java.net) -
Crypto Batman about 5 yearsI'm getting a negative number as a result from this when using JSON.stringify on an object. How can I adjust this to make sure that I always get a positive number for the hash?
-
Kossi D. T. S. over 4 years@bryc How do you pick those big numbers, which aren't necessarily prime? How do they effect the efficiency and the performance of the result?
-
bryc over 4 years@KossiD.T.S. I didn't pick those multipliers; they were borrowed elsewhere (L'Ecuyer, Knuth, MurmurHash). Usually these numbers are found by clever people via probabilistic testing (e.g. simulated annealing, genetic programming), searching for the best statistical result, tuned for their use case. They don't really affect efficiency, just quality of the hash result. For example, making them even numbers typically breaks everything, but millions of combinations of odd numbers can give decent results. I'm sure even better constants could be found. I never tested this with SMHasher.
-
bryc over 4 years@KossiD.T.S. Which explains why I used
9**9
as a multiplier in the TSH hash at the bottom of the answer. It's just a large odd number, 387420489. More scrambling tends to happen, the larger the multiplier is (but 32-bit is the limit of bitwise operations in JS). -
Admin over 4 yearsRemove
if (this.length === 0) return hash;
because0 < 0 == false
sofor
will skip and it return0
anyway -
bryc over 4 years@jcollum that's why i almost never answer stale questions.. hit and runs go unnoticed. even after you fix up the answer, no one comes along to balance it out.
-
Polv over 4 yearsIf you really need to hash Objects, I wrote any-serialize to serialize ANY Object with sort keys, then cyrb53 to generate base36 hash.
-
stefanmaric over 4 yearsCame here just to say that it is indeed way faster using
Math.imul
: https://jsperf.com/fowler-noll-vo-hash-function-implementations -
Luc about 4 yearsReliable quality random without having to rely on people running tests, built-in (no custom implementation needed), seedable, and I only needed a few hundred numbers for generating a game map, this seemed perfect. But it turns out there is absolutely no way to do it synchronously. Having to provide some async callback thingy every time you call your seeded random engine makes the code super unreadable and look ridiculous. I don't get who came up with this crappy crypto.subtle interface, so I in the end I had to go with xmur3+sfc32 from this answer: stackoverflow.com/a/47593316/1201863
-
Luc about 4 yearsHow is this different from the answer by Kaiido two years before yours?
-
Константин Ван about 4 years@Luc It is not, apparently.
-
Shiva Avula about 4 yearsPerfecto. Thanks for sharing !
-
huug about 4 yearsI upvoted this answer for the one-liner even if it's a little slower than the accepted answer... thanks!
-
Basj about 4 yearsWarning! this is a bad hashing method, it does not really "hash":
hashCode("hello")
(99162322) has the same first digits thanhashCode("hellp")
(99162323); this is typically what we don't want from a hash function... -
tibbus about 4 yearsThat's awesome, exactly what I was looking for, to generate a small hash based on a larger string. I needed to scroll so much for this solution, but was worth it!!!
-
raisahab almost 4 yearshow to use it on a string is my question
-
mjs almost 4 yearsReplace this with the string.
-
mjs almost 4 years@raisahab I added the method for those that do not want or know how to prototype to String. Just call the method hashCode("your string")
-
raisahab almost 4 yearsThanks, i figured it out. I am a C++ developer, so learning on the fly. Thanks once again. @mmm
-
TechWisdom over 3 yearsThe link in the question is broken
-
aomarks over 3 years@bryc Great answer! Could you please offer an open-source license for your
cyrb53
function (e.g. MIT)? -
bryc over 3 years@aomarks sorry, i don't vibe with middle age royal decrees; it's public domain, take that as you will. maybe if MIT offered me free tuition to endorse them ;)
-
aomarks over 3 years@bryc Public domain is great! Would you be willing to explicitly grant a public-domain equivalent license for
cyrb53
? Suggested examples: opensource.org/licenses/0BSD opensource.org/licenses/unlicense The (frustrating) reason for this request is that in some jurisdictions, declaring that something is in the public domain does not legally make it so. So, the closest available tool is to cover it by a license that grants the same usage freedoms as the public domain. See en.wikipedia.org/wiki/Public-domain-equivalent_license. Thanks for considering! -
Tomáš Zato over 3 yearsI have no idea why "modern" jsdevs think reduce, map and friends are faster. Do you not realize that these method each create a copy of the arrray?
-
aomarks over 3 yearsHey @byrc! Could you comment on the operations that happen after the loop, and before the 53-bit int construction (
h1 = Math.imul(h1 ^ (h1>>>16)...
and the next line). I'm curious what the purpose of those operations is, and what would be the downside of omitting those lines and directly constructing the 53-bit int from h1 and h2? -
bryc over 3 years@aomarks Those two lines represent the 'final mixing function', and is required to achieve the Avalanche effect. If you remove it, it becomes a statistically weaker hash. The hash output will be less uniform (less random) in some cases, which has the consequence of slightly more collisions. Red highlighted parts in image show defects occurring when final mix is removed.
-
aomarks over 3 yearsThanks @byrc! Also, what is the tool you used to do that analysis and generate that image? Looks useful.
-
bryc over 3 years
-
Jon Neal over 3 yearsThis is awesome, @byrc! Mind some code golfing? If we count down from the string length we’ll save a variable, and if we pre-increment (operators beforehand) we’ll save a copy operation. Tiny theoretical speed improvements 🤓 and 2 characters shorter 😎.
TSH=s=>{for(var i=s.length,h=9;i;)h=Math.imul(h^s.charCodeAt(--i),9**9);return h^h>>>9}
-
bryc over 3 years@JonNeal nice! i didn't think of that, although it has to process in reverse. code golfing is fun; I'm trying to implement minimalistic LZ compression in JS here; github.com/bryc/code/tree/master/jspack#lzjb-2005. Currently only have done LZJB but it's pretty short for what it is, both compress/decompress in under 512 chars.
-
FooBar over 3 yearsHow precise is this? I'm using it to difference audio files from eachother
-
Odys over 3 years@bryc can't the 9**9, in the last version, be precalculated? seems static
-
bryc over 3 years@Odys it can, but the point for that one is to be as short as possible,
9**9
is 4 chars and is the largest odd number achievable with ** notation.387420489
is 9 chars. I believe modern JS engines optimize that away. At that point, you might as well use a statistically better constant tuned for your use case. Cheers. -
Chris over 3 yearsOne way to obtain non-negative but unique results would be to add 2^31.
+ Math.pow(2, 31)
-
tleb over 3 yearsDo you actually believe that this version is more maintainable?
-
bruh over 3 years@stefanmaric jsperf link is broken, mind sharing a different link?
-
taseenb about 3 yearsCan you explain why you used 31?
-
amirhe about 3 yearscause it must be
(hash << 5) - hash)
which means(2^5 x hash - hash) = 32 x hash - hash = 31 x hash
-
stefanmaric about 3 years@bruh so JSPerf is in general out of service. I don't have the specific test suite I used at JSPerf but created a new one here that should be similar: jsbench.me/vbknu74f3e
-
bryc about 3 yearsHere's the benchmark I didn't show for my previous comment: jsbench.me/8mjqpwyesk :)
-
muzuiget about 3 yearsIt looks like we can move
| 0
to the return statement, becauseMath.imul()
will do the convert in each loop. -
cYee about 3 years@Basj why is the warning?
99162322
and99162323
are still different, isn't it? -
cYee about 3 yearsGot it, I see what do you mean there. Thanks!
-
cape_bsas about 3 yearsJust tested: Accepted esmiralha answer x10.000 runs: ~380ms. lordVlad's one-liner x10.000 runs: ~160ms. Dizzy's one-liner x10.000 runs: ~1300ms.
-
Whinger almost 3 yearsFWIW, you don't need to |0 if you've used n<<5, because the bitwise op converts to int: you only need to do it at the end. Some simple testing suggests this makes it much faster.
-
Andrew almost 3 years
str
must be a string,seed
must be a number, and the output returned from the function is a number. But what are the specs in regards to thestr
andseed
parameters in terms of how short/long or low/high they can be? And can seed be a floating point? Etc. -
bryc almost 3 years@Andrew Updated answer to clarify on
seed
. It's intended to be a 32-bit unsigned integer. So the values can be 0 to 2^32-1, and with no decimal points. So no floats; JS will just strip fractional part upon the first XOR operation anyway. There's no limit on the length ofstr
. Also, it can easily be used with Array instead of String, but this question is for strings. -
ashleedawg over 2 yearsHere's the same code, but even [slightly] more compressed (and as a regular function):
function hash(s){return s.split("").reduce((a,b)=>(a=(a<<5)-a+b.charCodeAt(0))&a,0)}
-
MiraTech over 2 yearsWhich hashing algorithm is being applied for this hash?
-
bryc over 2 years@MiraTech Java's built-in
String.hashCode
. Originally created in 1981 for Gosmacs. -
GGizmos over 2 yearsNot sure why this happens, but if I use a seed of 12, the function sometimes returns a 13 digit hex (not including '0x') at the front , and sometimes a 14. I am converting "number" to Hex string using ethers.js. Maybe it has something to do with length of the string to be hashed?
-
bryc over 2 years@GGizmos Not sure what you mean. What's the exact
string
/seed
values that causescyrb53
to return 0x13/0x14? This might instead be an ethers.js issue. See ifMath.random()*2**53
also causes 0x13/0x14. -
GGizmos over 2 years@bryc: using a seed of 12, and then outputting the string resulting from
ethers.utils.hexValue(result)
, I get the value0x1e5ec69e41378b
(16 chars) from an input value of "a", and the value0x978a7061b8325
when the input value is "ab" -
GGizmos over 2 years@bryc: actually, a seed of 0 produces a similar result -- with different length of hash of "a" versus "ab". Anyway to modify to always produce a result of the same length?
-
bryc over 2 years@GGizmos Try the alt
toString(16)
return line in the answer which produces a fixed 64-bit hex string; perhaps you can bypass the ethers utility function which I still think is the problem here. I would still need a repro case (EXACT input arguments to reproduce the return of 0x13/0x14) to verify if its an issue with cyrb53. In general, it is extremely unlikely for cyrb53 to return such a small number. -
Bojan Krkic over 2 yearsNote that this might return negative numbers. Slap a
Math.abs
on if you want only positive numbers. Note that this will increase chances of collision, but as long as you aren't using it for security... -
James Stewart over 2 yearsWhat is the purpose of this part:
return a & a
? Shouldn't that just return a regardless? -
Dan Froberg about 2 yearsDont use + or ^ because it has to be - to make a complement, otherwise information is shifted out. When I changed a char the hash number did not change, so I had to think. With bitwise rotation no information is discarded. I ended up with; h = ((h << 5) | (h >>> 27)) ^ s.charCodeAt(i); The bitwise xor is faster than minus that now can be replaced. The rotation cost an extra right shift of the most significant bits to patch over the zeros that is left shifted in: java2s.com/example/nodejs/number/…
-
Suchipi about 2 yearsStackOverflow indicates this code is licensed CC BY-SA 4.0, which limits its use to stuff licensed under a similar license (sharealike). Would you be willing to relicense it under a more permissive license, like the MIT license?
-
I0_ol about 2 yearsI'm confused about the line
result += ('00000000' + view.getUint32(i).toString(16)).slice(-8);
What is the point of adding 8 zeroes00000000
and then slicing them off at the end?.slice(-8)
? When I remove those parts I get the exact same result (in my very limited amount of testing). What am I missing? -
Kaiido about 2 years@I0_ol this is the equivalent of
x.toString(16).padStart(2, '0')
from Denis's comment..slice(-8)
will remove the ones before that aren't used. -
I0_ol about 2 years@Kaiido What I'm saying is that I get the same result without adding 0's or slicing them off. Basically this
result += ('00000000' + view.getUint32(i).toString(16)).slice(-8);
gives the same result asresult += view.getUint32(i).toString(16);
-
Kaiido about 2 years@I0_ol because you always got a value higher than 0xFF0000 Simply try with
view = new DataView(new Uint32Array([0]).buffer)
. You'll get"0"
without the padding and"00000000"
with. -
Peter Gerdes about 2 years@bryc The numbers for these kinds of functions aren't choosen by guessing and checking via things like simulated annealing. Rather, they are based on a mathematical analysis of the underlying group/ring structure. They look really weird, but it's usually conditions like: you want prime or relatively prime values, or values that are near/far powers of small primes or whose differences...you get the idea. Basically, you do some group theory and the way these numbers relate defines your subgroup structure.
-
bryc about 2 years@PeterGerdes Would you have some sources/reference material on how these 'constants' are chosen? MurmurHash author found his constants through a simulated annealing algorithm, which is why I mentioned it; unfortunately few authors explain the source of their constants. I personally haven't found prime multipliers to outperform quality-wise any random large odd number. But I'd be interested to know if the algorithm can be improved with different properly chosen constants.
-
Peter Gerdes about 2 yearsOhh I meant the primes as an example (u usually have to look at the underlying group thy) ...and sorry I meant to say "often are choosen" but I took too long editing my comment. Let me find you an example. Here's the note that 31 in java's hashcode is picked based on being a prime sitepoint.com/how-to-implement-javas-hashcode-correctly and here's a remark about the constant in another hash function being determined by a mathematicial formula stackoverflow.com/questions/24922452/….
-
Peter Gerdes about 2 yearsOhh here's a better explanation of where the number 2654435761 comes from in the Knuth multiplicative hash comes from (and the computation of it comes from picking a prime but the number isn't literally equal to the prime) stackoverflow.com/questions/11871245/knuth-multiplicative-hash
-
bryc about 2 years@PeterGerdes Thanks. But all that really tells me is that primes are supposed to be magic. What's missing is the data to back up such choices. You mention Graph Theory but I'm not about to study 4yr Uni math just to see its application to this niche. But I somehow doubt a 32-bit fractional approximation of Phi is magically better than millions of other large odd multipliers. Anyway, I keep a table of multipliers I've found around, as comparing/analyzing them could be interesting.
-
Vlad Churakov about 2 yearsBitwise operators depend on the platform arch. So, on 64bit ones, the result is 64bits, on 32 is 32. In the given answer,
hash |= 0;
doesn't aways round it down to 32 bit int, in the majority of the cases (as of now, you get 64 bit int). If you need 32 bit int only, please change it tohash &= 0xFFFF;