Best implementation for hashCode method for a collection

2,216

Solution 1

The best implementation? That is a hard question because it depends on the usage pattern.

A for nearly all cases reasonable good implementation was proposed in Josh Bloch's Effective Java in Item 8 (second edition). The best thing is to look it up there because the author explains there why the approach is good.

A short version

  1. Create a int result and assign a non-zero value.

  2. For every field f tested in the equals() method, calculate a hash code c by:

    • If the field f is a boolean: calculate (f ? 0 : 1);
    • If the field f is a byte, char, short or int: calculate (int)f;
    • If the field f is a long: calculate (int)(f ^ (f >>> 32));
    • If the field f is a float: calculate Float.floatToIntBits(f);
    • If the field f is a double: calculate Double.doubleToLongBits(f) and handle the return value like every long value;
    • If the field f is an object: Use the result of the hashCode() method or 0 if f == null;
    • If the field f is an array: see every field as separate element and calculate the hash value in a recursive fashion and combine the values as described next.
  3. Combine the hash value c with result:

    result = 37 * result + c
    
  4. Return result

This should result in a proper distribution of hash values for most use situations.

Solution 2

If you're happy with the Effective Java implementation recommended by dmeister, you can use a library call instead of rolling your own:

@Override
public int hashCode() {
    return Objects.hashCode(this.firstName, this.lastName);
}

This requires either Guava (com.google.common.base.Objects.hashCode) or the standard library in Java 7 (java.util.Objects.hash) but works the same way.

Solution 3

Although this is linked to Android documentation (Wayback Machine) and My own code on Github, it will work for Java in general. My answer is an extension of dmeister's Answer with just code that is much easier to read and understand.

@Override 
public int hashCode() {

    // Start with a non-zero constant. Prime is preferred
    int result = 17;

    // Include a hash for each field.

    // Primatives

    result = 31 * result + (booleanField ? 1 : 0);                   // 1 bit   » 32-bit

    result = 31 * result + byteField;                                // 8 bits  » 32-bit 
    result = 31 * result + charField;                                // 16 bits » 32-bit
    result = 31 * result + shortField;                               // 16 bits » 32-bit
    result = 31 * result + intField;                                 // 32 bits » 32-bit

    result = 31 * result + (int)(longField ^ (longField >>> 32));    // 64 bits » 32-bit

    result = 31 * result + Float.floatToIntBits(floatField);         // 32 bits » 32-bit

    long doubleFieldBits = Double.doubleToLongBits(doubleField);     // 64 bits (double) » 64-bit (long) » 32-bit (int)
    result = 31 * result + (int)(doubleFieldBits ^ (doubleFieldBits >>> 32));

    // Objects

    result = 31 * result + Arrays.hashCode(arrayField);              // var bits » 32-bit

    result = 31 * result + referenceField.hashCode();                // var bits » 32-bit (non-nullable)   
    result = 31 * result +                                           // var bits » 32-bit (nullable)   
        (nullableReferenceField == null
            ? 0
            : nullableReferenceField.hashCode());

    return result;

}

EDIT

Typically, when you override hashcode(...), you also want to override equals(...). So for those that will or has already implemented equals, here is a good reference from my Github...

@Override
public boolean equals(Object o) {

    // Optimization (not required).
    if (this == o) {
        return true;
    }

    // Return false if the other object has the wrong type, interface, or is null.
    if (!(o instanceof MyType)) {
        return false;
    }

    MyType lhs = (MyType) o; // lhs means "left hand side"

            // Primitive fields
    return     booleanField == lhs.booleanField
            && byteField    == lhs.byteField
            && charField    == lhs.charField
            && shortField   == lhs.shortField
            && intField     == lhs.intField
            && longField    == lhs.longField
            && floatField   == lhs.floatField
            && doubleField  == lhs.doubleField

            // Arrays

            && Arrays.equals(arrayField, lhs.arrayField)

            // Objects

            && referenceField.equals(lhs.referenceField)
            && (nullableReferenceField == null
                        ? lhs.nullableReferenceField == null
                        : nullableReferenceField.equals(lhs.nullableReferenceField));
}

Solution 4

It is better to use the functionality provided by Eclipse which does a pretty good job and you can put your efforts and energy in developing the business logic.

Solution 5

First make sure that equals is implemented correctly. From an IBM DeveloperWorks article:

  • Symmetry: For two references, a and b, a.equals(b) if and only if b.equals(a)
  • Reflexivity: For all non-null references, a.equals(a)
  • Transitivity: If a.equals(b) and b.equals(c), then a.equals(c)

Then make sure that their relation with hashCode respects the contact (from the same article):

  • Consistency with hashCode(): Two equal objects must have the same hashCode() value

Finally a good hash function should strive to approach the ideal hash function.

Share:
2,216
Nate Pet
Author by

Nate Pet

Updated on August 01, 2022

Comments

  • Nate Pet
    Nate Pet almost 2 years

    I am using MVC3 with Jquery's .load() with a PartialView().

    From my jquery, I am doing the following:

          $("#stlist").load('@Url.Action("KitEdit","Kit")' + '?id=' + id+ '&pick=' + pick)
    

    The partial view is called from the KitEdit action.

    I was wondering if there is another way of loading a partial view besides the .load()?

    I am getting some wierd behaviors where once the .load is done, some of the buttons don't work the second time.

    • John Hartsock
      John Hartsock almost 12 years
      "Some of the buttons don't work the second time" I think we need more information to assist you. Are you loading javascript with the partial view? can you provide a simple example of the issues you are having?
  • Josh
    Josh almost 16 years
    Interesting answer. Are there some link to a mathematical proof for why this formula works?
  • SquareCog
    SquareCog almost 16 years
    the bug is in the long answer by about8.blogspot.com -- getting the hashcode from a concatenation of strings leaves you with a hash function that is the same for any combination of strings that add up to the same string.
  • Kip
    Kip almost 16 years
    Yeah I'm particularly curious about where the number 37 comes from.
  • Huppie
    Huppie almost 16 years
    So this is meta-discussion and not related to the question at all? ;-)
  • SquareCog
    SquareCog almost 16 years
    It's a correction to a proposed answer that has a fairly significant flaw.
  • dmeister
    dmeister almost 16 years
    I'm not aware of any proof. The number of 37 is arbitrary, but it should be prime. Why? I'm not really sure but it has to do with modulo arthritics and properties of prime numbers which lead to go distributions.
  • dma_k
    dma_k over 13 years
    Thank you for detailed answer. Could you please tell me, which reference/guidelines you used to create this set of advices? Also I wonder, why multiply by 37? What is the magic in 37? Why multiply is better then right shift (result = result << 16 + c)?
  • dmeister
    dmeister over 13 years
    I used item 8 of Josh Bloch's "Effective Java" book.
  • Quantum7
    Quantum7 about 13 years
    +1 A good practical solution. dmeister's solution is more comprehensive, but I tend to forget to handle nulls when I try to write hashcodes myself.
  • James McMahon
    James McMahon over 12 years
    If you are going to use this be aware that reflection is expensive. I honestly wouldn't use this for anything besides throw away code.
  • James McMahon
    James McMahon over 12 years
    The downside of this API is you pay the cost of object construction every time you call equals and hashcode (unless your object is immutable and you precompute the hash), which can be a lot in certain cases.
  • Nate Pet
    Nate Pet almost 12 years
    Hi nikeaa,Complete doesn't seem to work. Meaning, it doesn't work at all if I put in Complete. Am I doing something wrong?
  • nikeaa
    nikeaa almost 12 years
    Did you define the Complete() function in your JS on the main view page?
  • Nate Pet
    Nate Pet almost 12 years
    I could not find an example of the Complete() function in the link you mentioned. Might you have one that I can look at?
  • Tassadaque
    Tassadaque almost 12 years
    live method is deprecated on method should be used instead
  • nikeaa
    nikeaa almost 12 years
    Actually, a little better would be to do the following: $("#stlist").load('@Url.Action("KitEdit","Kit")' + '?id=' + id+ '&pick=' + pick, function() { //put your code here }); This will define the function right in your .load call. Within the function $(this) will refer to the element found by the "#stlist" selector.
  • Simon Forsberg
    Simon Forsberg over 11 years
    @dma_k The reason for using prime numbers and the method described in this answer is to ensure that the computed hashcode will be unique. When using non-prime numbers, you cannot guarantee this. It does not matter which prime nummer you choose, there is nothing magical about the number 37 (too bad 42 isn't a prime number, eh?)
  • dma_k
    dma_k over 11 years
    @SimonAndréForsberg Well, computed hash code cannot be always unique :) Is a hashcode. However I got the idea: the prime number has only one multiplier, while non-prime has at least two. That creates an extra combination for multiplication operator to result the same hash, i.e. cause collision.
  • Simon Forsberg
    Simon Forsberg over 11 years
    @dma_k Yes, that's what I meant :) Of course it cannot be unique at all times (otherwise comparing hashCode would be enough to determine .equals), using prime numbers just reduces the collisions.
  • afsantos
    afsantos over 11 years
    Any particular reason for true mapping to 0 and false to 1, and not the other way around?
  • dmeister
    dmeister over 11 years
    @afsantos: I can't think of any. The code from the book is more a general guideline as a law and by no means the only correct way to do it.
  • Krzysztof Jabłoński
    Krzysztof Jabłoński about 11 years
    Like SquareCog have already noticed. If hashcode is generated once from concatenation of two strings it is extremely easy to generate masses of collisions: ("abc"+""=="ab"+"c"=="a"+"bc"==""+"abc"). It is severe flaw. It would be better to evaluate hashcode for both fields and then calculate linear combination of them (preferably using primes as coefficients).
  • Timo
    Timo over 10 years
    I edited the post but this minimum number of characters is really annoying when doing minor mark-up adjustments :( If accepted, could somebody please remove EDITED near bullet 1? Thanks.
  • jwir3
    jwir3 over 10 years
    +1 Agree with Quantum7, but I would say it's also really good to understand what the Eclipse-generated implementation is doing, and where it gets its implementation details from.
  • jwir3
    jwir3 over 10 years
    +1 Excellent explanation, and I really like that you gave the source (i.e. Bloch's book, not the source code) in the event someone wants more information.
  • Kissaki
    Kissaki over 10 years
    Unless one has a good reason not to use these, one should definitely use these in any case. (Formulating it stronger, as it IMHO should be formulated.) The typical arguments for using standard implementations/libraries apply (best practices, well tested, less error prone, etc).
  • justin.hughey
    justin.hughey over 10 years
    The case to not use a library implementation is when you have implemented a custom Object#equals(Object obj) method. See the Object#hashcode() JavaDoc, "If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result." Using either of these library implementations will generate a"pseudo" memory address hash code which may violate the described requirement.
  • bacar
    bacar over 10 years
    @justin.hughey you seem to be confused. The only case you should override hashCode is if you have a custom equals, and that is precisely what these library methods are designed for. The documentation is quite clear on their behaviour in relation to equals. A library implementation does not claim to absolve you from knowing what the characteristics of a correct hashCode implementation are - these libraries make it easier for you to implement such a conforming implementation for the majority of cases where equals is overriden.
  • justin.hughey
    justin.hughey over 10 years
    You're absolutely right. Thank you for pointing that out, I don't know where I was coming from on that one. :)
  • Brigham
    Brigham almost 10 years
    If the field f is an array, you also have the option of using the java.util.Arrays.hashCode methods to compute that field's hash value.
  • Andrew Kelly
    Andrew Kelly over 9 years
    For any Android developers looking at the java.util.Objects class, it was only introduced in API 19, so make sure you're running on KitKat or above otherwise you'll get NoClassDefFoundError.
  • ruffin
    ruffin almost 9 years
  • Christopher Rucinski
    Christopher Rucinski almost 9 years
    This is a very limited implementation
  • inquisitive
    inquisitive almost 9 years
    What if the field is a string? It definitely is an object but what should be the implementation of hte hashcode
  • Malte Skoruppa
    Malte Skoruppa over 8 years
    Best answer IMO, although by way of example I would rather have chosen the JDK7 java.util.Objects.hash(...) method than the guava com.google.common.base.Objects.hashCode(...) method. I think most people would choose the standard library over an extra dependency.
  • starikoff
    starikoff over 8 years
    If there are two arguments or more and if any of them is an array, the result might be not what you expect because hashCode() for an array is just its java.lang.System.identityHashCode(...).
  • bacar
    bacar over 8 years
    @starikoff Yes - the helper methods do not absolve you from having to understand what you're doing. They just help you to avoid certain mistakes and duplicate code.
  • starikoff
    starikoff over 8 years
    @bacar Of course. I just left my comment so that whoever learns about these methods from this post learns about their limitations as well.
  • Pankaj
    Pankaj over 8 years
    How to calculate in the case of String Member type of object
  • Cope
    Cope over 8 years
    One of the main goals of a hash function is that it should be inexpensive to evaluate. I just wrote a hash function for a formal parameter data structure in a compiler. The equality test of course must check every parameter name and type, after ensuring structural isometry. But as for "For every field f tested in the equals() method", one could well find the hash lookup nearly as expensive as a linear search. You need only to create a trivial hash function that yields a broad distribution of results across usage patterns. Empirical measurement and experimentation, rather than guessing, helps.
  • dmeister
    dmeister over 8 years
    @Cope: I disagree that the main goal of a hash function is speed. While there are certain special cases for other hashCode implementations, this is a very good start. Certainly it is a correct implementation. Most "smart" ones are just plain wrong.
  • Cope
    Cope over 8 years
    Hej @dmeister — I stand my ground. It's very simple. 1. The primary goal of a hash function (in data structures, rather than, e.g., in encryption) is speed — to avoid the cost of a linear search. 2. There is a tradeoff between the cost of computing the hash and the cost of a linear search, and on multi-field structures even just searching on the right key can be enough to adequately lower the cost. 3. For large data structures (like my formal parameters) the number of computations would be in the dozens if you measure "every field." 4. There is no universal answer — so you must measure.
  • sotn
    sotn over 8 years
    I saw that 37 elsewhere for calculating string hash code algorithm and judged that there are 26 letters and 10 numbers plus space character which comes up with 37 lol
  • Diablo
    Diablo over 8 years
    this was my favorite approach, until recently. I have ran into StackOverFlowError while using a criteria for SharedKey OneToOne association. More over, Objects class provides hash(Object ..args) & equals() methods from Java7 on. These are recommended for any applications using jdk 1.7+
  • Darrell Teague
    Darrell Teague about 8 years
    Sorry but answers involving "functionality provided by [some IDE]" are not really relevant in the context of the programming language in general. There are dozens of IDEs and this does not answer the question... namely because this is more about algorithmic determination and directly associated to equals() implementation - something an IDE will know nothing about.
  • Darrell Teague
    Darrell Teague about 8 years
    The comments about association of overriding equals() are correct. hashCode() is not implemented in a vacuum and must be directly related to how equals() is implemented per the documentation (two objects can have the same hashCode but NOT be equal - yet two equal objects MUST have the same hashCode).
  • Christopher Rucinski
    Christopher Rucinski almost 7 years
    Android Documentation now does not include the above code anymore, so here is a cached version from the Wayback Machine - Android Documentation (Feb 07, 2015)
  • dano
    dano over 6 years
    This is a question about Java, not C++.
  • maaartinus
    maaartinus over 6 years
    @Kissaki On the opposite: Using the standard implementation leads to standard collisions.
  • maaartinus
    maaartinus over 6 years
    @Diablo I guess, your problem was a cycle in the object graph and then you're out of luck with most implementation as you need to ignore some reference or to break the cycle (mandating an IdentityHashMap). FWIW I use an id-based hashCode and equals for all entities.
  • maaartinus
    maaartinus over 6 years
    Your implementation avoid the problem and introduces another one; Swapping foo and bar leads to the same hashCode. Your toString AFAIK does not compile, and if it does, then it's terrible inefficient. Something like 109 * getFoo().hashCode() + 57 * getBar().hashCode() is faster, simpler and produces no needless collisions.
  • maaartinus
    maaartinus over 6 years
    @KrzysztofJabłoński Right. Moreover, swapping foo and bar produces a needless collision, too.
  • Abhishek Singh
    Abhishek Singh over 6 years
    Could someone please explain what this line is doing in case of long? f ^ (f >>> 32)
  • John R Perry
    John R Perry over 6 years
    In the third version of Effective Java, the content of Item 8 has nothing to do with hashcode so maybe remove the mention of the item.
  • dmeister
    dmeister about 6 years
    I think Item 8 was based on the second edition. The third edition has a page at the end how the different items map from edition to edition.
  • kristjan
    kristjan about 6 years
    With good hashing, the probability for hash collision should be 2^-32, but with 37, you can easily run into collisions. For example, for an object with two ints (x, y), the hashcodes for (1, -37), (0, 0), (-1, 37) would all evaluate to 0. So in real life (with small values much more likely than large), the probability is nowhere near 2^-32. Better use a prime like 912376189.
  • NetMage
    NetMage about 5 years
    This function handles double like .Net, and my recent testing shows that it is very poor when your doubles are not truly random, but really integers, or have only one or two decimal places. The re-hash function from HashMap helps, but still isn't as good as just using the HashCode for int on the value in this case.
  • mkuech
    mkuech about 4 years
    @AndrewKelly It's been a few years but for anyone else who comes across this and is still supporting < 19, now there's ObjectsCompat.hash(...) and its implementation is just Arrays.hashCode(...) for API < 19, which appears to be the exact same implementation in Objects.hash(...). Android Studio has a code generator for this that will insert Objects.hash(...) though no matter what your minSdkVersion is so devs should be aware they can just replace that with ObjectsCompat and they'll be fine.
  • Andrew Kelly
    Andrew Kelly about 4 years
    Objects.hashCode() is one of the APIs that is now desugared by D8 automatically in Android Studio 4+, so you can use the original API on targets earlier than API 19 (no need for the ObjectsCompat). For a full list of desugared APIs, check this file. jakewharton.com/static/files/d8_api_desugar_list.txt
  • jashgopani
    jashgopani over 3 years
    @demister Can you explain the logic of the long data type hash code (int)(f ^ (f >>> 32))
  • Edwin Buck
    Edwin Buck almost 3 years
    @Kip It is common practice to use powers of 2 to assign hash buckets; because it makes the modulus operation very fast, effectively turning it into a binary AND operation %4 -> AND( 0x0003). To get a decent distribution in the new space from a well distributed input, one wants to multiply the input by a coprime, of which 37 is coprime to all powers of 2. Other numbers would work, but you want one small enough to work with char, and large enough to overflow the limit quickly. 37 shifts the input by one, three, and six places, helping the input bits spread out more. 37, 31, 27 etc also work.
  • Stijn Leenknegt
    Stijn Leenknegt over 2 years
    Another tip I found out the hard way. Don't use fields in hashCode() that change during the lifetime of the object. I use a WeakHashMap and there was an entry but after a while it returned NULL because one of the field changed. So the hashCode() was also changed, and the HashMap couldn't find the object anymore.