Hashing Keys in Java

Solution 1

when I use the string hashcode as a key in the HashMap.

You mustn't use the hash code itself as the key. Hash codes aren't intended to be unique - it's entirely permitted for two non-equal values to have the same hash code. You should use the string itself as a key. The map will then compare hash codes first (to narrow down the candidate matches quickly) and then compare with equals for genuine string equality.

Of course, that's assuming your code really looks the way your question suggests, e.g.

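// Good: the string itself is the key, so equals() decides which entry matches.
HashMap<String, String> goodMap = new HashMap<String, String>();
goodMap.put("foo", "bar");

// Bad: any other string with the same hash code would map to this same key.
HashMap<Integer, String> badMap = new HashMap<Integer, String>();
badMap.put("foo".hashCode(), "bar");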

If that's really what your code looks like, just use HashMap<String, String> instead.
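To make the difference concrete, here is a small sketch using "Aa" and "BB", two different strings that happen to share a hash code under String.hashCode(). The class and method names (HashKeyDemo, sizeWithStringKeys, sizeWithHashCodeKeys) are made up for illustration and are not from the question.

```java
import java.util.HashMap;
import java.util.Map;

public class HashKeyDemo {
    // Number of entries when the strings themselves are the keys.
    static int sizeWithStringKeys(String... keys) {
        Map<String, String> map = new HashMap<>();
        for (String k : keys) {
            map.put(k, "value for " + k);
        }
        return map.size();
    }

    // Number of entries when each string's hash code is the key.
    static int sizeWithHashCodeKeys(String... keys) {
        Map<Integer, String> map = new HashMap<>();
        for (String k : keys) {
            // A string whose hash code was already used overwrites the earlier entry.
            map.put(k.hashCode(), "value for " + k);
        }
        return map.size();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are different strings with the same hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(sizeWithStringKeys("Aa", "BB"));     // 2
        System.out.println(sizeWithHashCodeKeys("Aa", "BB"));   // 1
    }
}
```

With string keys both entries survive. With hash-code keys the second put silently replaces the first, which is probably the kind of "different result" the question describes.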

From the docs for Object.hashCode() (emphasis mine):

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
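As a rough illustration of that contract, here is a hypothetical Point class (not from the question) whose equals and hashCode are derived from the same fields. Equal points always get equal hash codes, but unequal points can still collide, which the contract explicitly allows.

```java
import java.util.Objects;

public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals, so equal points hash alike.
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        Point c = new Point(0, 33);

        // Equal objects must have equal hash codes.
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());  // true
        // Unequal objects may still share a hash code.
        System.out.println(!a.equals(c) && a.hashCode() == c.hashCode()); // true
    }
}
```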

Solution 2

Of course. Different Strings can have the same hashCode, so if you store two such strings as keys in a map, you'll have two entries (since the strings are different). Whereas if you use their hashCode as the key, you'll have only one entry (since their hashCode is the same).

The hashCode isn't used to tell if two keys are equal. It's only used to assign a bucket to the key. Once the bucket is found, every key contained in the bucket is compared to the new key with equals, and the key is added to the bucket if no equal key can be found.
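The bucket-then-equals lookup described above can be sketched with a toy map. This is only an illustration of the idea: ToyMap is a made-up class, it assumes non-null String keys and a fixed number of buckets, and the real java.util.HashMap is considerably more elaborate (it resizes and mixes the hash bits, for example).

```java
import java.util.ArrayList;
import java.util.List;

public class ToyMap {
    private static final class Entry {
        final String key;
        String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Entry>> buckets = new ArrayList<>();
    private int size;

    public ToyMap(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // The hash code only selects a bucket; it never decides equality.
    private List<Entry> bucketFor(String key) {
        // floorMod keeps the index non-negative for negative hash codes.
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    public void put(String key, String value) {
        List<Entry> bucket = bucketFor(key);
        for (Entry e : bucket) {
            if (e.key.equals(key)) { // genuine equality check
                e.value = value;
                return;
            }
        }
        bucket.add(new Entry(key, value)); // no equal key found: new entry
        size++;
    }

    public String get(String key) {
        for (Entry e : bucketFor(key)) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public int size() {
        return size;
    }
}
```

Putting "Aa" and "BB" into a ToyMap lands both in the same bucket, yet they remain two separate entries because equals tells them apart.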

Solution 3

The problem is that two objects being different doesn't mean that their hash codes are also different.

Two different objects can share the same hash code. So you shouldn't use hash codes as HashMap keys.

Also, because Object.hashCode() returns an int, there are only 2^32 possible hash codes. So, depending on the hashing algorithm, different objects will sometimes "collide" on the same value.
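To get a feel for how often that happens, here is a back-of-the-envelope sketch. It assumes hash codes are spread uniformly over all 2^32 int values, which is an idealization (real String hash codes are not perfectly uniform), and the class and method names are made up for illustration.

```java
public class CollisionEstimate {
    // Expected number of colliding pairs among n keys, assuming each hash code
    // is drawn uniformly and independently from the 2^32 possible int values.
    static double expectedCollidingPairs(long n) {
        double pairs = (double) n * (n - 1) / 2.0; // number of distinct key pairs
        return pairs / Math.pow(2, 32);            // each pair collides with probability 2^-32
    }

    public static void main(String[] args) {
        // Two million keys, the figure mentioned in the comments.
        System.out.println(expectedCollidingPairs(2_000_000L));
    }
}
```

Under that assumption, two million keys already give an expected few hundred colliding pairs, so some entries would be lost if the hash codes were used as keys.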

In short:

!obj.equals(obj1) doesn't ensure that obj.hashCode() != obj1.hashCode().

Solution 4

Hash codes can be the same for different Strings, so be careful with that. Maybe this is why you are getting a different result.

Here's another SO question on it. See Jon Skeet's accepted answer.

Author by

user1785771

Updated on June 04, 2022

Comments

  • user1785771
    user1785771 almost 2 years

    In Java, when I use a String as a key for a HashMap, I get a slightly different result than when I use the string hashcode as a key in the HashMap.

    Any insight?

  • Jon Skeet
    Jon Skeet over 11 years
    I'd use !obj.equals(obj1) in the last line, as that's the important part.
  • user1785771
    user1785771 over 11 years
    Thanks all for the answers. I am trying to avoid storing the key as string as it will consume more memory!
  • JB Nizet
    JB Nizet over 11 years
    Don't jump to this conclusion without measuring. Why would it use more memory? The map doesn't make a copy of the key. It just uses a reference to the key.
  • user1785771
    user1785771 over 11 years
    I know. But when I have over two million records, then storing their string keys is gonna make a big difference! @JB
  • Jon Skeet
    Jon Skeet over 11 years
    @user1785771: They use more memory for a good reason: there's more important data than just the 32 bits for a hash code. If you need to store a lot of strings, get a lot of memory. Memory is cheap; mistakes due to incorrect use of a hash map could easily be very expensive.