How does Java 8's HashMap degenerate to balanced trees when many keys have the same hash code?

14,355

Solution 1

The implementation notes comment in HashMap is a better description of HashMap's operation than I could write myself. The relevant parts for understanding the tree nodes and their ordering are:

This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. [...] Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. [...]

Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable" type then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this -- see method comparableClassFor). The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable, Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)

When two objects have equal hash codes but are not mutually comparable, method tieBreakOrder is invoked to break the tie, first by string comparison on getClass().getName() (!), then by comparing System.identityHashCode.

The actual tree building starts in treeifyBin, beginning when a bin reaches TREEIFY_THRESHOLD (currently 8), assuming the hash table has at least MIN_TREEIFY_CAPACITY capacity (currently 64). It's a mostly-normal red-black tree implementation (crediting CLR), with some complications to support traversal in the same way as hash bins (e.g., removeTreeNode).

Solution 2

Read the code. It is mostly a red-black tree.

It does not actually require the implementation of Comparable, but can use it if available (see for instance the find method)

Share:
14,355
Admin
Author by

Admin

Updated on June 26, 2022

Comments

  • Admin
    Admin about 2 years

    How does Java 8's HashMap degenerate to balanced trees when many keys have the same hash code? I read that keys should implement Comparable to define an ordering. How does HashMap combine hashing and natural ordering to implement the trees? What about classes that do not implement Comparable, or when multiple, non-mutually-comparable Comparable implementations are keys in the same map?