Best HashMap initial capacity while indexing a List
Solution 1
If you wish to avoid rehashing the HashMap, and you know that no other elements will be placed into the HashMap, then you must take the load factor into account as well as the initial capacity. The load factor for a HashMap defaults to 0.75.
The check that determines whether rehashing is necessary occurs whenever a new entry is added, e.g. when put places a new key/value pair. So if you specify an initial capacity of list.size() and a load factor of 1, it will rehash after the last put. So to prevent rehashing, use a load factor of 1 and a capacity of list.size() + 1.
EDIT
Looking at the HashMap source code, it will rehash if the old size meets or exceeds the threshold, so it won't rehash on the last put. So it looks like a capacity of list.size() should be fine.
HashMap<Integer, T> map = new HashMap<Integer, T>(list.size(), 1.0f);
Here's the relevant piece of HashMap source code:
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}
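The approach above can be sketched end to end; this is a minimal, self-contained illustration, where the Item record and its id accessor are hypothetical stand-ins for the question's T and getId():

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexByIdSketch {
    // Hypothetical element type standing in for the question's T.
    record Item(int id, String name) {}

    public static void main(String[] args) {
        List<Item> list = List.of(new Item(1, "a"), new Item(2, "b"), new Item(3, "c"));

        // Capacity list.size() with load factor 1.0f: the threshold equals the
        // number of entries, so indexing the list never triggers a resize.
        Map<Integer, Item> map = new HashMap<>(list.size(), 1.0f);
        for (Item item : list) {
            map.put(item.id(), item);
        }
        System.out.println(map.size()); // prints 3
    }
}
```

Note the f suffix on the load factor: the HashMap(int, float) constructor takes a float, so a plain 1.0 (a double) would not compile.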
Solution 2
The 'capacity' parameter is misleadingly named: it isn't used in the way you'd typically expect.
By default the 'load factor' of a HashMap is 0.75. This means that when the number of entries in a HashMap reaches 75% of the supplied capacity, it will resize the array and rehash.
For example if I do:
Map<Integer, Integer> map = new HashMap<>(100);
When I am adding the 75th entry, the map will resize the Entry table to 2 * map.size() (or 2 * table.length). So we can do a few things:
- Change the load factor - this could impact the performance of the map
- Set the initial capacity to list.size() / 0.75 + 1
The best option is the latter of the two. Let me explain what's going on here:
list.size() / 0.75
This returns list.size() plus one third of list.size(): for example, if my list had a size of 100, it would give 133 (truncated from 133.33). We then add 1, because the map is resized once its size reaches 75% of the initial capacity. So if we had a list with a size of 100, we would set the initial capacity to 134, which means that adding all 100 entries from the list would not incur any resize of the map.
End result:
Map<Integer, Integer> map = new HashMap<>((int) (list.size() / 0.75) + 1);
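The arithmetic can be checked with a small throwaway method (initialCapacity is a hypothetical helper name, not part of any API):

```java
public class CapacitySizing {
    // The sizing rule from this answer: divide by the load factor,
    // truncate to int, then add one.
    static int initialCapacity(int listSize) {
        return (int) (listSize / 0.75) + 1;
    }

    public static void main(String[] args) {
        System.out.println(initialCapacity(100)); // prints 134
        System.out.println(initialCapacity(3));   // prints 5
    }
}
```

The cast to int is required: in Java, listSize / 0.75 is a double, and HashMap's constructor takes an int capacity.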
Solution 3
Guava's Maps.newHashMapWithExpectedSize uses this helper method to calculate an initial capacity for the default load factor of 0.75, based on the expected number of values:
/**
 * Returns a capacity that is sufficient to keep the map from being resized as
 * long as it grows no larger than expectedSize and the load factor is >= its
 * default (0.75).
 */
static int capacity(int expectedSize) {
    if (expectedSize < 3) {
        checkArgument(expectedSize >= 0);
        return expectedSize + 1;
    }
    if (expectedSize < Ints.MAX_POWER_OF_TWO) {
        return expectedSize + expectedSize / 3;
    }
    return Integer.MAX_VALUE; // any large value
}
reference: Guava's Maps source code
From the newHashMapWithExpectedSize documentation:
Creates a HashMap instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth. This behavior cannot be broadly guaranteed, but it is observed to be true for OpenJDK 1.6. It also can't be guaranteed that the method isn't inadvertently oversizing the returned map.
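For readers who don't want to pull in Guava, the helper is easy to restate in plain Java. This is a sketch, not Guava's exact code: Ints.MAX_POWER_OF_TWO is 1 << 30, and checkArgument is replaced by a plain check.

```java
public class GuavaStyleCapacity {
    // Plain-Java restatement of Guava's capacity(int) helper;
    // 1 << 30 is the largest power of two an int can hold.
    static int capacity(int expectedSize) {
        if (expectedSize < 3) {
            if (expectedSize < 0) {
                throw new IllegalArgumentException("expectedSize must be >= 0");
            }
            return expectedSize + 1;
        }
        if (expectedSize < (1 << 30)) {
            return expectedSize + expectedSize / 3; // size plus a third of headroom
        }
        return Integer.MAX_VALUE; // any large value
    }

    public static void main(String[] args) {
        System.out.println(capacity(2));   // prints 3
        System.out.println(capacity(100)); // prints 133
    }
}
```

Note that expectedSize + expectedSize / 3 approximates expectedSize / 0.75 using integer arithmetic, so it can come out one below the exact quotient (133 vs. 133.33 here); HashMap's internal rounding to a power of two absorbs the difference.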
Solution 4
What you're doing is fine. This way you're sure that the hash map has at least enough capacity for the initial values. If you have more information about the usage patterns of the hash map (for example: is it updated frequently? are many new elements added frequently?), you might want to set a bigger initial capacity (for instance, list.size() * 2), but never a lower one. Use a profiler to determine whether the initial capacity falls short too soon.
UPDATE
Thanks to @PaulBellora for suggesting that the initial capacity should be set to (int) Math.ceil(list.size() / loadFactor) (typically, the default load factor is 0.75) in order to avoid an initial resize.
Solution 5
According to the reference documentation of java.util.HashMap:
The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
This means that if you know in advance how many entries the HashMap will store, you can prevent rehashing by choosing an appropriate initial capacity and load factor. However:
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put).
Italo Borssatto
Updated on April 20, 2020
Comments
-
Italo Borssatto over 3 years
I have a list (List<T> list) and I want to index its objects by their ids using a map (HashMap<Integer, T> map). I always use list.size() as the initial capacity in the HashMap constructor, like in the code below. Is this the best initial capacity to be used in this case?
Note: I'll never add more items to the map.
List<T> list = myList;
Map<Integer, T> map = new HashMap<Integer, T>(list.size());
for (T item : list) {
    map.put(item.getId(), item);
}
-
Paul Bellora over 10 years "the hash map has at least enough capacity for the initial values" - I don't think this is true with the default load factor of 0.75.
-
Óscar López over 10 years @PaulBellora the initial capacity is the same size as the one specified in the initialCapacity parameter. The load factor is a measure of how full the hash table is allowed to get before its capacity (initial or not) is automatically increased
-
Paul Bellora over 10 years Right, so with a load factor of 0.75 and an initial capacity of n, putting n values would cause it to resize.
-
Óscar López over 10 years @PaulBellora so you're suggesting that the initial capacity should be size()/.75 to avoid an initial resize? Makes sense, I'll update my answer
-
Paul Bellora over 10 years Exactly, +1 - you can see Guava is doing the same thing in the snippet my answer references.
-
Italo Borssatto over 10 years So, if I change my load factor to 1.0 like @rgettman suggested, then new HashMap(list.size(), 1.0f) will have the same behaviour as new HashMap((int) Math.ceil(list.size() / 0.75)) in my case, where I'll never add more items to the hash map?
-
Óscar López over 10 years It'd be the same, but it's bad practice to wait until the last moment to increase the hash map's size; that's why the load factor is usually less than 1
-
Italo Borssatto over 10 years Ok. But I'm asking only about my specific case, where I'll never add another item to the hash map. In this specific case, would it be bad practice?
-
Óscar López over 10 years @italo in that case both rgettman's answer and my own would be equivalent. Also, if you want to enforce the invariant that the hashmap never changes, maybe you should make it immutable using Collections.unmodifiableMap()
-
Klitos Kyriacou about 8 years Looking at the JDK source code, the actual table size is rounded up to the nearest power of 2. Also, re. your statement "By default the 'load factor' of a HashMap is 0.75, this means that when the number of entries in a HashMap reaches 75% of the capacity supplied, it will resize the array and rehash." - to be a bit pedantic, the resize only happens when the entries exceed (not reach) 75% of the capacity. So, for example, with a specified initial capacity of 64 and load factor of 0.5, you can put 32 entries in without resizing.
-
1mike12 almost 8 years @Eric just look at the source and search for "resize(". grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/… It's probably done the same way. Line 676
-
dagnelies almost 8 years Is it just me, or doesn't anybody know that a load factor of 1.0 is a really bad idea?!
-
rgettman almost 8 years @arnaud Normally, yes, that would be a bad idea. But the OP did state "Note: I'll never add more items to the map."
-
dagnelies almost 8 years @rgettman: if you know how hash maps work internally, you would notice that you not only screw up your inserts but also your reads. All operations would become O(N) instead of O(1), because you'll have to jump from bucket to bucket because of collisions everywhere
-
Ced almost 8 years Also 100 / 0.75 = 133, not that it changes anything
-
Zardoz89 over 7 years You know that you set the WRONG ANSWER as the correct answer?
-
gregory boero.teyssier almost 7 years +1. This is the simplest and easiest solution for those who don't want to understand the internals of the map, and just want something that works as expected.
-
Michael Geier almost 5 years And note that the initial capacity will internally be rounded up to the next power of two. So a capacity of 200 will be rounded up to 256. If HashMap didn't round capacity up to a power-of-two value, some buckets would never be used. The bucket index for where to put the map data is determined by bucketIndex = hashCode(key) & (capacity-1). This is true for both Java 7 and 8, afaik.
-
Shailesh Pratapwar over 4 years Using a load factor of 1 is a bad idea, considering the fact that hash collisions will increase if there are fewer free buckets. It's the tradeoff between space and time that led the Java designers to use a load factor of 0.75 as the default. If you are not sure about load factor internals, do not touch this default. Now, if you go with a load factor of 0.75 instead of 1, the map capacity that shouldn't cause it to rehash can be calculated with initialCapacity = (expected no. of elements / 0.75) + 1. Period.
-
AgentM almost 3 years @ShaileshPratapwar I assume you take the ceil function of that formula too, e.g. initCapacity = ceil(nExpectedElements / loadFactor) + 1
-
Aleksander Melnichnikov almost 3 years Could you explain why we need the +1?