HashMap initialization parameters (load factor / initial capacity)


Solution 1

Regarding the load factor, I'll simply quote from the HashMap javadoc:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

In other words, the load factor should not be changed from .75 unless you have some specific optimization in mind. Initial capacity is the only thing you want to change: set it according to your N value, meaning (N / 0.75) + 1, or something in that area. This ensures that the table will always be large enough and no rehashing will occur.
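For illustration (the key/value types and the value of N here are mine, not from the original answer), a sketch of that sizing:

import java.util.HashMap;
import java.util.Map;

int n = 100; // expected number of entries (illustrative)

// (n / 0.75) + 1 keeps the resize threshold above n,
// so inserting n entries never triggers a rehash.
Map<String, Integer> map = new HashMap<String, Integer>((int) (n / 0.75) + 1);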

Solution 2

I ran some unit tests to see if these answers were correct and it turned out that using:

(int) Math.ceil(requiredCapacity / loadFactor);

as the initial capacity gives what you want for either a HashMap or a Hashtable. By "what you want" I mean that adding requiredCapacity elements to the map won't cause the backing array to resize, and the array won't be larger than required. Since the default load factor is 0.75, initializing a HashMap like so works:

... = new HashMap<KeyType, ValueType>((int) Math.ceil(requiredCapacity / 0.75));

Since a HashSet is effectively just a wrapper for a HashMap, the same logic also applies there, i.e. you can construct a HashSet efficiently like this:

.... = new HashSet<TypeToStore>((int) Math.ceil(requiredCapacity / 0.75));
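For the curious, here's a sketch of the kind of check such a unit test can do: peek at the backing array via reflection (test-only code, not from the original answer; on JDK 9+ it needs --add-opens java.base/java.util=ALL-UNNAMED, and it should run inside a method declared throws Exception):

import java.lang.reflect.Field;
import java.util.HashMap;

int requiredCapacity = 100;
HashMap<Integer, Integer> map =
        new HashMap<Integer, Integer>((int) Math.ceil(requiredCapacity / 0.75));
for (int i = 0; i < requiredCapacity; i++) {
    map.put(i, i);
}

// HashMap's internal bucket array is the package-private field "table".
Field tableField = HashMap.class.getDeclaredField("table");
tableField.setAccessible(true);
Object[] table = (Object[]) tableField.get(map);

// Prints 256: the smallest power of two whose threshold (256 * 0.75 = 192) covers 100.
System.out.println(table.length);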

@Yuval Adam's answer is correct for all cases except where (requiredCapacity / 0.75) is a power of 2, in which case it allocates too much memory.
@NotEdible's answer uses too much memory in many cases, as the HashMap constructor itself already ensures that the map's backing array has a size which is a power of 2.
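To make that boundary case concrete (the numbers are illustrative): when requiredCapacity / 0.75 lands exactly on a power of two, the two formulas diverge:

int requiredCapacity = 96;

// 96 / 0.75 = 128.0 exactly, a power of two.
int ceiling = (int) Math.ceil(requiredCapacity / 0.75); // 128 -> constructor keeps 128 buckets
int plusOne = (int) (requiredCapacity / 0.75) + 1;      // 129 -> constructor rounds up to 256 buckets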

Solution 3

In the Guava libraries from Google there is a function that creates a HashMap optimized for an expected number of items: newHashMapWithExpectedSize

from the docs:

Creates a HashMap instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth ...
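Usage is a one-liner (the key/value types here are just for illustration):

import com.google.common.collect.Maps;
import java.util.Map;

Map<String, Integer> map = Maps.newHashMapWithExpectedSize(100);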

Solution 4

It's also notable that having a HashMap on the small side makes hash collisions more likely, which can slow down lookup. Hence, if you really worry about the speed of the map, and less about its size, it might be worth making it a bit too large for the data it needs to hold. Since memory is cheap, I typically initialise HashMaps for a known number of items with

HashMap<Foo, Bar> myMap = new HashMap<Foo, Bar>(numberOfElements * 2);

Feel free to disagree, in fact I'd quite like to have this idea verified or thrown out.

Solution 5

The answer Yuval gave is only correct for Hashtable. HashMap uses power-of-two buckets, so for HashMap, Zarkonnen is actually correct. You can verify this from the source code:

  // Find a power of 2 >= initialCapacity
  int capacity = 1;
  while (capacity < initialCapacity)
      capacity <<= 1;

So, although the load factor of 0.75f is the same for both Hashtable and HashMap, you should use an initial capacity of n * 2, where n is the number of elements you plan on storing in the HashMap. This will ensure the fastest get/put speeds.
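A quick sketch of the arithmetic behind that claim (assuming the default 0.75 load factor; the types are illustrative):

import java.util.HashMap;
import java.util.Map;

int n = 1000;

// 2 * n = 2000 is rounded up to 2048 buckets internally;
// threshold = 2048 * 0.75 = 1536 >= 1000, so no rehash occurs
// and the table stays at most ~50% full, keeping collision chains short.
Map<String, String> map = new HashMap<String, String>(n * 2);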


Comments

  • Ran Biron over 3 years

    What values should I pass to create an efficient HashMap / HashMap based structures for N items?

    In an ArrayList, the efficient number is N (N already assumes future growth). What should be the parameters for a HashMap? ((int)(N * 0.75d), 0.75d)? More? Less? What is the effect of changing the load factor?

  • Ran Biron almost 15 years
    Looks like a very nice tool - pity there's no trial version
  • Peter Wippermann about 13 years
    I disagree. From HashMap's JavaDoc: >>Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important. <<
  • Zordid about 12 years
    Why should one initialize a list with a higher capacity than the maximum number of elements it will hold? That's not logical. Only for maps is it good to calculate a higher value, since their constructor parameter means something completely different than it does for lists!
  • linqu about 11 years
    Can you point out why @Yuval Adam's answer consumes too much memory in the given case? Thanks
  • Mark Rhodes about 11 years
    It's because the HashMap always works with a backing array with a length which is a power of 2. So if (requiredCapacity / 0.75) is a power of 2, then setting the initial capacity to (requiredCapacity / 0.75) + 1 will mean that it will allocate twice as much memory (it rounds up to the next power of 2). This is "too much" in the sense that adding requiredCapacity elements to a HashMap with a backing array half that size won't cause it to resize. Hope that makes sense!
  • Jim almost 11 years
    Iteration over the whole map will be slower but lookups (get) will be faster.
  • Klitos Kyriacou over 8 years
    An equivalent of (int) Math.ceil(requiredCapacity / 0.75), avoiding a method call and conversions to and from floating-point, is (requiredCapacity*4+2)/3. This gives the same result while using purely int arithmetic.
  • Kim Ahlstrøm Meyn Mathiassen almost 7 years
    You link to a HashSet not a HashMap.
  • Michael Geier over 5 years
    Regarding initial capacity, let me add that the initial capacity will internally be rounded up to the next power of two. So a capacity of 200 will be rounded up to 256. If HashMap didn't round the capacity up to a power of two, some buckets would never be used. The bucket index for where to put the map data is determined by bucketIndex = hashCode(key) & (capacity-1).
  • lowselfesteemsucks about 5 years
    Interesting to point out that they are using the same logic as @Yuval Adam's answer: (float) expectedSize / 0.75F + 1.0F, which means that when the size is a power of 2, the memory allocation is very big. See @Mark Rhodes' answer.
  • Yann TM over 4 years
    After testing with both, I tend to agree with this pragmatic oversizing rather than (requiredCapacity*4+2)/3; the problem is not just to avoid a rehash because we barely missed the threshold, which leaves us in "heavily loaded" conditions where a single additional insert would make the default heuristics pay for a rehash. We want a good low-collision hash table, large enough to store the items with O(1) lookup.