Why does ArrayList grow at a rate of 1.5, but HashMap at a rate of 2?


Solution 1

The expensive part of increasing the capacity of an ArrayList is copying the content of the backing array to a new (larger) one.

For the HashMap, it is creating a new backing array and putting all map entries into it. And the higher the capacity, the lower the risk of collisions. This is more expensive, which explains why the expansion factor is higher. The reason for 1.5 vs. 2.0? I consider this "best practice" or a "good tradeoff".
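
As a concrete sketch of the 1.5 factor: OpenJDK's ArrayList computes its new capacity with an integer shift rather than floating-point math (the method name below is illustrative, not the JDK's):

```java
// Sketch of ArrayList's 1.5x growth as done in OpenJDK:
// oldCapacity >> 1 is integer division by two, so the result is
// oldCapacity * 1.5, rounded down.
static int grownCapacity(int oldCapacity) {
    return oldCapacity + (oldCapacity >> 1);
}
```

Starting from 10, this yields 15, 22, 33, 49, ..., matching the 3/2 growth described in the question.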

Solution 2

For HashMap, why should the capacity always be a power of two?

I can think of two reasons.

  1. You can quickly determine the bucket a hash code goes into. You only need a bitwise AND, not an expensive modulo: int bucket = hashcode & (size - 1);

  2. Let's say we have a growth factor of 1.7. If we start with a size of 11, the next size would be 18, then 31. No problem, right? But the hash codes of Strings in Java are calculated with a prime factor of 31. The bucket a string goes into, hashcode % 31, is then determined only by the last character of the String. Bye bye O(1) if you store folders that all end in /. If you use a size of, for example, 3^n, the distribution will not get worse if you increase n. Going from size 3 to 9, every element in bucket 2 will now go to bucket 2, 5, or 8, depending on the higher digit. It's like splitting each bucket into three pieces. So an integer growth factor would be preferred. (Of course this all depends on how you calculate hash codes, but an arbitrary growth factor doesn't feel 'stable'.)
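
Point 1 above can be sketched directly. Assuming the table size is a power of two, the bit mask gives the same non-negative result as a floor modulo, even for negative hash codes:

```java
// Bucket selection with a bit mask; only valid when tableSize is a
// power of two (then tableSize - 1 is a mask of all-ones low bits).
static int bucketFor(int hashcode, int tableSize) {
    return hashcode & (tableSize - 1); // same as Math.floorMod(hashcode, tableSize)
}
```

For example, bucketFor(952, 16) is 8, and bucketFor(-1, 16) is 15, matching Math.floorMod but without a division.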

Solution 3

The way HashMap is designed/implemented, its underlying number of buckets must be a power of 2 (even if you give it a different initial size, it rounds up to a power of 2), so it grows by a factor of two each time. An ArrayList can be any size, so it can be more conservative in how it grows.

Solution 4

The accepted answer does not actually give an exact response to the question, but the comment from @user837703 on that answer clearly explains why HashMap grows by powers of two.

I found this article, which explains it in detail: http://coding-geek.com/how-does-a-hashmap-work-in-java/

Let me post fragment of it, which gives detailed answer to the question:

// the function that returns the index of the bucket from the rehashed hash
static int indexFor(int h, int length) {
    return h & (length-1);
}

In order to work efficiently, the size of the inner array needs to be a power of 2, let’s see why.

Imagine the array size is 17; the mask value is going to be 16 (size - 1). The binary representation of 16 is 0…010000, so for any hash value H, the index generated with the bitwise formula "H AND 16" is going to be either 16 or 0. This means that the array of size 17 will only ever use 2 buckets: the one at index 0 and the one at index 16, which is not very efficient…

But if you now take a size that is a power of 2, like 16, the bitwise index formula is "H AND 15". The binary representation of 15 is 0…001111, so the index formula can output values from 0 to 15, and the array of size 16 is fully used. For example:

  • if H = 952 , its binary representation is 0..01110111000, the associated index is 0…01000 = 8
  • if H = 1576 its binary representation is 0..011000101000, the associated index is 0…01000 = 8
  • if H = 12356146, its binary representation is 0..0101111001000101000110010, the associated index is 0…00010 = 2
  • if H = 59843, its binary representation is 0..01110100111000011, the associated index is 0…00011 = 3

This is why the array size is a power of two. This mechanism is transparent to the developer: if they choose a HashMap with a size of 37, the Map will automatically choose the next power of 2 after 37 (64) for the size of its inner array.
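
The rounding-up step in that last paragraph can be sketched as follows. This mirrors the bit-smearing trick used by JDK 8's HashMap.tableSizeFor; the clamp to HashMap's maximum capacity is omitted here:

```java
// Round cap up to the next power of two: smear the highest set bit of
// (cap - 1) into every lower position, giving a run of ones, then add one.
static int nextPowerOfTwo(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : n + 1;
}
```

nextPowerOfTwo(37) is 64, and an exact power of two like 16 stays 16 (that is why the subtraction of 1 is done first).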

Solution 5

Hashing takes advantage of distributing data evenly into buckets. The algorithm tries to prevent multiple entries landing in the same bucket ("hash collisions"), as they decrease performance.

Now, when the capacity of a HashMap is reached, its size is extended and the existing data is re-distributed across the new buckets. If the size increase were too small, this re-allocation of space and re-distribution would happen too often.
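
Doubling also makes that re-distribution itself cheap: with a power-of-two table, a single extra hash bit decides whether an entry keeps its index or moves up by exactly the old size. JDK 8's resize exploits this observation; the helper below is a sketch of the idea, not JDK code:

```java
// After doubling a power-of-two table, an entry's new index depends only
// on one additional hash bit: it either stays at its old index or moves
// up by exactly oldSize. Each old bucket splits into at most two.
static int indexAfterDoubling(int hash, int oldSize) {
    int oldIndex = hash & (oldSize - 1);
    return (hash & oldSize) == 0 ? oldIndex : oldIndex + oldSize;
}
```

For any hash h, indexAfterDoubling(h, 16) equals h & 31, so no full re-hash is needed; entries of one bucket are just split into two lists.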

Author: Arnab Biswas

Father of two daughters. ML Architect. Developer

Updated on August 16, 2022

Comments

  • Arnab Biswas (almost 2 years ago)

    As per the Sun Java implementation, during expansion, ArrayList grows to 3/2 of its current capacity, whereas for HashMap the expansion rate is double. What is the reason behind this?

    As per the implementation, for HashMap, the capacity should always be a power of two. That may be a reason for HashMap's behavior. But in that case the question is: for HashMap, why should the capacity always be a power of two?

  • Joachim Sauer (over 13 years ago)
    While this explains the basic principle, it doesn't really explain why HashMap multiplies the size by 2 instead of 1.5 (for example) as ArrayList does.
  • Arnab Biswas (over 13 years ago)
    Even the ArrayList may multiply the capacity by 2. Is there any harm in it?
  • Arnab Biswas (over 13 years ago)
    Thanks Peter, I have checked the source code earlier. But that didn't help me to understand the intention of the API developer.
  • Joachim Sauer (over 13 years ago)
    By the way: the OpenJDK (and thus the Oracle JDK) use some quite different code, but effectively increments by half its size as well.
  • Lucas Zamboulis (over 13 years ago)
    The harm is that the bigger the size of the ArrayList, the more memory allocated to it (which could go to waste if the space is not used). Since increasing the capacity of the ArrayList is much less expensive than increasing the capacity of a HashMap, it makes sense to be more conservative with the increase of capacity of an ArrayList. Essentially, @Andreas_D explained why the factor for a HashMap should be larger than that of an ArrayList. Why 2.0 and 1.5 specifically? This is probably based on usage tests, but you'd have to ask the Java developers themselves I guess.
  • maaartinus (over 11 years ago)
    Regarding your second argument: 1. Avoiding 31 is easy. 2. The expression hashcode%31 can't work because of negative values. 3. Some "hash strengthening" like in HashMap.hash could help. 4. Modulus can be replaced by something like (int) ((size * (h & 0xFFFFFFFFL)) >> 32) which is more than twice as fast on my computer. 5. That all said, +1.
  • maaartinus (over 11 years ago)
    @Arnab Biswas: One more reason: The unused memory in ArrayList is wasted, unlike in HashMap where it makes the collisions rate lower and thus speeds up the access.
  • Admin (about 9 years ago)
    Also, hash tables generally increase by a factor of 2 (and the list of buckets backing them has a power-of-2 size) because of an optimization that's used in nearly every implementation: when a hash is calculated, a modulo operation (%) is done to find the bucket to put values in: bucketIndex = hash % numBuckets. The expensive x % n operation can be reduced to the bitwise x & (n - 1), but only when n is a power of two. The hashmap/table must grow by a factor of 2 each time to maintain the power-of-two size for the buckets backing it. See the other answers below.
  • Saint (almost 8 years ago)
    x & (n - 1) can still produce a correct bucketIndex even though n is not a power of two, can't it? @Arka Majumdar
  • friederbluemle (over 7 years ago)
    For ArrayList, the new capacity is now calculated using the slightly more efficient int newCapacity = oldCapacity + (oldCapacity >> 1);
  • harold (over 6 years ago)
    @Saint it will produce indexes strictly less than n (as long as n is positive), but it can skip buckets, potentially most of them (e.g. if n is a power of two plus 1)
  • Saint (over 6 years ago)
    OK, I understand it, you are totally correct. @harold
  • Bubesh p (over 2 years ago)
    And the reason for ArrayList's growth rate might be that if we double its capacity, we can never reuse the previously freed memory. Example: initial capacity = 10 (memory locations 1 to 10). After doubling, capacity = 20 (locations 11 to 30). After doubling, capacity = 40 (locations 31 to 70). After doubling, capacity = 80 (locations 71 to 150).
  • Bubesh p (over 2 years ago)
    But if we use 1.5 as the growth rate: initial capacity = 10 (memory locations 1 to 10). After growing by 1.5, capacity = 15 (locations 11 to 25). Growing by 1.5 again, capacity = 22 (locations 26 to 47). Again, capacity = 33 (locations 48 to 80). Again, capacity = 49 (locations 81 to 129). Continued in next comment
  • Bubesh p (over 2 years ago)
    Growing by 1.5 once more, capacity = 73 (memory locations 1 to 73). Here we can reuse the previously freed memory locations 1 to 73.
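
The memory-reuse argument in these last comments can be checked with a short sketch. Under the idealized assumption that every freed block stays available and adjacent, a 1.5x list can eventually fit its new array into previously freed space, while a 2x list never can (the sum 1 + 2 + ... + 2^(n-1) is always one element short of 2^n):

```java
// Idealized allocator model: 'freed' accumulates the sizes of all earlier
// backing arrays. Reuse becomes possible once a new allocation fits into
// the freed space while the current array is still live.
static boolean canEverReuseFreedSpace(double factor, long capacity, int steps) {
    long freed = 0;
    for (int i = 0; i < steps; i++) {
        long next = (long) Math.ceil(capacity * factor);
        if (freed >= next) return true; // new array fits into freed space
        freed += capacity; // current array is freed after its contents are copied
        capacity = next;
    }
    return false;
}
```

With factor 2.0 the freed space is always less than the next allocation, so this returns false; with factor 1.5 it returns true after a handful of growth steps, matching the hand-worked example above.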