c++ - unordered_map complexity

c++ hashtable complexity-theory unordered-map

37,900

Solution 1

The standard more or less requires using buckets for collision resolution, which means that the actual look up time will probably be linear with respect to the number of elements in the bucket, regardless of whether the element is present or not. It's possible to make it O(lg N), but it's not usually done, because the number of elements in the bucket should be small, if the hash table is being used correctly.

To ensure that the number of elements in a bucket is small, you must ensure that the hashing function is effective. What effective means depends on the types and values being hashed. (The MS implementation uses FNV, which is one of the best generic hashs around, but if you have special knowledge of the actual data you'll be seeing, you might be able to do better.) Another thing which can help reduce the number of elements per bucket is to force more buckets or use a smaller load factor. For the first, you can pass the minimum initial number of buckets as an argument to the constructor. If you know the total number of elements that will be in the map, you can control the load factor this way. You can also forse a minumum number of buckets once the table has been filled, by calling rehash. Otherwise, there is a function std::unordered_map<>::max_load_factor which you can use. It is not guaranteed to do anything, but in any reasonable implementation, it will. Note that if you use it on an already filled unordered_map, you'll probably have to call unordered_map<>::rehash afterwards.

(There are several things I don't understand about the standard unordered_map: why the load factor is a float, instead of double; why it's not required to have an effect; and why it doesn't automatically call rehash for you.)

Solution 2

As with any hash table, worst case is always linear complexity (Edit: if you built the map without any collisions like you stated in your original post, then you'll never see this case):

http://www.cplusplus.com/reference/unordered_map/unordered_map/find/

Complexity Average case: constant. Worst case: linear in container size.

Return Value An iterator to the element, if the specified key value is found, or unordered_map::end if the specified key is not found in the container.

However, because an unordered_map can only contain unique keys, you will see average complexity of constant time (container first checks hash index, and then iterates over values at that index).

I think the documentation for unordered_map::count function is more informative:

Searches the container for elements whose key is k and returns the number of elements found. Because unordered_map containers do not allow for duplicate keys, this means that the function actually returns 1 if an element with that key exists in the container, and zero otherwise.

Solution 3

To have no collisions in a hashed data structure is incredibly difficult (if not impossible for a given hash function and any kind of data). It would also require a table size exactly equal to the number of keys. No, it does not need to be that strict. As long as the hash function distributes the values in a relatively uniform way, you will have O(1) lookup complexity.

Hash tables are generally just arrays with linked lists taking care of the collisions (this is the chaining method - there are other methods, but this is likely the most utilized way of dealing with collisions). Thus, to find if a value is contained within a bucket, it will have to (potentially) iterate over all the values in that bucket. So if the hash function gives you a uniform distribution, and there are N buckets, and a total of M values, there should be (on average) M/N values per bucket. As long as this value is not too large, this allows O(1) lookup.

So, as a bit of a long winded answer to your question, as long as the hashing function is reasonable, you will get O(1) lookup, with it having to iterate over (on average) O(M/N) keys to give you a "negative" result.

37,900

user1764386

Updated on July 09, 2022

Comments

user1764386 almost 2 years

I need to create a lookup function where a (X,Y) pair corresponds to a specific Z value. One major requirement for this is that I need to do it in as close to O(1) complexity as I can. My plan is to use an unordered_map.

I generally do not use a hash table for lookup, as the lookup time has never been important to me. Am I correct in thinking that as long as I built the unordered_map with no collisions, my lookup time will be O(1)?

My concern then is what the complexity becomes if there the key is not present in the unordered map. If I use unordered_map::find():, for example, to determine whether a key is present in my hash table, how will it go about giving me an answer? Does it actually iterate over all the keys?

I greatly appreciate the help.
user1764386 over 11 years

I am now confused by jakar's answer here: stackoverflow.com/questions/4395050/… I would interpret this comment to mean it can be accomplished. Is that not the case then?
AndyG over 11 years

@user1764386: Well, find has to return something if it can't return you an iterator to your value, so unordered_map::end was the best choice.
user1764386 over 11 years

thank you for the help. I meant that I am slightly confused by his answer because I interpreted it to mean that the complexity will be better than O(N) if the key is not in the unordered_map.
AndyG over 11 years

@user1764386 on average it will be. If you see the unlikely worst case of all your inputs hashing to the same value, then the data structure must iterate over the entire list.
user1764386 over 11 years

Would you mind explaining in more detail? Can I avoid having any two keys mapping to the same value? I am building the unordered_map at one time based on input data. I am never adding to it later.
AndyG over 11 years

@user1764386:In the worst case, no. This all comes down to the hashing function that you use. If you'd like, you can provide unordered_map with your own hashing function (cplusplus.com/reference/unordered_map/unordered_map). Google hashing functions for more information on them.