Unordered map vs vector

17,144

Solution 1

As an alternative, you could consider using an ordered vector because the vector itself will not be modified. You can easily write an implementation yourself with STL lower_bound etc, or use an implementation from libraries ( boost::flat_map).

There is a blog post from Scott Meyers about container performance in this case. He did some benchmarks and the conclusion would be that an unordered_mapis probably a very good choice with high chances that it will be the fastest option. If you have a restricted set of keys, you can also compute a minimal optimal hash function, e.g. with gperf

However, for these kind of problems the first rule is to measure yourself.

Solution 2

My problem was to find a record on a container by a given std::string type as Key access. Considering Keys that only EXISTS(not finding them was not a option) and the elements of this container are generated only at the beginning of the program and never touched thereafter.

I had huge fears unordered map was not fast enough. So I tested it, and I want to share the results hoping I've not mistaken everything. I just hope that can help others like me and to get some feedback because in the end I'm beginner. So, given a struct of record filled randomly like this:

struct The_Mess 
{   
    std::string A_string;
    long double A_ldouble;
    char C[10]; 
    int* intPointer;
    std::vector<unsigned int> A_vector;
    std::string Another_String;
}        

I made a undordered map, give that A_string contain the key of the record:

std::unordered_map<std::string, The_Mess> The_UnOrdMap;

and a vector I sort by the A_string value(which contain the key):

std::vector<The_Mess> The_Vector;

with also a index vector sorted, and used to access as 3thrd way:

std::vector<std::string> index;

The key will be a random string of 0-20 characters in lenght(I wanted the worst possible scenario) containing letter both capital and normal and numbers or spaces.

So, in short our contendents are:

  1. Unordered map I measure the time the program get to execute:

    record = The_UnOrdMap.at( key ); record is just a The_Mess struct.

  2. Sorted Vector measured statements:

    low = std::lower_bound (The_Vector.begin(), The_Vector.end(), key, compare); record = *low;

  3. Sorted Index vector:

    low2 = std::lower_bound( index.begin(), index.end(), key); indice = low2 - index.begin(); record = The_Vector[indice];

The time is in nanoseconds and is a arithmetic average of 200 iterations. I have a vector that I shuffle at every iteration containing all the keys, and at every iteration I cycle through it and look for the key I have here in the three ways. So this are my results: Results

I think the initials spikes are a fault of my testing logic(the table I iterate contains only the keys generated so far, so it only has 1-n elements). So 200 iterations of 1 key search for the first time. 200 iterations of 2 keys search the second time etc...

Anyway, it seem that in the end the best option is the unordered map, considering that is a lot less code, it's easier to implement and will make the whole program way easier to read and probably maintain/modify.

Solution 3

You have to think about caching as well. In case of std::vector you'll have very good cache performance when accessing the elements - when accessing one element in RAM, CPU will cache nearby memory values and this will include nearby portions of your std::vector.

When you use std::map (or std::unordered_map) this is no longer true. Maps are usually implemented as self balancing binary-search trees, and in this case values can be scattered around the RAM. This imposes great hit on cache performance, especially as maps get bigger and bigger as CPU just cannot cache the memory that you're about to access.

You'll have to run some tests and measure performance, but cache misses can greatly hurt the performance of your program.

Solution 4

You are most likely to get the same performance (the difference will not be measurable).

Contrary to what some people seem to believe, unordered_map is not a binary tree. The underlying data structure is a vector. As a result, cache locality does not matter here - it is the same as for vector. Granted, you are going to suffer if you have collissions due to your hashing function being bad. But if your key is a simple integer, this is not going to happen. As a result, access to to element in hash map will be exactly the same as access to the element in the vector with time spent on getting hash value for integer, which is really non-measurable.

Share:
17,144
Renato Kuurstra
Author by

Renato Kuurstra

Updated on June 04, 2022

Comments

  • Renato Kuurstra
    Renato Kuurstra almost 2 years

    I'm building a little 2d game engine. Now I need to store the prototypes of the game objects (all type of informations). A container that will have at most I guess few thousand elements all with unique key and no elements will be deleted or added after a first load. The key value is a string.

    Various threads will run, and I need to send to everyone a key(or index) and with that access other information(like a texture for the render process or sound for the mixer process) available only to those threads.

    Normally I use vectors because they are way faster to accessing a known element. But I see that unordered map also usually have a constant speed if I use the ::at element access. It would make the code much cleaner and also easier to maintain because I will deal with much more understandable man made strings.

    So the question is, the difference in speed between a access to a vector[n] compared to a unorderedmap.at("string") is negligible compared to his benefits?

    From what I understand accessing various maps in different part of the program, with different threads running just with a "name" for me is a big deal and the speed difference isn't that great. But I'm too inexperienced to be sure of this. Although I found informations about it seem I can't really understand if I'm right or wrong.

    Thank you for your time.

  • dbajgoric
    dbajgoric over 8 years
    Care explaining your self?
  • Daniel Strul
    Daniel Strul over 8 years
    With regards to the OP's question, you may want to link to hashed-map description rather than map/tree description. Still, it's probably just as bad for the cache, so your advice holds all the same.
  • dbajgoric
    dbajgoric over 8 years
    @Daniel Exactly, I didn't give a definite advice or answer. But cache performance has to be taken into consideration when choosing your data structures. That was the point of my answer :)
  • Renato Kuurstra
    Renato Kuurstra over 8 years
    I think I got my answer, considering: I can't force c++ to cache stuff and this map/vector will be used a LOT, on the order of thousand of times every 1/60th of a second. So I guess I'll stick to vectors, and maybe when I'm done I will run a few test. Thanks!
  • Daniel Strul
    Daniel Strul over 8 years
    The OP now has received 2 conflicting answers. Wouldn't it be helpful to further explain your answer? If cache locality does not matter here, I'd be very interested in learning why.
  • dbajgoric
    dbajgoric over 8 years
    Well you can impact caching performance indirectly. By choosing suitable ways to represent your data you can help to make caching more efficient.
  • Renato Kuurstra
    Renato Kuurstra over 8 years
    Now I'm not sure any more, I have read the Scott Meyers blog post from the other answer, which seem relevant to me. It's also coherent with what I read all over, vectors are better under 50 elements in size, then unordered maps start to take over. Maybe the only answer is profile and test by yourself. All of this to me is necessary because I want a easy and human. I anyway thanks for all the answers, I got very nice leads to follow, when I'm done if I have time, or the result isn't sufficient I will make some tests and post them there.
  • SergeyA
    SergeyA over 8 years
    @DanielStrul, because unordered_map is not a tree.
  • SergeyA
    SergeyA over 8 years
    @Džanan unordered_map is not a tree.
  • Steger
    Steger over 8 years
    But unordered_map does have a performance cost for recomputing the hash for each access. So performance disadvantage primarily hinges on having a good hash algorithm -- fast and minimal collisions.
  • SergeyA
    SergeyA over 8 years
    @Steger, getting hash from integer is trivial.
  • Joshua Hyatt
    Joshua Hyatt over 3 years
    std::set and std::map are usually implemented as trees, because they need to be ordered. std::unordered_map and std::unordered_set are not trees, because they are unordered. Trees are used for ordering. Unordered containers use buckets.
  • Paiusco
    Paiusco about 3 years
    +1 for Scott Meyers' blog post. That basically solved a hour-long discussion around a similiar issue. Everyone should read that up :)