C++: What is faster - lookup in hashmap or switch statement?

16,154

Solution 1

A switch construct is faster (or at least not slower).

That's mostly because a switch construct gives static data to the compiler, while a runtime structure like a hash map doesn't.

When possible compilers should compile switch constructs into array of code pointers: each item of the array (indexed by your indexes) points to the associated code. At runtime this takes O(1), while a hash map could take more: O(log n) at average case or O(n) at worst case, usually, and anyway a bigger constant number of memory accesses.

Solution 2

I will add my 5 cents:

For the number of entries at about 50 std::unordered_map (hash based, O(1)) is typically slower then std::map (tree based O(ln(N))), and both of them are slower then boost::flat_map(sorted vector O(ln(N))) which I tend to use in such cases. It is not always the case that switch can be compiled to jump table, and when it is, you can simply put your values (or functions) in vector yourself and access by index. Otherwise switch is marginally faster than boost::flat_map.

Please note the word "typically" in the beginning, if you do care about performance of this piece of code do the profiling (and share results with us :)).

Solution 3

A switch statement is going to be quicker than a look up in a hash map.

However, a map is going to result in much more readable code if you ever change the mappings. You can easily do this with a map by reading the results in from a file. In a switch statement you'd have to change the code and recompile.

Solution 4

An array will have the fastest access time, by definition.

The switch statement compares values, then uses a jump table (which is an array of function pointers).

The hashmap computes a hash value from the data, then either searches a tree in memory or uses the hash value as an index into an array. Slow because of computing the hash value.

On most modern platforms, 64k, is not a big amount of data and can be statically allocated as a constant array.

One problem with the array technique is account for keys that you have not accounted for. One example is to use a unique sentinel value. When the value is returned, you know you have an unknown key.

I suggest using a static const array of values.

Solution 5

The switch will be faster. If it's a small number of cases, as in your example, it will use an if-chain. If a large number of cases, and if they are reasonably compact, it has the option to generate a jump-table, which only takes a few instructions. (BTW you don't have to order the cases.) The hash-map is O(1), but will probably take in the range of 10-40 instructions.

Share:
16,154
Haspemulator
Author by

Haspemulator

That was easy.

Updated on June 03, 2022

Comments

  • Haspemulator
    Haspemulator about 2 years

    I have a code pattern which translates one integer to another. Just like this:

    int t(int value) {
        switch (value) {
            case 1: return const_1;
            case 3: return const_2;
            case 4: return const_3;
            case 8: return const_4;
            default: return 0;
        }
    }
    

    It has about 50 entries currently, maybe later on there will be some more, but probably no more than hundred or two. All the values are predefined, and of course I can order case labels by their values. So the question is, what will be faster - this approach or put this into hash map (I have no access to std::map, so I'm speaking about custom hash map available in my SDK) and perform lookups in that table? Maybe it's a bit of premature optimization, though... But I just need your opinions.

    Thanks in advance.

    EDIT: My case values are going to be in range from 0 to 0xffff. And regarding the point of better readability of hash map. I'm not sure it really will have better readability, because I still need to populate it with values, so that sheet of constants mapping is still needs to be somewhere in my code.

    EDIT-2: Many useful answers were already given, much thanks. I'd like to add some info here. My hash key is integer, and my hash function for integer is basically just one multiplication with integral overflow:

    EXPORT_C __NAKED__ unsigned int DefaultHash::Integer(const int& /*aInt*/)
    {
    _asm mov edx, [esp+4]
    _asm mov eax, 9E3779B9h
    _asm mul dword ptr [edx]
    _asm ret
    }
    

    So it should be quite fast.

  • unkulunkulu
    unkulunkulu almost 13 years
    hash map in O(n) is a pretty bad hash map
  • Nobody moving away from SE
    Nobody moving away from SE almost 13 years
    Did I get something wrong in my education I thought hashing should be O(1) and in worst case (many collisions) O(N)? Also I would say the speed of the switch depends on the values you want to tell apart. If they are a closed range with fixed step size switches are fast. But what about a range like [1,100000000] with random entries?
  • peoro
    peoro almost 13 years
    Yeah, sorry, fixed it. O(n) is the worst case (or it could be always O(log n) for tree maps). My point anyway was supposed to be: it could take more than O(1) in some conditions, and even when it's O(1) it will most probably requires a bigger number of constant operations.
  • peoro
    peoro almost 13 years
    @Nobody: switch are implemented as an array when possible, in your example it's not, but since that data is statically available to the compiler, it should be able to give a result non worse than whatever runtime structure.
  • Haspemulator
    Haspemulator almost 13 years
    Regarding the ordering of cases: I read somewhere that I should do it to achieve higher execution speed. So you say it's wrong? Can you give some more details on this?
  • Bo Persson
    Bo Persson almost 13 years
    @Hasp - If the code turns into an if-chain, the first case could be ever so slightly faster. If the compiler decides on a jump table, it doesn't help.
  • Mike Dunlavey
    Mike Dunlavey almost 13 years
    @Haspemulator: Bo's basically right. You could have a lot of sparse cases (like 1000000, 2000000, 3000000, etc.), causing the compiler to make a long if-chain. Then in the worst case, your program might have to always run down to the very last one.
  • Cameron
    Cameron almost 9 years
    As always, to find great answer, look at the bottom most replies.
  • Alex Huszagh
    Alex Huszagh almost 8 years
    Don't switch statements use binary searches, which would be O(log(n)) complexity, as opposed to O(1) for a hashmap? Obviously, there's the overhead of the hash function too, but for pure time complexity, this seems off.
  • peoro
    peoro over 6 years
    @AlexanderHuszagh: no binary search is needed when switch indices are contiguous. When the indices are non-contiguous the compiler could generate a statically-optimized hashmap (which would still be non-worse than a dynamically constructed hashmap). - In general the compiler and your running program would have a similar amount of information, but the switch statement will be able to process this information at compile time and optimize using some heuristics that are most likely way better than anything we could ever achieve at runtime.
  • Alex Huszagh
    Alex Huszagh over 6 years
    @peoro If the data is contiguous though, you could always use a static array as well: if the data is non-contiguous, although you're unlikely to notice major differences, at very large sizes O(1) with a decent large coefficient starts to beat O(log(n)) with a small coefficient. You're mostly right, but the example above is clearly not contiguous.