C++ STL Map vs Vector speed

41,422

Solution 1

You effectively have a number of alternatives.

Libraries exist:

Critics

  • Map look up and retrieval take O(log N), but the items may be scattered throughout the memory, thus not playing well with caching strategies.
  • Vector are more cache friendly, however unless you sort it you'll have O(N) performance on find, is it acceptable ?
  • Why not using a unordered_map ? They provide O(1) lookup and retrieval (though the constant may be high) and are certainly suited to this task. If you have a look at Wikipedia's article on Hash Tables you'll realize that there are many strategies available and you can certainly pick one that will suit your particular usage pattern.

Solution 2

Normally you'd use a symbol table to look up the variable given its name as it appears in the source. In this case, you only have the name to work with, so there's nowhere to store the cached position of the variable in the symbol table. So I'd say a map is a good choice. The [] operator takes time proportional to the log of the number of elements in the map - if it turns out to be slow, you could use a hash map like std::tr1::unordered_map.

Solution 3

When most interpreters interpret code, they compile it into an intermediate language first. These intermediate languages often refer to variables by index or by pointer, instead of by name.

For example, Python (the C implementation) changes local variables into references by index, but global variables and class variables get referenced by name using a hash table.

I suggest looking at an introductory text on compilers.

Solution 4

std::map's operator[] takes O(log(n)) time. This means that it is quite efficient, but you still should avoid doing the lookups over and over again. Instead of storing an index, perhaps you can store a reference to the value, or an iterator to the container? This avoids having to do lookup entirely.

Solution 5

a std::map (O(log(n))) or a hashtable ("amortized" O(1)) would be the first choice - use custom mechanisms if you determin it's a bottleneck. Generally, using a hash or tokenizing the input is the first optimization.

Before you have profiled it, it's most important that you isolate lookup, so you can easily replace and profile it.


std::map is likely a tad slower for a small number of elements (but then, it doesn't really matter).

Share:
41,422
sub
Author by

sub

Interested in the structure, paradigms and background of programming languages. Also interested in compilers and interpreters and appreciates the people who can build them. Loves compiled languages with dynamic typing. Compiled means translated to machine code for me, I don't see languages that translate to bytecode which then runs on a VM as compiled.

Updated on July 09, 2022

Comments

  • sub
    sub almost 2 years

    In the interpreter for my experimental programming language I have a symbol table. Each symbol consists of a name and a value (the value can be e.g.: of type string, int, function, etc.).

    At first I represented the table with a vector and iterated through the symbols checking if the given symbol name fitted.

    Then I though using a map, in my case map<string,symbol>, would be better than iterating through the vector all the time but:

    It's a bit hard to explain this part but I'll try.

    If a variable is retrieved the first time in a program in my language, of course its position in the symbol table has to be found (using vector now). If I would iterate through the vector every time the line gets executed (think of a loop), it would be terribly slow (as it currently is, nearly as slow as microsoft's batch).

    So I could use a map to retrieve the variable: SymbolTable[ myVar.Name ]

    But think of the following: If the variable, still using vector, is found the first time, I can store its exact integer position in the vector with it. That means: The next time it is needed, my interpreter knows that it has been "cached" and doesn't search the symbol table for it but does something like SymbolTable.at( myVar.CachedPosition ).

    Now my (rather hard?) question:

    • Should I use a vector for the symbol table together with caching the position of the variable in the vector?

    • Should I rather use a map? Why? How fast is the [] operator?

    • Should I use something completely different?

  • sub
    sub about 14 years
    You don't know my code, I have a very good option to store the position, trust me
  • Ben Voigt
    Ben Voigt about 14 years
    Then store a pointer (e.g. symbol*), and bypass the container lookup completely.
  • Mike Dinsdale
    Mike Dinsdale about 14 years
    You can certainly store the position in your parse tree or some other intermediate representation, but you can do the same thing if you're using a map as Tronic says in his/her answer. The big difference is just the time to look up the variable name. (Actually other issues pop up if you want to serialize your data structures but I'll stop speculating about your code here ;))
  • Randy the Dev
    Randy the Dev almost 7 years
    If you keep the vector sorted (penalty on insertion) then find will have O(log N) performance.
  • John
    John almost 3 years
    My vector is sorted, how to achieve O(log N) performance?
  • Matthieu M.
    Matthieu M. almost 3 years
    @John: You need to use a binary search algorithm. In C++ this can be achieved using the std::lower_bound and std::upper_bound algorithms (either one works).
  • John
    John almost 3 years
    @Matthieu M. It seems that std::vector::lower_bound is slower than std::map::find even when the data set is small(300 ~ 5000). It's really unexpected. I did such tests on Ubuntu two hours ago. How do you think about it?
  • Matthieu M.
    Matthieu M. almost 3 years
    @John: I think you'd need to make a proper question, and show your code, compiler and compiler settings, etc... It is, indeed, unexpected.