Cost of using std::map with std::string keys vs int keys?

21,820

Solution 1

In addition to the time complexity from comparing strings already mentioned, a string key will also cause an additional memory allocation each time an item is added to the container. In certain cases, e.g. highly parallel systems, a global allocator mutex can be a source of performance problems.

In general, you should choose the alternative that makes the most sense in your situation, and only optimize based on actual performance testing. It's notoriously hard to judge what will be a bottleneck.

Solution 2

Analyzing algorithms for asymptotic performance is working on the operations that must be performed and the cost they add to the equation. For that you need to first know what are the performed operations and then evaluate its costs.

Searching for a key in a balanced binary tree (which maps happen to be) require O( log N ) complex operations. Each of those operations implies comparing the key for a match and following the appropriate pointer (child) if the key did not match. This means that the overall cost is proportional to log N times the cost of those two operations. Following pointers is a constant time operation O(1), and comparing keys depend on the key. For an integer key, comparisons are fast O(1). Comparing two strings is another story, it takes time proportional to the sizes of the strings involved O(L) (where I have used intentionally L as the length of string parameter instead of the more common N.

When you sum all the costs up you get that using integers as keys the total cost is O( log N )*( O(1) + O(1) ) that is equivalent to O( log N ). (O(1) gets hidden in the constant that the O notation silently hides.

If you use strings as keys, the total cost is O( log N )*( O(L) + O(1) ) where the constant time operation gets hidden by the more costly linear operation O(L) and can be converted into O( L * log N ). That is, the cost of locating an element in a map keyed by strings is proportional to the logarithm of the number of elements stored in the map times the average length of the strings used as keys.

Note that the big-O notation is most appropriate to use as an analysis tool to determine how the algorithm will behave when the size of the problem grows, but it hides many facts underneath that are important for raw performance.

As the simplest example, if you change the key from a generic string to an array of 1000 characters you can hide that cost within the constant dropped out of the notation. Comparing arrays of 1000 chars is a constant operation that just happens to take quite a bit of time. With the asymptotic notation that would just be a O( log N ) operation, as with integers.

The same happens with many other hidden costs, as the cost of creation of the elements that is usually considered as a constant time operation, just because it does not depend on the parameters to your problem (the cost of locating the block of memory in each allocation does not depend on your data set, but rather on memory fragmentation that is outside of the scope of the algorithm analysis, the cost of acquiring the lock inside malloc as to guarantee that not two processes try to return the same block of memory depends on the contention of the lock that depends itself number of processors, processes and how much memory requests they perform..., again out of the scope of the algorithm analysis). When reading costs in the big-O notation you must be conscious of what it really means.

Solution 3

The cost difference will be linked to the difference in cost between comparing two ints versus comparing two strings.

When comparing two strings, you have to dereference a pointer to get to the first chars, and compare them. If they are identical, you have to compare the second chars, and so on. If your strings have a long common prefix, this can slow down the process a bit. It is very unlikely to be as fast as comparing ints, though.

Solution 4

The cost is ofcourse that ints can be compared in real O(1) time whereas strings are compared in O(n) time (n being the maximal shared prefix). Also, the storage of strings consumes more space than that of integers. Other than these apparent differences, there's no major performance cost.

Share:
21,820
Goles
Author by

Goles

Software Craftsman. Extremely interested in Web Technologies and Mobile Game Development. Using the sweet Obj-C, the evil C++ and the elegant Lua. Have been using lot's of Ruby spices in the mix lately.

Updated on February 28, 2020

Comments

  • Goles
    Goles about 4 years

    I know that the individual map queries take a maximum of log(N) time. However I was wondering, I have seen a lot of examples that use strings as map keys. What is the performance cost of associating a std::string as a key to a map instead of an int for example ?

    std::map<std::string, aClass*> someMap; vs std::map<int, aClass*> someMap;

    Thanks!

  • MSalters
    MSalters over 14 years
    String comparison may be O(N) worst case, but the average case often is much better. In fact, for 2 random strings, it's O(1) !
  • Shafik Yaghmour
    Shafik Yaghmour about 11 years
    Can you provide a reference or some example for those numbers?