Super high performance C/C++ hash map (table, dictionary)


Solution 1

I would recommend trying Google SparseHash (or the C++11 version, sparsehash-c11) and seeing whether it suits your needs. It offers both a memory-efficient implementation and one optimized for speed. When I benchmarked it a long time ago, it was the best hashtable implementation available in terms of speed (though it has drawbacks).

Solution 2

Just use boost::unordered_map (or std::tr1::unordered_map, etc.) by default. Then profile your code and see whether that map is actually the bottleneck. Only then would I suggest analyzing your requirements precisely to find a faster substitute.

Solution 3

What libraries in C or C++ have a data structure that fits this use case? Or, how would you recommend building your own? Thanks!

Check out the LGPL'd Judy arrays. I've never used them myself, but they have been recommended to me on a few occasions.

You can also try benchmarking STL containers (std::unordered_map, etc.). Depending on the platform/implementation and source-code tuning (preallocate as much as you can; dynamic memory management is expensive), they could be performant enough.

Also, if the performance of the final solution trumps its cost, you can order a system with enough RAM to put everything into plain arrays. The performance of access by index is unbeatable.

The add/delete operations are much (100x) more frequent than the get operation.

That hints that you might want to concentrate on improving the algorithms first. If the data are only written, never read, why write them at all?

Solution 4

If you have a multithreaded program, you can find some useful hash tables in the Intel Threading Building Blocks (TBB) library. For example, tbb::concurrent_unordered_map has the same API as std::unordered_map, but its main functions are thread-safe.

Also have a look at Facebook's Folly library; it has a high-performance concurrent hash table and skip list.

Solution 5

khash is very efficient. The author's detailed benchmark (https://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/) shows khash beating many other hash libraries.

Author: Haywood Jablomey

Updated on July 17, 2020

Comments

  • Haywood Jablomey
    Haywood Jablomey almost 4 years

    I need to map primitive keys (int, maybe long) to struct values in a high-performance hash map data structure.

    My program will have a few hundred of these maps, and each map will generally have at most a few thousand entries. However, the maps will be "refreshing" or "churning" constantly; imagine processing millions of add and delete messages a second.

    What libraries in C or C++ have a data structure that fits this use case? Or, how would you recommend building your own? Thanks!

  • Haywood Jablomey
    Haywood Jablomey almost 14 years
    Can you elaborate on what the drawbacks were?
  • Scharron
    Scharron almost 14 years
IIRC, it was a memory problem: when removing an element, the element was destructed but its memory stayed allocated (used as a cache, I guess).
  • Admin
    Admin almost 14 years
@Haywood Jablomey: The main drawback is that it requires you to split off one or two (if you ever erase elements) key values and never use those. In some cases this is easy to do, e.g. negative ints or the like, but in other cases not quite so.
  • Cameron
    Cameron about 9 years
    It is. VS2013's std::unordered_map is taking 90+% of my entire execution time, even though I only use the maps for a relatively small part of the processing.
  • einpoklum
    einpoklum about 6 years
    Would you stand by this recommendation today?
  • underscore_d
    underscore_d over 5 years
See also this, which discusses how the bucket interface also requires chaining. The point about references is very good. It's tempting to argue that it's a useful guarantee, but in many cases we only want references to avoid looking up elements again, and the usual reason is that lookup is too slow... which it wouldn't be if the map didn't have to keep references valid and could therefore use open addressing! So it seems a bit chicken-and-egg. This cites the 2003 proposal, which explicitly discusses the choice.