Hashmap equivalent in C++

java c++ hashmap unordered-map

12,593

Solution 1

you can use unordered_map<string,int>.

Solution 2

Depending on the amount of data you want to store, there are two possibilities:

For a semi-large amount of data, I think a std::unordered_map<string, int> will do just fine
If you want to work with really large amounts of data, it could be helpful to think about more dedicated data structures for string storage, e.g. tries, where strings with a common prefix get stored in a common subtree. This could also improve your space usage, since the data gets kind of compressed. The most efficient implementation I know of is the marisa-trie also used in the python pytries package.

12,593

Author by

DavidG

Applied Maths person that has a strong interest in Pure Mathematics. Enjoy working on problems in Abstract Algebra, Linear Algebra, and Integral and Differential Calculus. Outside of Mathematics, I'm a massive retro gaming nerd with a particular love of the Sega Master System, Sega Mega Drive, PC Engine, NEO GEO, NES, SNES, and N64. I have just created a FB page exclusively for Integrals (20th Feb, 2019). Anyone and everyone is welcome to join! https://www.facebook.com/groups/302069030454491/

Updated on June 04, 2022

Comments

DavidG almost 2 years
I have an application (in C++) in which I need to have a set of pairings between Strings and Integers, i.e:
```
("david", 0)
("james", 1)
("helen", 2)
... 
```
If we use the java (key, value) definition, I need to be able to (1) search to see if a key exists in the map and (2) retrieve the value associated with a given string (key).When working in java, I found that the HashMap type could handle everything that I needed.

I would like to do the same but in C++. I did some googling around and found that in the C++ 2011 library there is an unordered_map type that replicates this. I was curious as to whether this was the best approach.

In my application I have the following rules on the collection
1. The Integers are always sequential (as per the example) and start at 0.
2. The Integer Values never change.
3. The Map is created at the start of the application and doesn't change, i.e. it's Immutable.
4. There are no duplicates of the string keys.
5. Upon creation of the map, I don't know how many keys (and by extension integer values) I'm going to need to use. One of the parameters for my application is the directory of a text file which contains the list of words to be used.
6. I don't care about start up time costs associated with this. I need the primary task (i.e. containsKey(..) and get(key) to be as fast as possible). And it will be called A LOT. The application is centered on processing large text corpora (i.e. Wikipedia) and forming co-occurence matrices between words/documents.
I thought that instead of having both the integer and string stored, that instead store the strings in some list type and then return the index back, i.e. data = { "david", "james", "helen", ... }

and then something like find_Map(data, key) to return back the index(value) that it's located with. I thought this could be speed up by first sorting into ascending order and applying a searching alogrithm. But again, that's just a guess.

I do appreciate that this is a common problem and that many different approaches exist. I'm going to code up a few different ideas but thought it would be best to ask the group first to see what you guys thought.