Hashmap equivalent in C++
Solution 1
you can use unordered_map<string,int>
.
Solution 2
Depending on the amount of data you want to store, there are two possibilities:
- For a semi-large amount of data, I think a
std::unordered_map<string, int>
will do just fine - If you want to work with really large amounts of data, it could be helpful to think about more dedicated data structures for string storage, e.g. tries, where strings with a common prefix get stored in a common subtree. This could also improve your space usage, since the data gets kind of compressed. The most efficient implementation I know of is the marisa-trie also used in the python pytries package.
DavidG
Applied Maths person that has a strong interest in Pure Mathematics. Enjoy working on problems in Abstract Algebra, Linear Algebra, and Integral and Differential Calculus. Outside of Mathematics, I'm a massive retro gaming nerd with a particular love of the Sega Master System, Sega Mega Drive, PC Engine, NEO GEO, NES, SNES, and N64. I have just created a FB page exclusively for Integrals (20th Feb, 2019). Anyone and everyone is welcome to join! https://www.facebook.com/groups/302069030454491/
Updated on June 04, 2022Comments
-
DavidG almost 2 years
I have an application (in C++) in which I need to have a set of pairings between Strings and Integers, i.e:
("david", 0) ("james", 1) ("helen", 2) ...
If we use the java (key, value) definition, I need to be able to (1) search to see if a key exists in the map and (2) retrieve the value associated with a given string (key).When working in java, I found that the HashMap type could handle everything that I needed.
I would like to do the same but in C++. I did some googling around and found that in the C++ 2011 library there is an unordered_map type that replicates this. I was curious as to whether this was the best approach.
In my application I have the following rules on the collection
- The Integers are always sequential (as per the example) and start at 0.
- The Integer Values never change.
- The Map is created at the start of the application and doesn't change, i.e. it's Immutable.
- There are no duplicates of the string keys.
- Upon creation of the map, I don't know how many keys (and by extension integer values) I'm going to need to use. One of the parameters for my application is the directory of a text file which contains the list of words to be used.
- I don't care about start up time costs associated with this. I need the primary task (i.e. containsKey(..) and get(key) to be as fast as possible). And it will be called A LOT. The application is centered on processing large text corpora (i.e. Wikipedia) and forming co-occurence matrices between words/documents.
I thought that instead of having both the integer and string stored, that instead store the strings in some list type and then return the index back, i.e. data = { "david", "james", "helen", ... }
and then something like find_Map(data, key) to return back the index(value) that it's located with. I thought this could be speed up by first sorting into ascending order and applying a searching alogrithm. But again, that's just a guess.
I do appreciate that this is a common problem and that many different approaches exist. I'm going to code up a few different ideas but thought it would be best to ask the group first to see what you guys thought.