What's the largest size a hashtable should be?

Well, there are specialized data structures for this kind of data: for example the Patricia trie or the radix tree, which are far more space-efficient than a hash table for strings. Of course, being trees, lookup cost is O(k) in the length of the key rather than the hash table's expected O(1), and building one takes time proportional to the total size of the input. Since you are loading the words from a file anyway, you can write the file in a format that lets you build the structure in a single O(n) pass.
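
For illustration, here is a minimal sketch of the trie idea in C#. A real Patricia/radix tree additionally collapses single-child chains into one edge to save space, but the lookup pattern is the same:

```csharp
using System.Collections.Generic;

// Minimal (uncompressed) trie for word membership tests. A Patricia/radix
// tree would also merge single-child chains into one labeled edge.
class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node _root = new Node();

    // O(k) in the length of the word, independent of how many words are stored.
    public void Add(string word)
    {
        var node = _root;
        foreach (var c in word)
        {
            if (!node.Children.TryGetValue(c, out var next))
                node.Children[c] = next = new Node();
            node = next;
        }
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        var node = _root;
        foreach (var c in word)
            if (!node.Children.TryGetValue(c, out node))
                return false;
        return node.IsWord;
    }
}
```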

Hashtable (Dictionary<TKey, TValue>) in C# is implemented with no fixed upper bound other than its internal 32-bit integer addressing (so it certainly cannot hold more than about 2 billion items).
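
If you go the dictionary route, one small optional tweak while loading: pass the expected size to the constructor so the table is allocated once up front rather than grown and rehashed repeatedly. A sketch (a HashSet<string> is the natural fit when you only need membership tests):

```csharp
using System.Collections.Generic;

// Pre-size the set for the expected word count to avoid repeated internal
// resizes during the load. (The HashSet<string> capacity constructor needs
// .NET Framework 4.7.2+ or .NET Core 2.0+; Dictionary has always had one.)
var words = new HashSet<string>(100_000);
```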

100,000 items is not too many for a dictionary. What may be more of a problem in garbage-collected languages is that you will have 100,000 allocated strings, which puts some pressure on the GC. The only way to get real numbers on the application's memory footprint is to run it.
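
One rough way to measure that, sketched below with a hypothetical words.txt word list: compare GC.GetTotalMemory before and after the load. A profiler gives more accurate numbers, but this answers the question directly:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class MemoryCheck
{
    static void Main()
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);

        // "words.txt" stands in for your word list, one word per line.
        var words = new HashSet<string>(File.ReadLines("words.txt"));

        long after = GC.GetTotalMemory(forceFullCollection: true);
        Console.WriteLine($"Loaded {words.Count} words, ~{after - before:N0} bytes on the managed heap");
    }
}
```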

If memory is a real concern, look into the Patricia trie and the radix tree; they are perfect for storing word dictionaries. But you can start with a plain dictionary and see how much memory your application actually uses.

Doing a rough calculation: treating the strings as Unicode (2 bytes per character), taking the average English word as about 5 letters (5.1, according to figures I read on the web), and adding 32 bytes of per-string overhead (object header and length field), the strings need roughly 100,000 * (32 + 5 * 2) = 4,200,000 bytes, which is a really small amount of memory.

Comments

  • Jimmy Zelinskie almost 2 years

    How large is too large for the average programming language's implementation of a hashtable?

    Say I wanted to create a program that plays the game Shiritori. After the user inputs a word, the program needs to look the word up in a dictionary to check that it exists. To avoid constantly re-reading a flat file, is loading 100,000+ words into a hashtable at program start a wise solution?