Performant Haskell hashed structure.
Solution 1
1: What are the major differences, if any?
-
Data.Map.Map
is a balanced binary tree internally, so its time complexity for lookups is O(log n). I believe it's a "persistent" data structure, meaning it's implemented such that mutative operations yield a new copy with only the relevant parts of the structure updated. -
Data.HashMap.Map
is aData.IntMap.IntMap
internally, which in turn is implemented as Patricia tree; its time complexity for lookups is O(min(n, W)) where W is the number of bits in an integer. It is also "persistent.". New versions (>= 0.2) use hash array mapped tries. According to the documentation: "Many operations have a average-case complexity of O(log n). The implementation uses a large base (i.e. 16) so in practice these operations are constant time." -
Data.HashTable.HashTable
is an actual hash table, with time complexity O(1) for lookups. However, it is a mutable data structure -- operations are done in-place -- so you're stuck in theIO
monad if you want to use it.
2: Which would be the most performant with a high volume of lookups on maps/tables of ~4000 key-value pairs?
The best answer I can give you, unfortunately, is "it depends." If you take the asymptotic complexities literally, you get O(log 4000) = about 12 for Data.Map
, O(min(4000, 64)) = 64 for Data.HashMap
and O(1) = 1 for Data.HashTable
. But it doesn't really work that way... You have to try them in the context of your code.
Solution 2
The obvious difference between Data.Map
and Data.HashMap
is that the former needs keys in Ord
, the latter Hashable
keys. Most of the common keys are both, so that's not a deciding criterion. I have no experience whatsoever with Data.HashTable
, so I can't comment on that.
The APIs of Data.HashMap
and Data.Map
are very similar, but Data.Map
exports more functions, some, like alter
are absent in Data.HashMap
, others are provided in strict and non-strict variants, while Data.HashMap
(I assume you meant the hashmap from unordered-containers) provides lazy and strict APIs in separate modules. If you are using only the common part of the API, switching is really painless.
Concerning performance, Data.HashMap
of unordered-containers has pretty fast lookup, last I measured, it was clearly faster than Data.IntMap
or Data.Map
, that holds in particular for the (not yet released) HAMT branch of unordered-containers. I think for inserts, it was more or less on par with Data.IntMap
and somewhat faster than Data.Map
, but I'm a bit fuzzy on that.
Both are sufficiently performant for most tasks, for those tasks where they aren't, you'll probably need a tailor-made solution anyway. Considering that you ask specifically about lookups, I would give Data.HashMap
the edge.
Solution 3
Data.HashTable
's documentation now says "use the hashtables package". There's a nice blog post explaining why hashtables is a good package here. It uses the ST monad.
Comments
-
providence about 2 years
I am writing program that does alot of table lookups. As such, I was perusing the Haskell documentation when I stumbled upon
Data.Map
(of course), but alsoData.HashMap
andData.Hashtable
. I am no expert on hashing algorithms and after inspecting the packages they all seem really similar. As such I was wondering:1: what are the major differences, if any?
2: Which would be the most performant with a high volume of lookups on maps/tables of ~4000 key-value pairs?
-
Thomas M. DuBuisson over 12 yearsBe sure you are talking about the same
Data.Hashmap
. Unordered-containers was measureably faster than other package'sHashMap
offerings, last I checked. -
providence over 12 yearsThanks, I'm in the IO monad at this point anyways due to lots of file IO. The result of all of these IO and hash lookups is yielded from a coroutine, so it'll be easy to escape IO afterwards. I think I'll give Data.HashTable a shot and see how it does. Thank you very much for your help~
-
mergeconflict over 12 yearsSure thing; also check out
Data.HashTable.ST
from the hashtables package, noted first above by @FUZxxl. -
Daniel Fischer over 12 yearsGood point. I had completely forgotten there was a previous
Data.HashMap
. -
gatoatigrado almost 11 yearsthe time complexity
O(min(n, W))
for a Patricia tree doesn't seem right ... might as well use a list then, no? -
gatoatigrado almost 11 yearsoops, I'm wrong, the worst case is linear. Also, the Patricia tree comment is in Data.HashMap.Strict (type HashMap) it seems ...
-
is7s almost 11 yearsThe
hashtables
package uses theST
monad not theState
monad. -
gatoatigrado almost 11 yearsright,
ST
is just the transformer (i.e. useful) version ofState
. -
is7s almost 11 yearsNo it's not. There's a huge conceptual difference. The
ST
monad carries the state at the type level while theState
monad carries the state at the value level :) -
sinelaw about 9 yearsIn 2015, there isn't anymore Data.HashMap in base. There is one in the hashtables package
-
rohitwtbs over 5 years@mergeconflict Hi, where can i read the internal implemenation of data structures like hashmap in detail in haskell?
-
Gal over 4 years@rohitwtbs, you can checkout most sources in either hackage or github. For example, unordered-containers-0.2.10.0