What is the difference between set and hashset in C++ STL?
Solution 1
hash_set
is an extension that is not part of the C++ standard. Lookups should be O(1) rather than O(log n) for set
, so it will be faster in most circumstances.
Another difference will be seen when you iterate through the containers. set
will deliver the contents in sorted order, while hash_set
will be essentially random (Thanks Lou Franco).
Edit: The C++11 update to the C++ standard introduced unordered_set
which should be preferred instead of hash_set
. The performance will be similar and is guaranteed by the standard. The "unordered" in the name stresses that iterating it will produce results in no particular order.
Solution 2
stl::set
is implemented as a binary search tree.
hashset
is implemented as a hash table.
The main issue here is that many people use stl::set
thinking it is a hash table with look-up of O(1), which it isn't, and doesn't have. It really has O(log(n)) for look-ups. Other than that, read about binary trees vs hash tables to get a better idea of the data structures.
Solution 3
Another thing to keep in mind is that with hash_set you have to provide the hash function, whereas a set only requires a comparison function ('<') which is easier to define (and predefined for native types).
Solution 4
I don't think anyone has answered the other part of the question yet.
The reason to use hash_set or unordered_set is the usually O(1) lookup time. I say usually because every so often, depending on implementation, a hash may have to be copied to a larger hash array, or a hash bucket may end up containing thousands of entries.
The reason to use a set is if you often need the largest or smallest member of a set. A hash has no order so there is no quick way to find the smallest item. A tree has order, so largest or smallest is very quick. O(log n) for a simple tree, O(1) if it holds pointers to the ends.
Solution 5
A hash_set would be implemented by a hash table, which has mostly O(1) operations, whereas a set is implemented by a tree of some sort (AVL, red black, etc.) which have O(log n) operations, but are in sorted order.
Edit: I had written that trees are O(n). That's completely wrong.
Kristof Mertens
Updated on November 12, 2020Comments
-
Kristof Mertens over 3 years
When should I choose one over the other? Are there any pointers that you would recommend for using the right STL containers?