Does std::unordered_set allow insertion of duplicate elements?
an unordered_set will not allow duplicate elements, based on their hash
No, unordered_set
avoids duplicates by comparing values, not the hashes of those values†.
The "values" of each of your shared pointers is going to differ because they refer to different objects.
You can actually change this behaviour by providing your own function as the KeyEqual
template parameter to unordered_set
:
template<
class Key,
class Hash = std::hash<Key>, // <-- you've been looking at this
class KeyEqual = std::equal_to<Key>, // <-- instead of this
class Allocator = std::allocator<Key>
> class unordered_set;
† If only one value with a given hash were allowed in an unordered_set
then (a) you'd be unable to add any values that genuinely resulted in hash collisions, and (b) the entire hash collision resolution mechanism would become entirely unnecessary.
Ælex
I'm an ML/AI researcher who sold his soul to the industry. I love working on RL, ML, CV using Python (PyTorch, Keras, TF, etc). Used to work on C++/ROS/OpenCV for Robotics. I'm not looking for a job, but I'm definitely interested in Startups.
Updated on June 05, 2022Comments
-
Ælex about 2 years
I am not understanding something right. I am under the impression that an unordered_set will not allow duplicate elements, based on their hash.
I have a struct, with a specialisation of std::hash, which appears to allow duplicates, although I have manually checked it
AddEdge ( const std::shared_ptr<Relation> relation, const std::shared_ptr<Concept> concept ) { auto edge = std::make_shared<Edge>( (Edge){ relation, concept } ); auto res = _edges.insert ( edge ); return res.second; }
An overloaded function does exactly the same but for reversed parameters
This is how struct Edge is hashed:
namespace std { template<> struct hash<shared_ptr<Edge>> { size_t operator()( const shared_ptr<Edge> & ptr ) const { size_t from = 0; size_t to = 0; if ( auto node = std::dynamic_pointer_cast<Concept>( ptr->from ) ) from = hash<shared_ptr<Concept>>()( node ); else if ( auto node = std::dynamic_pointer_cast<Relation>( ptr->from ) ) from = hash<shared_ptr<Relation>>()( node ); if ( auto node = std::dynamic_pointer_cast<Concept>( ptr->to ) ) to = hash<shared_ptr<Concept>>()( node ); else if ( auto node = std::dynamic_pointer_cast<Relation>( ptr->to ) ) to = hash<shared_ptr<Relation>>()( node ); return hash<size_t>()( from + to ); } }; }
And the container held in:
std::unordered_set<std::shared_ptr<Edge>> _edges;
When I do:
graph2->AddEdge( sea_node, is_node ); graph2->AddEdge( is_node, blue_node );
I get:
Edge [sea,is] hash = 10017731961838884781 Edge [is,blue] hash = 11178184051384786808
I try a second time exactly the same, and I get the same hashes, yet, when I check the edges, I now have 4 edges instead of 2.
What am I doing wrong?
EDIT: class Concept & Relation have the same kind of hash function:
namespace std { template<> struct hash<shared_ptr<Concept>> { size_t operator()( const shared_ptr<Concept> & ptr ) const { return hash<string>()( ptr->asToken()->value() ) + hash<int>()( ptr->TokenIndex() ) + hash<string>()( "Concept" ); } }; }
Even more interestignly, my output from when I add Edges, produces the same hash, and yet it the duplicate Edge is added.
-
Ælex over 9 yearsI may have to read this a few times to digest it. Thanks for the explanation. So, should I be implementing std::equal_to<Edge> or does that not make sense?
-
Lightness Races in Orbit over 9 years@Alex: I guess it would be
std::equal_to<std::shared_ptr<Edge>>
but I really think your code is going to be very confusing to read and use when you have all this transparent dereferencing going on. Up to you I suppose. At the very least I'd pass yourHash
andKeyEqual
functors as template arguments to your set type, instead of making these implementations generally available to all of your code. -
Anton Savin over 9 years@Alex Note that
equal_to
must be implemented consistently withhash
, that is ifequal_to(ptr1, ptr2)
thenhash(ptr1) == hash(ptr2)