Does std::unordered_set allow insertion of duplicate elements?

10,030

an unordered_set will not allow duplicate elements, based on their hash

No, unordered_set avoids duplicates by comparing values, not the hashes of those values.
The "values" of each of your shared pointers is going to differ because they refer to different objects.

You can actually change this behaviour by providing your own function as the KeyEqual template parameter to unordered_set:

template<
    class Key,
    class Hash = std::hash<Key>,           // <-- you've been looking at this
    class KeyEqual = std::equal_to<Key>,   // <-- instead of this
    class Allocator = std::allocator<Key>
> class unordered_set;

If only one value with a given hash were allowed in an unordered_set then (a) you'd be unable to add any values that genuinely resulted in hash collisions, and (b) the entire hash collision resolution mechanism would become entirely unnecessary.

Share:
10,030
Ælex
Author by

Ælex

I'm an ML/AI researcher who sold his soul to the industry. I love working on RL, ML, CV using Python (PyTorch, Keras, TF, etc). Used to work on C++/ROS/OpenCV for Robotics. I'm not looking for a job, but I'm definitely interested in Startups.

Updated on June 05, 2022

Comments

  • Ælex
    Ælex about 2 years

    I am not understanding something right. I am under the impression that an unordered_set will not allow duplicate elements, based on their hash.

    I have a struct, with a specialisation of std::hash, which appears to allow duplicates, although I have manually checked it

    AddEdge ( const std::shared_ptr<Relation> relation, const std::shared_ptr<Concept> concept )
    {
        auto edge = std::make_shared<Edge>( (Edge){ relation, concept } );
        auto res = _edges.insert ( edge );
        return res.second;
    }
    

    An overloaded function does exactly the same but for reversed parameters

    This is how struct Edge is hashed:

    namespace std
    {
        template<> struct hash<shared_ptr<Edge>>
        {
            size_t operator()( const shared_ptr<Edge> & ptr ) const
            {
                size_t from = 0;
                size_t to = 0;
    
                if ( auto node = std::dynamic_pointer_cast<Concept>( ptr->from ) )
                    from = hash<shared_ptr<Concept>>()( node );
    
                else if ( auto node = std::dynamic_pointer_cast<Relation>( ptr->from ) )
                    from = hash<shared_ptr<Relation>>()( node );
    
                if ( auto node = std::dynamic_pointer_cast<Concept>( ptr->to ) )
                     to = hash<shared_ptr<Concept>>()( node );
    
                else if ( auto node = std::dynamic_pointer_cast<Relation>( ptr->to ) )
                     to =  hash<shared_ptr<Relation>>()( node );
    
                return hash<size_t>()( from + to );
            }
        };
    }
    

    And the container held in:

    std::unordered_set<std::shared_ptr<Edge>> _edges;
    

    When I do:

    graph2->AddEdge( sea_node, is_node );
    graph2->AddEdge( is_node, blue_node );
    

    I get:

    Edge [sea,is] hash = 10017731961838884781
    Edge [is,blue] hash = 11178184051384786808
    

    I try a second time exactly the same, and I get the same hashes, yet, when I check the edges, I now have 4 edges instead of 2.

    What am I doing wrong?

    EDIT: class Concept & Relation have the same kind of hash function:

    namespace std
    {
        template<> struct hash<shared_ptr<Concept>>
        {
            size_t operator()( const shared_ptr<Concept> & ptr ) const
            {
                return hash<string>()( ptr->asToken()->value() ) + hash<int>()( ptr->TokenIndex() ) + hash<string>()( "Concept" );
            }
        };
    }
    

    Even more interestignly, my output from when I add Edges, produces the same hash, and yet it the duplicate Edge is added.

  • Ælex
    Ælex over 9 years
    I may have to read this a few times to digest it. Thanks for the explanation. So, should I be implementing std::equal_to<Edge> or does that not make sense?
  • Lightness Races in Orbit
    Lightness Races in Orbit over 9 years
    @Alex: I guess it would be std::equal_to<std::shared_ptr<Edge>> but I really think your code is going to be very confusing to read and use when you have all this transparent dereferencing going on. Up to you I suppose. At the very least I'd pass your Hash and KeyEqual functors as template arguments to your set type, instead of making these implementations generally available to all of your code.
  • Anton Savin
    Anton Savin over 9 years
    @Alex Note that equal_to must be implemented consistently with hash, that is if equal_to(ptr1, ptr2) then hash(ptr1) == hash(ptr2)