C++11 std::set lambda comparison function

61,883

Solution 1

Yes, a std::function introduces nearly unavoidable indirection to your set. While the compiler can always, in theory, figure out that all use of your set's std::function involves calling it on a lambda that is always the exact same lambda, that is both hard and extremely fragile.

Fragile, because before the compiler can prove to itself that all calls to that std::function are actually calls to your lambda, it must prove that no access to your std::set ever sets the std::function to anything but your lambda. Which means it has to track down all possible routes to reach your std::set in all compilation units and prove none of them do it.

This might be possible in some cases, but relatively innocuous changes could break it even if your compiler managed to prove it.

On the other hand, a functor with a stateless operator() has easy to prove behavior, and optimizations involving that are everyday things.

So yes, in practice I'd suspect std::function could be slower. On the other hand, std::function solution is easier to maintain than the make_set one, and exchanging programmer time for program performance is pretty fungible.

make_set has the serious disadvantage that any such set's type must be inferred from the call to make_set. Often a set stores persistent state, and not something you create on the stack then let fall out of scope.

If you created a static or global stateless lambda auto MyComp = [](A const&, A const&)->bool { ... }, you can use the std::set<A, decltype(MyComp)> syntax to create a set that can persist, yet is easy for the compiler to optimize (because all instances of decltype(MyComp) are stateless functors) and inline. I point this out, because you are sticking the set in a struct. (Or does your compiler support

struct Foo {
  auto mySet = make_set<int>([](int l, int r){ return l<r; });
};

which I would find surprising!)

Finally, if you are worried about performance, consider that std::unordered_set is much faster (at the cost of being unable to iterate over the contents in order, and having to write/find a good hash), and that a sorted std::vector is better if you have a 2-phase "insert everything" then "query contents repeatedly". Simply stuff it into the vector first, then sort unique erase, then use the free equal_range algorithm.

Solution 2

It's unlikely that the compiler will be able to inline a std::function call, whereas any compiler that supports lambdas would almost certainly inline the functor version, including if that functor is a lambda not hidden by a std::function.

You could use decltype to get the lambda's comparator type:

#include <set>
#include <iostream>
#include <iterator>
#include <algorithm>

int main()
{
   auto comp = [](int x, int y){ return x < y; };
   auto set  = std::set<int,decltype(comp)>( comp );

   set.insert(1);
   set.insert(10);
   set.insert(1); // Dupe!
   set.insert(2);

   std::copy( set.begin(), set.end(), std::ostream_iterator<int>(std::cout, "\n") );
}

Which prints:

1
2
10

See it run live on Coliru.

Solution 3

A stateless lambda (i.e. one with no captures) can decay to a function pointer, so your type could be:

std::set<int, bool (*)(int, int)> numbers;

Otherwise I'd go for the make_set solution. If you won't use a one-line creation function because it's non-standard you're not going to get much code written!

Solution 4

From my experience playing around with the profiler, the best compromise between performance and beauty is to use a custom delegate implementation, such as:

https://codereview.stackexchange.com/questions/14730/impossibly-fast-delegate-in-c11

As the std::function is usually a bit too heavy. I can't comment on your specific circumstances, as I don't know them, though.

Solution 5

If you're determined to have the set as a class member, initializing its comparator at constructor time, then at least one level of indirection is unavoidable. Consider that as far as the compiler knows, you could add another constructor:

 Foo () : numbers ([](int x, int y)
                   {
                       return x < y;
                   })
 {
 }

 Foo (char) : numbers ([](int x, int y)
                   {
                       return x > y;
                   })
 {
 }

Once the you have an object of type Foo, the type of the set doesn't carry information on which constructor initialized its comparator, so to call the correct lambda requires an indirection to the run-time selected lambda operator().

Since you're using captureless lambdas, you could use the function pointer type bool (*)(int, int) as your comparator type, as captureless lambdas have the appropriate conversion function. This would of course involve an indirection through the function pointer.

Share:
61,883

Related videos on Youtube

cfa45ca55111016ee9269f0a52e771
Author by

cfa45ca55111016ee9269f0a52e771

Updated on April 27, 2020

Comments

  • cfa45ca55111016ee9269f0a52e771
    cfa45ca55111016ee9269f0a52e771 about 4 years

    I want to create a std::set with a custom comparison function. I could define it as a class with operator(), but I wanted to enjoy the ability to define a lambda where it is used, so I decided to define the lambda function in the initialization list of the constructor of the class which has the std::set as a member. But I can't get the type of the lambda. Before I proceed, here's an example:

    class Foo
    {
    private:
         std::set<int, /*???*/> numbers;
    public:
         Foo () : numbers ([](int x, int y)
                           {
                               return x < y;
                           })
         {
         }
    };
    

    I found two solutions after searching: one, using std::function. Just have the set comparison function type be std::function<bool (int, int)> and pass the lambda exactly like I did. The second solution is to write a make_set function, like std::make_pair.

    SOLUTION 1:

    class Foo
    {
    private:
         std::set<int, std::function<bool (int, int)> numbers;
    public:
         Foo () : numbers ([](int x, int y)
                           {
                               return x < y;
                           })
         {
         }
    };
    

    SOLUTION 2:

    template <class Key, class Compare>
    std::set<Key, Compare> make_set (Compare compare)
    {
         return std::set<Key, Compare> (compare);
    }
    

    The question is, do I have a good reason to prefer one solution over the other? I prefer the first one because it makes use of standard features (make_set is not a standard function), but I wonder: does using std::function make the code (potentially) slower? I mean, does it lower the chance the compiler inlines the comparison function, or it should be smart enough to behave exactly the same like it would it was a lambda function type and not std::function (I know, in this case it can't be a lambda type, but you know, I'm asking in general) ?

    (I use GCC, but I'd like to know what popular compilers do in general)

    SUMMARY, AFTER I GOT LOTS OF GREAT ANSWERS:

    If speed is critical, the best solution is to use an class with operator() aka functor. It's easiest for the compiler to optimize and avoid any indirections.

    For easy maintenance and a better general-purpose solution, using C++11 features, use std::function. It's still fast (just a little bit slower than the functor, but it may be negligible) and you can use any function - std::function, lambda, any callable object.

    There's also an option to use a function pointer, but if there's no speed issue I think std::function is better (if you use C++11).

    There's an option to define the lambda function somewhere else, but then you gain nothing from the comparison function being a lambda expression, since you could as well make it a class with operator() and the location of definition wouldn't be the set construction anyway.

    There are more ideas, such as using delegation. If you want a more thorough explanation of all solutions, read the answers :)

    • Admin
      Admin over 11 years
      I smell premature optimization.
    • Kerrek SB
      Kerrek SB over 11 years
      Why not just bool(*)(int, int)? But it might be more efficient to make an explicit, default-constructible predicate class.
    • cfa45ca55111016ee9269f0a52e771
      cfa45ca55111016ee9269f0a52e771 over 11 years
      @Fanael how would you know, what if I have a long set of objects rendered by GUI and I really needed it to be as fast as possible
    • Admin
      Admin over 11 years
      @fr33domlover: isn't the cost of std::function dwarfed by the cost of rendering in that case?
    • cfa45ca55111016ee9269f0a52e771
      cfa45ca55111016ee9269f0a52e771 over 11 years
      @Fanael if sorting is done while there's no rendering, I can still get more speed by making the sorting faster and giving the rendering code more execution time. Anyway, even if this is premature optimization, the question is still useful: look at the answers and upvotes...
    • user1095108
      user1095108 over 11 years
      I've added lambda support to the codereview article. Try it!
  • cfa45ca55111016ee9269f0a52e771
    cfa45ca55111016ee9269f0a52e771 over 11 years
    I use GCC, but I'd like to know what popular compilers do in general. The possible indirection is the reason I don't simply pick the std::function solution
  • filmor
    filmor over 11 years
    stackoverflow.com/questions/8298780/…, that one might be interesting.
  • cfa45ca55111016ee9269f0a52e771
    cfa45ca55111016ee9269f0a52e771 over 11 years
    Hmmm... it says compilers still don't fully treat simple cases of std::function
  • cfa45ca55111016ee9269f0a52e771
    cfa45ca55111016ee9269f0a52e771 over 11 years
    You can only use decltype after defining the lambda, but then I lose the ability to define the lambda in the constructor itself, so I could as well use a class with operator(). I want to define the comparison function in the constructor
  • metal
    metal over 11 years
    See above. You could make it a static member of your class.
  • cfa45ca55111016ee9269f0a52e771
    cfa45ca55111016ee9269f0a52e771 over 11 years
    Comment to the edit (code example): look at the code example, it's different. What you suggest doesn't work because a lambda can't be used in unevaluated context + the type can't be deduced correctly because it's compiler-generated for each separate lambda you create. Also I found relevant SO questions, they say exactly that. Otherwise I'd just use the lambda...
  • metal
    metal over 11 years
    Right. Was just testing it out. Deleted that bit.
  • cfa45ca55111016ee9269f0a52e771
    cfa45ca55111016ee9269f0a52e771 over 11 years
    Looks like an excellent general-purpose solution, but I just need my little case with std::set to I prefer to just use make_set which is a "special case" of the general-purpose delegate class. But in general it's very interesting, maybe can solve all these lambda issues
  • Yakk - Adam Nevraumont
    Yakk - Adam Nevraumont over 11 years
    Such a delgate still has a layer of opaque indirection (ie, the equivalent of a pointer dereference) over a stateless functor. The ability to use stateless functors is one of the big reasons why std::sort outperforms qsort.
  • cfa45ca55111016ee9269f0a52e771
    cfa45ca55111016ee9269f0a52e771 over 11 years
    Oh, if it has non-inlined indirection I can as well use std::function and get the same speed...
  • cfa45ca55111016ee9269f0a52e771
    cfa45ca55111016ee9269f0a52e771 over 11 years
    You're right I could make it a static member, but I could just write a nested class with operator()... no big difference
  • cfa45ca55111016ee9269f0a52e771
    cfa45ca55111016ee9269f0a52e771 over 11 years
    Interesting, I didn't know it casts to a function pointer... As to standatd/non-standard, I meant that if I rely on std::function being better implemented in the future, then future versions of compilers will make it as fast as the lambda, inlined, without me changing the code
  • Jonathan Wakely
    Jonathan Wakely over 11 years
    I see. There's a limit to how efficient std::function can get, but have you actually measured to see if there's a performance problem before you worry about it? std:function can be pretty damn fast: timj.testbit.eu/2013/01/25/cpp11-signal-system-performance
  • metal
    metal over 11 years
    Good advice about alternate containers, but beware the hashing function for some other types (ints will work fine). That can kill performance if not done right -- either by being too slow or by having too many collisions.
  • cfa45ca55111016ee9269f0a52e771
    cfa45ca55111016ee9269f0a52e771 over 11 years
    I think I missed that point with make_set, it wouldn't work with lambdas. That leaves only the std::function solution, which does involve indirection but currently I don't have performance problems with it. Also the vector is very interesting, the set contents are read and rendered by GUI, so rendering happens much more often that users change the contents (which happens rarely)... maybe sorting a vector on each change will actually be faster than key lookup
  • David Rodríguez - dribeas
    David Rodríguez - dribeas over 11 years
    @fr33domlover: Your assumption is not necessarily true. The definition of std::function requires type erasure on the actual callable entity that is held, and that pretty much requires an indirection. Even in the case of a plain function pointer there's going to be more cost than if you create a functor for that particular purpose. This is exactly the reason why std::sort is faster than C's qsort.
  • user1095108
    user1095108 over 11 years
    I may be inept with gdb, but it seemed to me, that the compiler often eliminated all dereferencing when I've used that delegate with -O3.
  • Yakk - Adam Nevraumont
    Yakk - Adam Nevraumont over 11 years
    The auto functor = [](...){...}; syntax has the advantage of being shorter than struct functor { bool operator()(...)const{...}; }; syntax, and the disadvantage of requiring the functor instance in order to call it (as opposed to any default constructed functor for the struct case).
  • AzP
    AzP almost 11 years
    Sadly this doesn't seem to work on VS2012.. For some reason, the decltype for the lambda isn't implemented yet. Using a function<> as the compare-type works though.
  • Euri Pinhollow
    Euri Pinhollow about 7 years
    @cfa45ca55111016ee9269f0a52e771 The lambda type only depends on return type and arguments types. You may use any lambda with same returns and args. You may use a small sample lambda to use decltype on it: decltype([](bool A, bool B){return bool(1)})`.