Why is there no transform_if in the C++ standard library?

Solution 1

The standard library favours elementary algorithms.

Containers and algorithms should be independent of each other if possible.

Likewise, algorithms that can be composed of existing algorithms are only rarely included, as shorthand.

If you need a transform_if, you can trivially write one. If you want it today, composed from ready-made parts and without incurring overhead, you can use a range library that has lazy ranges, such as Boost.Range, e.g.:

v | filtered(arg1 % 2) | transformed(arg1 * arg1 / 7.0)

As @hvd points out in a comment, transform_if would result in a different type (double, in this case). Composition order matters, and with Boost Range you could also write:

 v | transformed(arg1 * arg1 / 7.0) | filtered(arg1 < 2.0)

resulting in different semantics. This drives home the point:

it makes very little sense to include std::filter_and_transform, std::transform_and_filter, std::filter_transform_and_filter etc. etc. into the standard library.

See a sample Live On Coliru

#include <boost/range/algorithm.hpp>
#include <boost/range/adaptors.hpp>

using namespace boost::adaptors;

// only for succinct predicates without lambdas
#include <boost/phoenix.hpp>
using namespace boost::phoenix::arg_names;

// for demo
#include <iostream>
#include <iterator> // std::ostream_iterator
#include <vector>

int main()
{
    std::vector<int> const v { 1,2,3,4,5 };

    boost::copy(
            v | filtered(arg1 % 2) | transformed(arg1 * arg1 / 7.0),
            std::ostream_iterator<double>(std::cout, "\n"));
}

Solution 2

The range-based for loop in many ways reduces the need for algorithms that visit every element of a collection: it is now often cleaner to just write a loop and put the logic in place.

// element type is whatever op returns for one element (assumes op returns by value)
std::vector< decltype( op( *begin(coll) ) ) > output;
for( auto const& elem : coll )
{
   if( pred( elem ) )
   {
        output.push_back( op( elem ) );
   }
}

Does it really provide a lot of value now to put in an algorithm? Whilst yes, the algorithm would have been useful for C++03 and indeed I had one for it, we don't need one now so no real advantage in adding it.

Note that in practical use your code won't always look exactly like that either: you don't necessarily have functions "op" and "pred" and may have to create lambdas to make them "fit" into algorithms. Whilst it is nice to separate out concerns if the logic is complex, if it is just a matter of extracting a member from the input type and checking its value or adding it to the collection, it's a lot simpler once again than using an algorithm.

In addition, once you are adding some kind of transform_if, you have to decide whether to apply the predicate before or after the transform, or even have 2 predicates and apply it in both places.

So what are we going to do? Add 3 algorithms? (And where the predicate would compile on either side of the conversion, a user could easily pick the wrong algorithm by mistake, and the code would still compile but produce wrong results.)
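
To make the ordering problem concrete, here is a minimal sketch with two hypothetical helpers (the names filter_then_transform and transform_then_filter are invented for this illustration). Both accept the same predicate and operation and both compile, yet they give different results.

```cpp
#include <vector>

// Hypothetical helper: the predicate sees the input, before the transform.
template <class In, class Pred, class Op>
std::vector<double> filter_then_transform(In const& in, Pred pred, Op op)
{
    std::vector<double> out;
    for (auto const& x : in)
        if (pred(x)) out.push_back(op(x));
    return out;
}

// Hypothetical helper: the predicate sees the result, after the transform.
template <class In, class Op, class Pred>
std::vector<double> transform_then_filter(In const& in, Op op, Pred pred)
{
    std::vector<double> out;
    for (auto const& x : in) {
        double y = op(x);
        if (pred(y)) out.push_back(y);
    }
    return out;
}
```

With the input { 1, 2, 3, 4, 5 }, an operation that halves its argument and a predicate "less than 2.0", the first helper keeps only the element 1 (giving 0.5), while the second keeps 0.5, 1.0 and 1.5.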

Also, if the collections are large, does the user want to loop with iterators or map/reduce? With the introduction of map/reduce you get even more complexities in the equation.

Essentially, the library provides the tools, and the user is left here to use them to fit what they want to do, not the other way round as was often the case with algorithms. (See how the user above tried to twist things using accumulate to fit what they really wanted to do).

For a simple example, a map. For each element I will output the value if the key is even.

std::vector< std::string > valuesOfEvenKeys
    ( std::map< int, std::string > const& keyValues )
{
    std::vector< std::string > res;
    for( auto const& elem: keyValues )
    {
        if( elem.first % 2 == 0 )
        {
            res.push_back( elem.second );
        }
    }
    return res;
}

Nice and simple. Fancy fitting that into a transform_if algorithm?
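
For comparison, this is roughly what the same function might look like when routed through a hand-written transform_if. It is a sketch only: transform_if is not a standard algorithm, so the version here is written in the spirit of the one in the question, and valuesOfEvenKeysAlgo is a name made up for the example.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hand-written helper (not part of the standard library).
template <class InputIt, class OutputIt, class UnaryOp, class Pred>
OutputIt transform_if(InputIt first, InputIt last, OutputIt out,
                      UnaryOp op, Pred pred)
{
    for (; first != last; ++first) {
        if (pred(*first)) {
            *out = op(*first);
            ++out;
        }
    }
    return out;
}

// Hypothetical algorithm-based counterpart of valuesOfEvenKeys.
std::vector<std::string> valuesOfEvenKeysAlgo(
    std::map<int, std::string> const& keyValues)
{
    using Entry = std::map<int, std::string>::value_type;

    std::vector<std::string> res;
    transform_if(keyValues.begin(), keyValues.end(), std::back_inserter(res),
                 [](Entry const& e) { return e.second; },           // transform
                 [](Entry const& e) { return e.first % 2 == 0; });  // predicate
    return res;
}
```

Two lambdas and an inserter replace the if and the push_back; whether that reads better than the loop is the judgment call this answer is about.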

Solution 3

Sorry to resurrect this question after so long. I had a similar requirement recently. I solved it by writing a version of back_insert_iterator that takes a boost::optional:

template<class Container>
struct optional_back_insert_iterator
: public std::iterator< std::output_iterator_tag,
void, void, void, void >
{
    explicit optional_back_insert_iterator( Container& c )
    : container(std::addressof(c))
    {}

    using value_type = typename Container::value_type;

    optional_back_insert_iterator<Container>&
    operator=( boost::optional<value_type> opt ) // by value and non-const, so the move below really moves
    {
        if (opt) {
            container->push_back(std::move(opt.value()));
        }
        return *this;
    }

    optional_back_insert_iterator<Container>&
    operator*() {
        return *this;
    }

    optional_back_insert_iterator<Container>&
    operator++() {
        return *this;
    }

    optional_back_insert_iterator<Container>&
    operator++(int) {
        return *this;
    }

protected:
    Container* container;
};

template<class Container>
optional_back_insert_iterator<Container> optional_back_inserter(Container& container)
{
    return optional_back_insert_iterator<Container>(container);
}

used like this:

transform(begin(s), end(s),
          optional_back_inserter(d),
          [](const auto& s) -> boost::optional<size_t> {
              if (s.length() > 1)
                  return { s.length() * 2 };
              else
                  return { boost::none };
          });
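
As one of the comments notes, the same idea carries over to C++17 with std::optional. The following is a minimal sketch under that assumption, not a drop-in replacement: the names optional_back_inserter_17 and doubled_lengths_of_long_strings are invented here, and the iterator traits are written out by hand because std::iterator is deprecated in C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical C++17 variant: appends only engaged optionals.
template <class Container>
class optional_back_inserter_17 {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type        = void;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = void;

    explicit optional_back_inserter_17(Container& c)
        : container_(std::addressof(c)) {}

    // Assignment from an optional: push the value if present, else skip.
    optional_back_inserter_17&
    operator=(std::optional<typename Container::value_type> opt)
    {
        if (opt) container_->push_back(std::move(*opt));
        return *this;
    }

    // Dereference and increment are no-ops, as in std::back_insert_iterator.
    optional_back_inserter_17& operator*()     { return *this; }
    optional_back_inserter_17& operator++()    { return *this; }
    optional_back_inserter_17  operator++(int) { return *this; }

private:
    Container* container_;
};

// Same transformation as the usage above: double the length of every
// string longer than one character, and skip the rest.
std::vector<std::size_t>
doubled_lengths_of_long_strings(std::vector<std::string> const& s)
{
    std::vector<std::size_t> d;
    std::transform(s.begin(), s.end(),
                   optional_back_inserter_17<std::vector<std::size_t>>(d),
                   [](std::string const& str) -> std::optional<std::size_t> {
                       if (str.length() > 1) return str.length() * 2;
                       return std::nullopt;
                   });
    return d;
}
```

For the input { "a", "bb", "ccc" } this yields { 4, 6 }.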

Solution 4

After just finding this question again after some time, and devising a whole slew of potentially useful generic iterator adaptors I realized that the original question required NOTHING more than std::reference_wrapper.

Use it instead of a pointer, and you're good:

#include <algorithm>
#include <functional> // std::reference_wrapper
#include <iostream>
#include <iterator>   // std::back_inserter
#include <vector>

struct ha {
    int i;
};

int main() {
    std::vector<ha> v { {1}, {7}, {1}, };

    std::vector<std::reference_wrapper<ha const> > ph; // target vector
    copy_if(v.begin(), v.end(), back_inserter(ph), [](const ha &parg) { return parg.i < 2; });

    for (ha const& el : ph)
        std::cout << el.i << " ";
}

Prints

1 1 

Solution 5

The standard is designed in such a way as to minimise duplication.

In this particular case you can achieve the algorithm's aims in a more readable and succinct way with a simple range-for loop.

// another way

vector<ha*> newVec;
for(auto& item : v) {
    if (item.i < 2) {
        newVec.push_back(&item);
    }
}

I have modified the example so that it compiles, added some diagnostics and presented both the OP's algorithm and mine side by side.

#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator>

using namespace std;

struct ha { 
    explicit ha(int a) : i(a) {}
    int i;   // added this to solve compile error
};

// added diagnostic helpers
ostream& operator<<(ostream& os, const ha& t) {
    os << "{ " << t.i << " }";
    return os;
}

ostream& operator<<(ostream& os, const ha* t) {
    os << "&" << *t;
    return os;
}

int main() 
{
    vector<ha> v{ ha{1}, ha{7}, ha{1} }; // initial vector
    // GOAL : make a vector of pointers to elements with i < 2
    vector<ha*> ph; // target vector
    vector<ha*> pv; // temporary vector
    // 1. 
    transform(v.begin(), v.end(), back_inserter(pv), 
        [](ha &arg) { return &arg; }); 
    // 2. 
    copy_if(pv.begin(), pv.end(), back_inserter(ph),
        [](ha *parg) { return parg->i < 2;  }); // 2. 

    // output diagnostics
    copy(begin(v), end(v), ostream_iterator<ha>(cout));
    cout << endl;
    copy(begin(ph), end(ph), ostream_iterator<ha*>(cout));
    cout << endl;


    // another way

    vector<ha*> newVec;
    for(auto& item : v) {
        if (item.i < 2) {
            newVec.push_back(&item);
        }
    }

    // diagnostics
    copy(begin(newVec), end(newVec), ostream_iterator<ha*>(cout));
    cout << endl;
    return 0;
}
Author by

Nikos Athanasiou

Updated on January 04, 2022

Comments

  • Nikos Athanasiou
    Nikos Athanasiou over 2 years

    A use case emerged when wanting to do a conditional copy (1. doable with copy_if) but from a container of values to a container of pointers to those values (2. doable with transform).

    With the available tools I can't do it in less than two steps :

    #include <vector>
    #include <algorithm>
    
    using namespace std;
    
    struct ha { 
        int i;
        explicit ha(int a) : i(a) {}
    };
    
    int main() 
    {
        vector<ha> v{ ha{1}, ha{7}, ha{1} }; // initial vector
        // GOAL : make a vector of pointers to elements with i < 2
        vector<ha*> ph; // target vector
        vector<ha*> pv; // temporary vector
        // 1. 
        transform(v.begin(), v.end(), back_inserter(pv), 
            [](ha &arg) { return &arg; }); 
        // 2. 
        copy_if(pv.begin(), pv.end(), back_inserter(ph),
            [](ha *parg) { return parg->i < 2;  }); // 2. 
    
        return 0;
    }
    

    Of course we could call remove_if on pv and eliminate the need for a temporary; better yet, though, it's not difficult to implement (for unary operations) something like this:

    template <
        class InputIterator, class OutputIterator, 
        class UnaryOperator, class Pred
    >
    OutputIterator transform_if(InputIterator first1, InputIterator last1,
                                OutputIterator result, UnaryOperator op, Pred pred)
    {
        while (first1 != last1) 
        {
            if (pred(*first1)) {
                *result = op(*first1);
                ++result;
            }
            ++first1;
        }
        return result;
    }
    
    // example call 
    transform_if(v.begin(), v.end(), back_inserter(ph), 
    [](ha &arg) { return &arg;      }, // 1. 
    [](ha &arg) { return arg.i < 2; });// 2.
    
    1. Is there a more elegant workaround with the available C++ standard library tools ?
    2. Is there a reason why transform_if does not exist in the library? Is the combination of the existing tools a sufficient workaround and/or considered performance wise well behaved ?
  • Jan Hudec
    Jan Hudec about 10 years
    Well, the problem is that the standard algorithms can't be easily composed, because they are not lazy.
  • sehe
    sehe about 10 years
    @JanHudec Indeed. (sorry about that? :)). Which is why you use a library (much like you use AMP/TBB for concurrency, or Reactive Extensions in C#). Many people are working on a range proposal + implementation for inclusion into the standard.
  • Viktor Sehr
    Viktor Sehr about 10 years
    @sehe Do you have any links to any of those proposals?
  • sehe
    sehe about 10 years
    @hvd I acknowledged your point made in the comments, and used it to make it very clear how it's not feasible to cater for all useful compositions of algorithms in the standard library. Instead, we should hope for more composable concepts in the standard library!
  • sehe
    sehe about 10 years
    @ViktorSehr This page lists N1871, N2068 and maybe N3350. Besides that there are Boost Range, Eric Niebler's range-v3 and some other significant efforts
  • Ali
    Ali about 10 years
    @sehe +1 Very impressive, I have learned something new today! Would you be so kind as to tell us who are not familiar with Boost.Range and Phoenix where we can find the documentation/examples that explains how to use boost::phoenix to make such nice predicates without lambdas? A quick google search returned nothing relevant. Thanks!
  • sehe
    sehe about 10 years
    @Ali Boost Phoenix is sort of like Boost Lambda on steroids. It originated as a subproject of Boost Spirit, but has been spun off as its own library for years now. This heritage might explain the lack of compelling examples in the docs. You might want to look at Spirit's semantic actions (which are Phoenix actors) and there are some things that lambdas can't quite do the way Phoenix does them
  • Ali
    Ali about 10 years
    @sehe Thanks. It is very disappointing that we have such a great tool but not examples showing what we can do with it / how to use it... :(
  • Bartek Banachewicz
    Bartek Banachewicz about 9 years
    we don't need one now so no real advantage in adding it. Let me answer to that with an analogy. We can write in assembly, so there's no real advantage in writing in C. The fact that a low-level fallback of a for-loop is now more convenient than before doesn't change the fact it's a low-level construct that leaves much more space for errors, is lengthier and harder to understand than algorithm functions.
  • CashCow
    CashCow about 9 years
    If you think my code above has more room for errors than a transform_if with 2 lambdas, one for the predicate and one for the transform, then please explain it. Assembly, C and C++ are different languages and have different places. The only place the algorithm may be at an advantage over a loop is the ability to "map/reduce" so run concurrently over large collections. However this way the user can control whether to loop in sequence or map-reduce.
  • Bartek Banachewicz
    Bartek Banachewicz about 9 years
    In a proper functional approach functions for predicate and mutator are well defined blocks which make the construct properly structured. For loop body can have arbitrary things in it, and every loop you see has to be carefully analyzed to understand its behavior.
  • CashCow
    CashCow about 9 years
    Leave the proper functional approach for proper functional languages. This is C++.
  • Bartek Banachewicz
    Bartek Banachewicz about 9 years
    CashCow as I think that functional programming is perfectly sane, doable and appropriate in C++, I'm keeping my downvote. It's not by any means personal, though.
  • sehe
    sehe about 9 years
    @CashCow Agreed. However, since you asked: coliru.stacked-crooked.com/a/9020224d60d18169
  • R. Martinho Fernandes
    R. Martinho Fernandes about 9 years
    "Fancy fitting that into a transform_if algorithm?" That is a "transform_if algorithm", except it has everything hardcoded.
  • CashCow
    CashCow about 9 years
    It performs the equivalent of a transform_if. Just that algorithms are supposed to simplify your code or improve it in some way, not make it more complicated.
  • CashCow
    CashCow about 9 years
    @sehe yeah, the lengths to which some people will go. I think a big issue of transform_if, copy_if, find etc is it encourages users to use linear search too much. Sometimes it's useful: small collections, one-off data sweeps etc. For general use you should try to partition the data better. remove_if is good though because you want to actually remove the items.
  • sehe
    sehe about 9 years
    @CashCow It's really really funny that you worry about encouraging linear searches and countering that by encouraging loops. I'm not in favour of any of these, but that argument sinks pretty fast. Actually ranged for doesn't support anything but linear iteration. (One notable advantage of my range example was genericity, in case you missed this minor benefit. Again, I was just responding to the explicit challenge. Not saying you should write it in that particular way)
  • CashCow
    CashCow about 9 years
    I encourage loops when it makes the code easier to read / follow. A handwritten loop puts responsibility into the hands of the person who wrote it. When an algorithm is provided, you tend to trust it. Yes, the standard says it's O(N). And its genericity means someone might use it with an associative container for a ranged key which can be done with lower_bound on the first and upper_bound on the second.
  • sehe
    sehe about 9 years
    I don't get it. You trust the programmer to do the right thing, but if they use algorithms they suddenly get it wrong? Also, didn't you just imply that the "non-loop" approaches are more complicated? Doesn't it sort of make sense, then, that the programmers lacking understanding of what they do will resort to loops? Doesn't that make it slightly more likely that programmers who don't use algorithms might not know what they are doing? (This is certainly what I tend to see). Again, I write loops liberally, and ranged-for is a huge blessing, but I know what I see around me.
  • Martin James
    Martin James about 9 years
    Don't programs without loops terminate rather quickly?
  • Richard Hodges
    Richard Hodges over 8 years
    Not measured - until users complain that their experience is CPU-bound (i.e. never) I am more concerned with correctness than nanoseconds. However I can't see it being poor. Optionals are very cheap since there is no memory allocation and Ts constructor is only called if the optional is actually populated. I would expect the optimiser to eliminate almost all dead code since all code paths are visible at compile time.
  • sehe
    sehe over 8 years
    Yeah. I'd agree if it weren't exactly about a general purpose algorithm (actually, generic building block inside those). This is the place where I'm not usually enthused unless something is as simple as it gets. Further, I'd love for the optional handling to be a decorator on any output iterator (so at least we get composability of output iterators, while we're trying to plug the lack of composability of algorithms).
  • Richard Hodges
    Richard Hodges over 8 years
    There's logically no difference whether you handle the optional insert via a decorator on the iterator or in the transform function. It's ultimately just a test of a flag. I think you'll find that the optimised code would be the same either way. The only thing standing in the way of full optimisation would be exception handling. Marking T as having noexcept constructors would cure this.
  • Richard Hodges
    Richard Hodges over 8 years
    what form would you like the call to transform() to take? I'm sure we could build a composable iterator suite.
  • sehe
    sehe over 8 years
    Me too :) I was commenting on your suggestion. I was not proposing something else (I had that long ago. Let's have ranges and composable algorithms instead :))
  • Richard Hodges
    Richard Hodges over 8 years
    :) agreed. so far, boost::range is probably the most succinct way I've seen.
  • sehe
    sehe over 8 years
    would you rate this implementation production ready? Would it work well with non-copyable elements? Or move-iterators?
  • sehe
    sehe over 6 years
    "Why not?" - Because code is for humans. To me the friction is actually worse than going back to writing function objects instead of lambdas. *static_cast< back_insert_iterator<vector<ha *>> &>(*this) = &arg; is both unreadable and needlessly concrete. See this c++17 take with more generic usages.
  • sehe
    sehe over 6 years
    Here's a version that doesn't hardcode the base-iterator (so you can use it with std::insert_iterator<> or std::ostream_iterator<> e.g.) and also lets you supply a transformation (e.g. as a lambda). c++17, Starting to look useful/Same in c++11
  • sehe
    sehe over 6 years
    Note, at this point, there is little reason to keep the base-iterators, and you can simply: use any function, noting that Boost contains a better implementation: boost::function_output_iterator. Now all that's left is re-inventing for_each_if :)
  • sehe
    sehe over 6 years
    Actually, re-reading the original question, let's add a voice of reason - using just c++11 standard library.
  • Jonny Dee
    Jonny Dee about 6 years
    I disagree regarding the "it makes very little sense to include std::filter_and_transform" part. Other programming languages also provide this combination in their "standard library". It totally makes sense to iterate over a list of elements once, transforming them on the fly, while skipping those that cannot be transformed. Other approaches require more than one pass. Yes, you can use BOOST, but the question actually was "Why is there no transform_if in the C++ standard library?". And IMHO, he is right to question this. There should be such a function in the standard library.
  • sehe
    sehe about 6 years
    @JonnyDee Lots of things "totally make sense" to do. That's the point: there is no end to the list of equally useful compositions. The current, dated, STL design couldn't possibly cover them all, and they would end up getting arcane and confusing names (that was what the sample was intended for). As you can see, I was literally answering why transform_if wouldn't be in the standard library. About the normative question ("should it be in the standard library") I agree with your opinion. But that's it: an opinion.
  • sehe
    sehe about 6 years
    @JonnyDee You're welcome to post your own answer. I'm also curious about how other languages supply transform_if. In my experience they all use composable abstractions instead (of the filter+map variation).
  • Jonny Dee
    Jonny Dee about 6 years
    @sehe Regarding "they all use composable abstractions": that's not true. Rust, for instance, has exactly such a transform_if. It's called filter_map. However, I must admit it's there to simplify code but, on the other hand, one could apply the same argument in the C++ case.
  • sehe
    sehe about 6 years
    @JonnyDee That's an interesting example. It's a bit different in that it mandates the combination of predicate and projection (or you could look at it the other way and say the predicate is hardcoded), but I suppose you could shoe-horn it: filter_map(range, make_optional_transform(predicate, projection)) and get equivalent code generation. I do wonder why filter_map was added there. (On the other hand of the spectrum we have Python which, in its quest to have One Way To Do It dropped things like reduce(): stackoverflow.com/questions/181543/…)
  • sehe
    sehe about 6 years
    It appears their docs suggest that it's "because it's much nicer to hide the Option<T>". And although the standard implementation seemingly is not in terms of map and filter, the point still stands: Rust, too, opted to supply the composable primitives, instead of hardcoding a limited set of compositions. (filter_map seems to be the exception to confirm the rule then). // @JonnyDee
  • Jonny Dee
    Jonny Dee about 6 years
    @sehe "filter_map seems to be the exception to confirm the rule then": you're right, but this is exactly the function this question is about. The question was not: "Why aren't there all arbitrary combinations of composable primitives already implemented in the standard library?"
  • sehe
    sehe about 6 years
    @JonnyDee Fair point. Perhaps if optional<T> existed in 20[03,11,14] it would have been proposed. The whole functional programming renaissance came after the (groundbreaking) STL revolution. I for one welcome a new era, new standard library to rule another 2 decades.
  • AlexTheo
    AlexTheo almost 5 years
    Well that is not expressive + the code doesn't look linear. ranges (boost::range or range-v3) would be the best solution to that problem...
  • wrestang
    wrestang almost 3 years
    This is pretty handy! It also works with c++17 if you replace boost with std::optional