How to implode a vector of strings into a string (the elegant way)

111,749

Solution 1

Use boost::algorithm::join(..):

#include <boost/algorithm/string/join.hpp>
...
std::string joinedString = boost::algorithm::join(elems, delim);

See also this question.

Solution 2

std::vector<std::string> strings;

const char* const delim = ", ";

std::ostringstream imploded;
std::copy(strings.begin(), strings.end(),
           std::ostream_iterator<std::string>(imploded, delim));

(include <string>, <vector>, <sstream> and <iterator>)

If you want to have a clean end (no trailing delimiter) have a look here

Solution 3

You should use std::ostringstream rather than std::string to build the output (then you can call its str() method at the end to get a string, so your interface need not change, only the temporary s).

From there, you could change to using std::ostream_iterator, like so:

copy(elems.begin(), elems.end(), ostream_iterator<string>(s, delim)); 

But this has two problems:

  1. delim now needs to be a const char*, rather than a single char. No big deal.
  2. std::ostream_iterator writes the delimiter after every single element, including the last. So you'd either need to erase the last one at the end, or write your own version of the iterator which doesn't have this annoyance. It'd be worth doing the latter if you have a lot of code that needs things like this; otherwise the whole mess might be best avoided (i.e. use ostringstream but not ostream_iterator).

Solution 4

Because I love one-liners (they are very useful for all kinds of weird stuff, as you'll see at the end), here's a solution using std::accumulate and C++11 lambda:

std::accumulate(alist.begin(), alist.end(), std::string(), 
    [](const std::string& a, const std::string& b) -> std::string { 
        return a + (a.length() > 0 ? "," : "") + b; 
    } )

I find this syntax useful with stream operator, where I don't want to have all kinds of weird logic out of scope from the stream operation, just to do a simple string join. Consider for example this return statement from method that formats a string using stream operators (using std;):

return (dynamic_cast<ostringstream&>(ostringstream()
    << "List content: " << endl
    << std::accumulate(alist.begin(), alist.end(), std::string(), 
        [](const std::string& a, const std::string& b) -> std::string { 
            return a + (a.length() > 0 ? "," : "") + b; 
        } ) << endl
    << "Maybe some more stuff" << endl
    )).str();

Update:

As pointed out by @plexando in the comments, the above code suffers from misbehavior when the array starts with empty strings due to the fact that the check for "first run" is missing previous runs that have resulted in no additional characters, and also - it is weird to run a check for "is first run" on all runs (i.e. the code is under-optimized).

The solution for both of these problems is easy if we know for a fact that the list has at least one element. OTOH, if we know for a fact that the list does not have at least one element, then we can shorten the run even more.

I think the resulting code isn't as pretty, so I'm adding it here as The Correct Solution, but I think the discussion above still has merrit:

alist.empty() ? "" : /* leave early if there are no items in the list */
  std::accumulate( /* otherwise, accumulate */
    ++alist.begin(), alist.end(), /* the range 2nd to after-last */
    *alist.begin(), /* and start accumulating with the first item */
    [](auto& a, auto& b) { return a + "," + b; });

Notes:

  • For containers that support direct access to the first element, its probably better to use that for the third argument instead, so alist[0] for vectors.
  • As per the discussion in the comments and chat, the lambda still does some copying. This can be minimized by using this (less pretty) lambda instead: [](auto&& a, auto&& b) -> auto& { a += ','; a += b; return a; }) which (on GCC 10) improves performance by more than x10. Thanks to @Deduplicator for the suggestion. I'm still trying to figure out what is going on here.

Solution 5

I like to use this one-liner accumulate (no trailing delimiter):

std::accumulate(
    std::next(elems.begin()), 
    elems.end(), 
    elems[0], 
    [](std::string a, std::string b) {
        return a + delimiter + b;
    }
);
Share:
111,749
ezpresso
Author by

ezpresso

Updated on December 27, 2021

Comments

  • ezpresso
    ezpresso over 2 years

    I'm looking for the most elegant way to implode a vector of strings into a string. Below is the solution I'm using now:

    static std::string& implode(const std::vector<std::string>& elems, char delim, std::string& s)
    {
        for (std::vector<std::string>::const_iterator ii = elems.begin(); ii != elems.end(); ++ii)
        {
            s += (*ii);
            if ( ii + 1 != elems.end() ) {
                s += delim;
            }
        }
    
        return s;
    }
    
    static std::string implode(const std::vector<std::string>& elems, char delim)
    {
        std::string s;
        return implode(elems, delim, s);
    }
    

    Is there any others out there?

  • Michael Krelin - hacker
    Michael Krelin - hacker about 13 years
    keep in mind, though, that it will add extra delimiter (the second parameter to the std::ostream_iterator constructor at the end of the stream.
  • Jerry Coffin
    Jerry Coffin about 13 years
    Or use one that's already written: stackoverflow.com/questions/3496982/…
  • Oktalist
    Oktalist over 10 years
    Don't use accumulate for strings. Most of the other answers are O(n) but accumulate is O(n^2) because it makes a temporary copy of the accumulator before appending each element. And no, move semantics don't help.
  • Guss
    Guss over 10 years
    @Oktalist, I'm not sure why you say that - cplusplus.com/reference/numeric/accumulate says "Complexity is linear in the distance between first and last".
  • Oktalist
    Oktalist over 10 years
    That's assuming that each individual addition takes constant time. If T has an overloaded operator+ (like string does) or if you provide your own functor then all bets are off. Although I may have been hasty in saying move semantics don't help, they don't solve the problem in the two implementations that I've checked. See my answers to similar questions.
  • Oktalist
    Oktalist over 10 years
    skwllsp's comment is nothing to do with it. Like I said, most of the other answers (and the OP's implode example) are doing the right thing. They are O(n), even if they don't call reserve on the string. Only the solution using accumulate is O(n^2). No need for C-style code.
  • Guss
    Guss over 10 years
    I've reviewed all the answers as well as the OPs code sample, and all of them except for the one doing reserve (with a hard-coded value which we hope is large enough) are re-copying all the so far accumulated data on every iteration (std::string normally allocates more memory than it actually needs for appending, so appending delim will likely not incur another copy). The main difference between those implementations and std::accumulate based is the additional "return a copy" from the lambda, for which move semantics will prevent a large copy operation.
  • Oktalist
    Oktalist over 10 years
    On every iteration? So you're saying they're all O(n^2)? Not on my machine. They don't copy on every iteration for the same reason that they don't copy when appending delim, string append is usually amortized O(1) because it allocates much more than it needs. Here's a quick benchmark: codepad.org/mfubiiMg
  • kirbyfan64sos
    kirbyfan64sos about 9 years
    I did a benchmark, and accumulate was actually faster than an O(n) string stream.
  • Jonny
    Jonny over 8 years
    The point of "implode" is that a delimiter should not be added last. This answer unfortunately adds that delimiter last.
  • Julian
    Julian over 6 years
    Suggesting to include and link against the massive boost library to create a simple string is absurd.
  • River Tam
    River Tam over 6 years
    @Julian most projects already do this. I agree that it is absurd that the STL does not include a way to do this already, however. I might also agree that this should not be the top answer, but other answers are clearly available.
  • Azeroth2b
    Azeroth2b over 5 years
    I concur with @Julian. Boost may be elegant in use but is no way the "most elegant way" in terms of overhead. In this case, it's a workaround to the OP's algorithm and not a resolution to the question itself.
  • jbruni
    jbruni about 5 years
    Most Boost libraries are header-only, so there is nothing to link. Some even make their way into the standard.
  • Carlos Pinzón
    Carlos Pinzón about 4 years
    Be careful when it's empty.
  • plexando
    plexando over 3 years
    A questionable algorithmic design. You know upfront that a comma needs to be prepended to every element except the first one, then why do you do all those useless runtime checks? In addition to that, your code skips all empty pieces until the first non-empty one, then thereafter, empty pieces are not skipped anymore but appended.
  • Guss
    Guss over 3 years
    @plexando - you are absolutely correct. I wasn't sure about the first part of your comment, but as I checked out the skipping empty strings issue I figured out a solution that solves both problems. The result, while more resilient, is less pretty so I'll just add it to the answer as an update. Thank you for pointing at the problem! 👍🙇‍♂️
  • Deduplicator
    Deduplicator over 3 years
    The last one is bad: It doesn't take advantage of std::move() in std::accumulate(), resulting in at least one allocation per element added, more likely two. The lambda should be changed to [](auto&& a, auto&& b){ a += ","; a += b; return decltype(a)(a); }. That is, if you don't want to change it all to std::accumulate(alist.begin(), alist.end(), std::string(), [bool f = false](auto&& a, auto&& b) mutable { if (f) a += ","; f = true; a += b; return decltype(a)(a); }) and let the compiler lower the flag to code.
  • Guss
    Guss over 3 years
    @Deduplicator Why do you think return a + "," + b; doesn't use move semantics (while my previous code did) and will changing it to return (a += ",") += b; help?
  • Deduplicator
    Deduplicator over 3 years
    It doesn't matter whether return a + "," + b; uses move-semantics for the return itself, as it must perforce create a new object divorced from both a and b from scratch, as neither are xvalues there. Actually, the temporary is subject to RVO. Using op+= doesn't help, as that isn't overloaded for xvalue this.
  • Guss
    Guss over 3 years
    @Deduplicator, after reading en.cppreference.com/w/cpp/algorithm/accumulate, I'm still not sure I understand how you can tell when the lambda return value can be moved or not, but I'm pretty sure their examples mean that a is an xvalue. I did ran some benchmarks (w/ gcc 10) and while your suggestions of using forwarding references, mutable, decltype or the captured bool have no effect, using two += statements instead of three add terms does improve performance by ~20% (!!). Interestingly, so does just writing return a + ("," + b); - so I guess we know what the problem is.
  • Deduplicator
    Deduplicator over 3 years
    Have you tried making the arguments forward-references and using the first as decltype(a)(a)?
  • Guss
    Guss over 3 years
  • Константин Ван
    Константин Ван over 3 years
    And fortunately I need to add the token last as well! Thanks for the solution.
  • Kiruahxh
    Kiruahxh about 3 years
    Not having this basic feature in stdlib is absurd.
  • Jason C
    Jason C about 3 years
    You'd never need to check every time though. Just add the first element outside the loop, and start the loop at the second...
  • Jason C
    Jason C about 3 years
    @Kiruahxh Proposed in 2013, but appears to have stalled for some reason.
  • xtofl
    xtofl about 3 years
    I hope the compiler is smart enough to remove the check for the ret to be empty on every iteration.
  • xtofl
    xtofl about 3 years
    I don't see why you'd add that. There isn't a loop in this function, and accumulate does receive the first element and is told to start at the second...
  • Jason C
    Jason C about 3 years
    What I mean is: "Especially with bigger collections, you want to avoid having to check if youre still adding the first element or not to ensure no trailing separator." -- You can avoid having to check this in the loop method that your statement refers to by pulling the first element out of the loop. Sorry, I was kinda vague; I was commenting on the premise, not on the solution. The solution you provided is perfectly fine.
  • xtofl
    xtofl about 3 years
    I share your idea. Related: stackoverflow.com/questions/156650/….
  • c z
    c z over 2 years
    @xtofl You're overestimating the cost of that check.
  • xtofl
    xtofl over 2 years
    I'm just following the "Don't pay for what you don't use" rule. You're right in most cases, where lst is not enormous.
  • Bruce Nielsen
    Bruce Nielsen about 2 years
    Besides the fact that ret.empty() is a trivial check, this is a use case that is perfect for branch predictors, since after the first test it will always evaluate to false.
  • Moberg
    Moberg almost 2 years
    @kirbyfan64sos was the gist removed?
  • kirbyfan64sos
    kirbyfan64sos almost 2 years
    @Moberg I changed my github username, new URL is gist.github.com/refi64/e69b605a287cedcd57f5