Split a string using C++11

143,239

Solution 1

std::regex_token_iterator performs generic tokenization based on a regex. It may or may not be overkill for doing simple splitting on a single character, but it works and is not too verbose:

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> split(const std::string& input, const std::string& regex) {
    // passing -1 as the submatch index parameter performs splitting
    std::regex re(regex);
    std::sregex_token_iterator
        first{input.begin(), input.end(), re, -1},
        last;
    return {first, last};
}
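
As a usage sketch for the function above: the second argument is compiled as a regular expression, so a delimiter that is a regex metacharacter, such as |, has to be escaped (as one of the comments points out). The sample inputs and the demo function name example_regex_split are illustrative assumptions, not part of the original answer.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Same technique as Solution 1: submatch index -1 selects the text between matches.
std::vector<std::string> split(const std::string& input, const std::string& pattern) {
    std::regex re(pattern);
    std::sregex_token_iterator first{input.begin(), input.end(), re, -1}, last;
    return {first, last};
}

// Hypothetical demo function; the inputs are made up for illustration.
void example_regex_split() {
    // A comma has no special meaning in a regex, so it can be passed as-is.
    const std::vector<std::string> csv = split("A,B,EE", ",");
    assert(csv.size() == 3 && csv[2] == "EE");

    // '|' means alternation in a regex, so it is escaped to match a literal pipe.
    const std::vector<std::string> piped = split("a|b|c", "\\|");
    assert(piped.size() == 3 && piped[1] == "b");
}
```

For a fixed single-character delimiter, the regex is compiled on every call, which is one reason the stream-based approach may be preferable in hot paths.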

Solution 2

Here is a (maybe less verbose) way to split a string (based on the post you mentioned).

#include <string>
#include <sstream>
#include <vector>
std::vector<std::string> split(const std::string &s, char delim) {
  std::stringstream ss(s);
  std::string item;
  std::vector<std::string> elems;
  while (std::getline(ss, item, delim)) {
    elems.push_back(item);
    // elems.push_back(std::move(item)); // if C++11 (based on comment from @mchiasson)
  }
  return elems;
}
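
A short sketch of how this std::getline-based split treats empty fields may help when choosing between the solutions. It repeats the function (with the std::move variant from the code comment) so that it compiles on its own; the sample inputs and the name example_getline_split are assumptions made for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Same approach as Solution 2, using the std::move variant.
std::vector<std::string> split(const std::string& s, char delim) {
    std::stringstream ss(s);
    std::string item;
    std::vector<std::string> elems;
    while (std::getline(ss, item, delim)) {
        elems.push_back(std::move(item));  // getline erases and refills item on the next pass
    }
    return elems;
}

// Hypothetical demo of the edge cases.
void example_getline_split() {
    using V = std::vector<std::string>;
    assert((split("A,B,EE", ',') == V{"A", "B", "EE"}));
    assert((split("a,,b", ',') == V{"a", "", "b"}));  // an empty field in the middle is kept
    assert((split("a,b,", ',') == V{"a", "b"}));      // a trailing delimiter adds no empty token
    assert(split("", ',').empty());                    // empty input yields no tokens
}
```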

Solution 3

Here's an example of splitting a string and populating a vector with the extracted elements using boost.

#include <boost/algorithm/string.hpp>
#include <cassert>
#include <string>
#include <vector>

int main() {
    std::string my_input("A,B,EE");
    std::vector<std::string> results;

    boost::algorithm::split(results, my_input, boost::is_any_of(","));

    assert(results[0] == "A");
    assert(results[1] == "B");
    assert(results[2] == "EE");
}

Solution 4

Another regex solution inspired by other answers but hopefully shorter and easier to read:

std::string s{"String to split here, and here, and here,..."};
std::regex regex{R"([\s,]+)"}; // split on space and comma
std::sregex_token_iterator it{s.begin(), s.end(), regex, -1};
std::vector<std::string> words{it, {}};
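
To illustrate what the -1 argument does, here is a minimal sketch contrasting it with the default submatch index 0. The helper name tokens and the sample input are hypothetical and chosen only for the demonstration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the tokens produced for a given submatch index.
std::vector<std::string> tokens(const std::string& s, const std::regex& re, int submatch) {
    std::sregex_token_iterator it{s.begin(), s.end(), re, submatch};
    std::sregex_token_iterator last;  // a default-constructed iterator marks the end
    return std::vector<std::string>(it, last);
}

// Hypothetical demo function.
void example_submatch_index() {
    const std::regex digits{R"(\d+)"};
    const std::string s = "a1b22c";

    // -1: iterate over the pieces between the matches (splitting).
    assert((tokens(s, digits, -1) == std::vector<std::string>{"a", "b", "c"}));

    // 0: iterate over the matches themselves.
    assert((tokens(s, digits, 0) == std::vector<std::string>{"1", "22"}));
}
```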

Solution 5

I don't know if this is less verbose, but it might be easier to grok for those more seasoned in dynamic languages such as JavaScript. The only C++11 features it uses are auto and the range-based for loop.

#include <string>
#include <cctype>
#include <iostream>
#include <vector>

using namespace std;

int main()
{
  string s = "hello  how    are you won't you tell me your name";
  vector<string> tokens;
  string token;

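  for (const auto& c: s) {
    // cast first: passing a negative char value to isspace is undefined behavior
    if (!isspace(static_cast<unsigned char>(c)))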
      token += c;
    else {
      if (token.length()) tokens.push_back(token);
      token.clear();
    }
  }

  if (token.length()) tokens.push_back(token);
     
  return 0;
}
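
The loop above can also be wrapped in a reusable function that returns the vector<string> asked for in the question. This is only a sketch: the name split_whitespace is hypothetical, and the cast to unsigned char is added because std::isspace expects a value representable as unsigned char.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical function form of the loop in Solution 5.
std::vector<std::string> split_whitespace(const std::string& s) {
    std::vector<std::string> tokens;
    std::string token;
    for (const char c : s) {
        if (!std::isspace(static_cast<unsigned char>(c))) {
            token += c;                // still inside a word
        } else if (!token.empty()) {
            tokens.push_back(token);   // whitespace ends the current word
            token.clear();
        }
    }
    if (!token.empty()) tokens.push_back(token);  // flush the last word
    return tokens;
}

// Hypothetical demo using the sample sentence from the answer.
void example_split_whitespace() {
    const auto words = split_whitespace("hello  how    are you won't you tell me your name");
    assert(words.size() == 10);
    assert(words.front() == "hello" && words.back() == "name");
}
```

Runs of whitespace never produce empty tokens here, unlike the std::getline approach, which keeps empty fields between adjacent delimiters.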
Author by

Mark

Student

Updated on July 05, 2022

Comments

  • Mark
    Mark almost 2 years

    What would be the easiest method to split a string using C++11?

    I've seen the method used by this post, but I feel that there ought to be a less verbose way of doing it using the new standard.

    Edit: I would like to have a vector<string> as a result and be able to delimit on a single character.

  • Sebastian Mach
    Sebastian Mach about 12 years
    why not for (auto const c : s) {...}?
  • jackyalcine
    jackyalcine over 9 years
    Should mention that this is MSFT-specific. Doesn't exist on POSIX systems.
  • phs
    phs over 9 years
    Looks like it is also available in boost.
  • Brent Bradburn
    Brent Bradburn over 9 years
    regex_token_iterator is defined in C++11, but GCC doesn't support it natively until version 4.9 (see here). With earlier versions of GCC, you can use Boost regex.
  • Brent Bradburn
    Brent Bradburn over 9 years
    A good regex initializer would be " +", for "one or more spaces".
  • Alfred Bratterud
    Alfred Bratterud about 9 years
    A better regex would be \\s+ for whitespace. Also, on gcc 4.9 I have to explicitly initialize a regex with the string parameter before passing it to the iterator constructor. Just add regex re{regex_str}; as a first line, where regex_str is the string called regex in the example, then pass re.
  • mchiasson
    mchiasson about 9 years
    If you are using C++11, you could also do this to avoid string copies when inserting into your vector: elems.push_back(std::move(item));
  • Drew Dormann
    Drew Dormann almost 9 years
    In C++11, for (auto && s : tok) { v.push_back(s); }.
  • v010dya
    v010dya almost 8 years
    Unless somebody wants to use UTF8 or several characters.
  • phoad
    phoad over 7 years
    cplusplus.com/reference/cstring/strchr If it is permitted to use strchr, it might help your implementation.
  • lppier
    lppier almost 7 years
    You need gcc 4.9 for this.
  • Ela782
    Ela782 over 6 years
    Why int as delimiter, and why int delimiter(int) the (int)?
  • Fsmv
    Fsmv over 6 years
    @Ela782 it's a function pointer argument, a function that accepts an int parameter and returns int. The default is the isspace function.
  • eminemence
    eminemence over 5 years
    I think this post: stackoverflow.com/questions/11719538/… does it better using istringstream
  • code_fodder
    code_fodder over 5 years
    This works great, but even after reading the docs - I don't get the syntax of the line starting std::sregex_token_iterator.... Is this two iterators called first and last? and why is last not "set" to any value - I assume there is some sort of default...?
  • code_fodder
    code_fodder over 5 years
    Oh wait - I got the last bit now after re-reading... it defaults to end of sequence. So I am assuming this could be re-written: std::sregex_token_iterator first{input.begin(), input.end(), re, -1}; and std::sregex_token_iterator last;...?
  • JohannesD
    JohannesD over 5 years
    @code_fodder Yes, a(ny) default-constructed instance functions as an end iterator here. This is also the case with stream iterators and other cases where there's no definitive "end" known beforehand.
  • Tony_Tong
    Tony_Tong over 5 years
    A great solution, but be aware that the second parameter (the delimiter) is treated as a regular expression. If your delimiter is a regex special character such as "|", you can't just pass "|"; you have to escape it, so it should be "\\|".
  • heLomaN
    heLomaN over 5 years
    Good solution. However, if performance is a concern, use boost::regex or something else.
  • Gardener
    Gardener about 5 years
    Nice answer. Where is this syntax: words{it, {}}; described for initializing a vector?
  • Gardener
    Gardener about 5 years
    Found an answer here: empty curly braces as end of range
  • Carlos Pinzón
    Carlos Pinzón almost 5 years
    After renaming the second s to say w it worked nicely. Please update the answer so that it compiles everywhere.
  • Jack Zhang
    Jack Zhang over 3 years
    Firstly, std::move doesn't move anything; it just casts the type. item is defined on the stack, and it will be copied to the vector's heap storage. So using std::move here won't avoid a copy.
  • xperroni
    xperroni over 3 years
    In case anyone else is wondering: the -1 argument of the sregex_token_iterator constructor causes the object to iterate over the fragments between matches. The default value of 0 would iterate over fragments matching the regex. See here for more details.
  • Niclas
    Niclas about 3 years
    Sorry, way too complicated.
  • Tyg13
    Tyg13 about 3 years
    Even though item is defined on the stack, the internal data pointer points to data allocated on the heap. By using std::move, the push_back(std::string&&) overload will be selected, causing the std::string object inside the vector to be initialized by move -- simply copying the data pointer, rather than copying the entire buffer.
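
The std::move discussion in the comments above can be checked with a small sketch. It assumes a string long enough that it is not stored inline by the small-string optimization; the function name and the length of 100 are arbitrary choices for illustration. Buffer reuse is what mainstream standard library implementations do in practice, so treat the pointer comparison as an observation rather than a documented guarantee.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical check: does push_back(std::move(item)) hand the existing
// character buffer to the vector element instead of copying it?
bool push_back_move_reuses_buffer() {
    std::string item(100, 'x');        // assumed long enough to live on the heap
    const char* before = item.data();  // address of the character buffer

    std::vector<std::string> elems;
    elems.push_back(std::move(item));  // selects the push_back(std::string&&) overload

    // The element holds the original contents, and on mainstream
    // implementations it also owns the original buffer.
    return elems.back() == std::string(100, 'x') && elems.back().data() == before;
}
```

After the move, item is left in a valid but unspecified state, which is fine in the split loop because std::getline overwrites it on the next iteration.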