Split a string using C++11
Solution 1
std::regex_token_iterator performs generic tokenization based on a regex. It may or may not be overkill for simple splitting on a single character, but it works and is not too verbose:
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> split(const std::string& input, const std::string& regex) {
    // passing -1 as the submatch index parameter performs splitting
    std::regex re(regex);
    std::sregex_token_iterator
        first{input.begin(), input.end(), re, -1},
        last;
    return {first, last};
}
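A usage sketch for the approach above. Because the second argument is compiled as a regular expression, a delimiter that is a regex metacharacter, such as |, has to be escaped (a point also raised in the comments). The names split_regex and demo_split_regex are hypothetical, chosen here only for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Same technique as the answer; split_regex is a hypothetical name.
std::vector<std::string> split_regex(const std::string& input,
                                     const std::string& pattern) {
    std::regex re(pattern);
    // Submatch index -1 selects the text between matches.
    std::sregex_token_iterator first{input.begin(), input.end(), re, -1};
    std::sregex_token_iterator last;  // default-constructed == end of sequence
    return {first, last};
}

void demo_split_regex() {
    // A plain comma has no special meaning in a regex.
    assert((split_regex("A,B,EE", ",") ==
            std::vector<std::string>{"A", "B", "EE"}));
    // '|' means alternation, so it must be escaped to act as a delimiter.
    assert((split_regex("a|b|c", "\\|") ==
            std::vector<std::string>{"a", "b", "c"}));
}
```

For a fixed single-character delimiter, the regex is compiled on every call, so this is convenient rather than fast.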
Solution 2
Here is a (maybe less verbose) way to split a string (based on the post you mentioned).
#include <string>
#include <sstream>
#include <vector>

std::vector<std::string> split(const std::string &s, char delim) {
    std::stringstream ss(s);
    std::string item;
    std::vector<std::string> elems;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
        // elems.push_back(std::move(item)); // if C++11 (based on comment from @mchiasson)
    }
    return elems;
}
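How the getline-based version treats empty fields is worth knowing before relying on it. A small sketch, repeating the function under the hypothetical name split_char (with the std::move variant) so that it is self-contained:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// split_char is a hypothetical name; the body follows the answer above.
std::vector<std::string> split_char(const std::string& s, char delim) {
    std::stringstream ss(s);
    std::string item;
    std::vector<std::string> elems;
    while (std::getline(ss, item, delim)) {
        // item is overwritten by the next getline call, so moving from it is safe.
        elems.push_back(std::move(item));
    }
    return elems;
}

void demo_split_char() {
    assert((split_char("A,B,EE", ',') ==
            std::vector<std::string>{"A", "B", "EE"}));
    // Consecutive delimiters produce an empty element.
    assert((split_char("a,,b", ',') ==
            std::vector<std::string>{"a", "", "b"}));
    // A trailing delimiter does not produce a trailing empty element.
    assert((split_char("a,b,", ',') ==
            std::vector<std::string>{"a", "b"}));
    // An empty input produces no elements.
    assert(split_char("", ',').empty());
}
```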
Solution 3
Here's an example of splitting a string and populating a vector with the extracted elements using Boost.
#include <boost/algorithm/string.hpp>
#include <cassert>
#include <string>
#include <vector>

int main() {
    std::string my_input("A,B,EE");
    std::vector<std::string> results;
    boost::algorithm::split(results, my_input, boost::is_any_of(","));
    assert(results[0] == "A");
    assert(results[1] == "B");
    assert(results[2] == "EE");
}
Solution 4
Another regex solution inspired by other answers but hopefully shorter and easier to read:
#include <regex>
#include <string>
#include <vector>

std::string s{"String to split here, and here, and here,..."};
std::regex regex{R"([\s,]+)"}; // split on whitespace and commas
std::sregex_token_iterator it{s.begin(), s.end(), regex, -1};
std::vector<std::string> words{it, {}};
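The last constructor argument selects what the iterator yields: with -1 it yields the text between matches, and with the default 0 it yields the matches themselves. A small sketch of the difference; the helper name tokens and the sample input are assumptions made for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects whatever the token iterator yields for the given submatch index.
// `tokens` is a hypothetical helper, not part of the answer above.
std::vector<std::string> tokens(const std::string& s, const std::regex& re,
                                int submatch) {
    std::sregex_token_iterator it{s.begin(), s.end(), re, submatch};
    std::vector<std::string> out{it, {}};  // {} is a default-constructed end iterator
    return out;
}

void demo_tokens() {
    const std::string s{"one, two  three"};
    const std::regex sep{R"([\s,]+)"};
    // -1: the pieces between separators.
    assert((tokens(s, sep, -1) ==
            std::vector<std::string>{"one", "two", "three"}));
    // 0: the separators themselves.
    assert((tokens(s, sep, 0) ==
            std::vector<std::string>{", ", "  "}));
}
```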
Solution 5
I don't know if this is less verbose, but it might be easier to grok for those more seasoned in dynamic languages such as JavaScript. The only C++11 features it uses are auto and the range-based for loop.
#include <string>
#include <cctype>
#include <iostream>
#include <vector>

using namespace std;

int main()
{
    string s = "hello how are you won't you tell me your name";
    vector<string> tokens;
    string token;
    for (const auto& c: s) {
        // the cast avoids undefined behaviour for negative char values
        if (!isspace(static_cast<unsigned char>(c)))
            token += c;
        else {
            if (token.length()) tokens.push_back(token);
            token.clear();
        }
    }
    if (token.length()) tokens.push_back(token);
    return 0;
}
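The same loop can be packaged as a reusable function that returns the vector<string> the question asks for. A minimal sketch, assuming whitespace-separated tokens; split_whitespace is a hypothetical name. The cast to unsigned char is there because passing a negative char value to isspace is undefined behaviour.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Splits on runs of whitespace; empty tokens are never produced.
std::vector<std::string> split_whitespace(const std::string& s) {
    std::vector<std::string> tokens;
    std::string token;
    for (const char c : s) {
        if (!std::isspace(static_cast<unsigned char>(c))) {
            token += c;
        } else {
            if (!token.empty()) tokens.push_back(token);
            token.clear();
        }
    }
    // Flush the last token if the input does not end in whitespace.
    if (!token.empty()) tokens.push_back(token);
    return tokens;
}

void demo_split_whitespace() {
    assert((split_whitespace("  hello   how are\tyou ") ==
            std::vector<std::string>{"hello", "how", "are", "you"}));
    assert(split_whitespace("   ").empty());
}
```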
Comments
-
Mark almost 2 years
What would be the easiest method to split a string using C++11?
I've seen the method used by this post, but I feel that there ought to be a less verbose way of doing it using the new standard.
Edit: I would like to have a vector<string> as a result and be able to delimit on a single character.
-
Sebastian Mach about 12 years
Why not for (auto const c : s) {...}?
-
jackyalcine over 9 years
Should mention that this is MSFT-specific. Doesn't exist on POSIX systems.
-
phs over 9 years
Looks like it is also available in boost.
-
Brent Bradburn over 9 years
regex_token_iterator is defined in C++11, but GCC doesn't support it natively until version 4.9 (see here). With earlier versions of GCC, you can use Boost regex.
-
Brent Bradburn over 9 years
A good regex initializer would be " +", for "one or more spaces".
-
Alfred Bratterud about 9 years
A good'er regex would be \\s+ for whitespace. Also, on gcc 4.9 I have to explicitly initialize a regex with the string parameter before passing it to the iterator constructor. Just add regex re{regex_str}; as a first line, where regex_str is the string called regex in the example, then pass re.
-
mchiasson about 9 years
If you are using C++11, you could also do this to avoid string copies when inserting into your vector: elems.push_back(std::move(item));
-
Drew Dormann almost 9 years
In C++11, for (auto && s : tok) { v.push_back(s); }.
-
v010dya almost 8 years
Unless somebody wants to use UTF8 or several characters.
-
phoad over 7 years
cplusplus.com/reference/cstring/strchr If it is permitted to use strchr, it might help your implementation.
-
lppier almost 7 years
You need gcc 4.9 for this.
-
Ela782 over 6 years
Why int as the delimiter, and why the (int) in int delimiter(int)?
-
Fsmv over 6 years
@Ela782 it's a function pointer argument: a function that accepts an int parameter and returns int. The default is the isspace function.
-
eminemence over 5 years
I think this post: stackoverflow.com/questions/11719538/… does it better using istringstream
-
code_fodder over 5 years
This works great, but even after reading the docs - I don't get the syntax of the line starting std::sregex_token_iterator.... Is this two iterators called first and last? And why is last not "set" to any value - I assume there is some sort of default...?
-
code_fodder over 5 years
Oh wait - I got the last bit now after re-reading... it defaults to end of sequence. So I am assuming this could be re-written: std::sregex_token_iterator first{input.begin(), input.end(), re, -1}; and std::sregex_token_iterator last; ...?
-
JohannesD over 5 years
@code_fodder Yes, a(ny) default-constructed instance functions as an end iterator here. This is also the case with stream iterators and other cases where there's no definitive "end" known beforehand.
-
Tony_Tong over 5 years
A great solution, but be aware that the second parameter (the delimiter) is treated as a regular expression. If your delimiter is a regex special character such as "|", you can't just pass "|"; you have to escape it as "\\|".
-
heLomaN over 5 years
Good solution. However, if you are concerned about performance, use boost::regex or something else.
-
Gardener about 5 years
Nice answer. Where is this syntax: words{it, {}}; described for initializing a vector?
-
Gardener about 5 years
Found an answer here: empty curly braces as end of range
-
Carlos Pinzón almost 5 years
After renaming the second s to, say, w it worked nicely. Please update the answer so that it compiles everywhere.
-
Jack Zhang over 3 years
Firstly, std::move doesn't move anything, it just casts the type. item is defined on the stack, so it will be copied to the vector's heap. So using std::move here won't avoid the copy.
-
xperroni over 3 years
In case anyone else is wondering: the -1 argument of the sregex_token_iterator constructor causes the object to iterate over the fragments between matches. The default value of 0 would iterate over fragments matching the regex. See here for more details.
-
Niclas about 3 years
Sorry, way too complicated.
-
Tyg13 about 3 years
Even though item is defined on the stack, the internal data pointer points to data allocated on the heap. By using std::move, the push_back(std::string&&) overload will be selected, causing the std::string object inside the vector to be initialized by move -- simply copying the data pointer, rather than copying the entire buffer.
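
The exchange above about std::move can be checked with a small sketch. It assumes a typical standard library in which a string longer than the small-string buffer owns a heap allocation and the move constructor takes over that allocation; the standard does not promise pointer identity, so the function reports what it observes rather than asserting it. moved_buffer_is_reused is a hypothetical name.

```cpp
#include <string>
#include <utility>
#include <vector>

// Returns true if the string stored in the vector uses the same character
// buffer that `item` owned before the move (observed behaviour, not guaranteed).
bool moved_buffer_is_reused() {
    std::vector<std::string> elems;
    std::string item(1000, 'x');       // long enough to avoid small-string storage
    const char* before = item.data();  // address of the heap buffer owned by item
    elems.push_back(std::move(item));  // selects push_back(std::string&&)
    return elems.back().data() == before;
}
```

With a short string that fits in the small-string buffer there is no heap allocation to take over, so the characters are copied either way.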