Using strtok with a std::string
Solution 1
#include <iostream>
#include <string>
#include <sstream>

int main()
{
    std::string myText("some-text-to-tokenize");
    std::istringstream iss(myText);
    std::string token;
    while (std::getline(iss, token, '-'))
    {
        std::cout << token << std::endl;
    }
    return 0;
}
Or, as mentioned, use boost for more flexibility.
Solution 2
If boost is available on your system (I think it's standard on most Linux distros these days), it has a Tokenizer class you can use.
If not, then a quick Google turns up a hand-rolled tokenizer for std::string that you can probably just copy and paste. It's very short.
-
And, if you don't like either of those, then here's a split() function I wrote to make my life easier. It'll break a string into pieces using any of the chars in "delim" as separators. Pieces are appended to the "parts" vector:
#include <string>
#include <vector>
using std::string;
using std::vector;

void split(const string& str, const string& delim, vector<string>& parts)
{
    size_t start, end = 0;
    while (end < str.size()) {
        start = end;
        while (start < str.size() && (delim.find(str[start]) != string::npos)) {
            start++;  // skip leading delimiters
        }
        end = start;
        while (end < str.size() && (delim.find(str[end]) == string::npos)) {
            end++;  // skip to end of word
        }
        if (end - start != 0) {  // just ignore zero-length strings.
            parts.push_back(string(str, start, end - start));
        }
    }
}
Solution 3
Duplicate the string, tokenize it, then free it.
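char *dup = strdup(str.c_str());
token = strtok(dup, " ");
/* ... use token here, calling strtok(NULL, " ") for any further tokens ... */
free(dup);  /* token points into dup, so free it only once you are done with the tokens */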
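If you would rather not depend on strdup() and a manual free(), the same "tokenize a copy" idea can be sketched with a std::vector<char> as the writable buffer. This is only a minimal sketch: the helper name tokenize_copy is made up for illustration, and it assumes a C++11 compiler (for vector::data()).

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original answer): tokenizes a copy
// of `str`, so the caller's string is left untouched.
std::vector<std::string> tokenize_copy(const std::string& str, const char* delims)
{
    // Copy the characters plus the terminating '\0' into a writable buffer.
    std::vector<char> buf(str.c_str(), str.c_str() + str.size() + 1);

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buf.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        tokens.emplace_back(tok);  // copy each token out before buf is destroyed
    }
    return tokens;
}
```

Calling tokenize_copy("a b  c", " ") should give the three tokens a, b and c. Because each token is copied into its own std::string, nothing points into the temporary buffer once the function returns.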
Solution 4
There is a more elegant solution.
With std::string you can use resize() to allocate a suitably large buffer, and &s[0] to get a pointer to the internal buffer.
At this point many fine folks will jump and yell at the screen. But this is the fact. About 2 years ago the library working group decided (meeting at Lillehammer) that just like for std::vector, std::string should also formally, not just in practice, have a guaranteed contiguous buffer.
The other concern is whether strtok() increases the size of the string. The MSDN documentation says:
Each call to strtok modifies strToken by inserting a null character after the token returned by that call.
But this wording is not correct: nothing is inserted. The function actually overwrites the first separator character after each token with \0, so the size of the string does not change. If we have this string:
one-two---three--four
we will end up with
one\0two\0--three\0-four
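To check this behaviour yourself, here is a minimal sketch that runs strtok over a copy of a string and then renders the buffer with every null byte shown as the two characters \0. The helper name show_buffer_after_strtok is hypothetical, and the code assumes C++11 or later so that &buf[0] points at a contiguous, null-terminated buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: tokenizes a copy of `text` with strtok and returns
// the buffer's bytes afterwards, with each '\0' rendered as "\0" (a
// backslash followed by a zero) so the result is printable.
std::string show_buffer_after_strtok(const std::string& text, const char* seps)
{
    std::string buf = text;  // work on a copy; strtok writes into the buffer

    // Walk through all tokens; only the side effect on buf matters here.
    for (char* tok = std::strtok(&buf[0], seps);
         tok != nullptr;
         tok = std::strtok(nullptr, seps))
    {
    }

    std::string shown;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        if (buf[i] == '\0') {
            shown += "\\0";
        } else {
            shown += buf[i];
        }
    }
    return shown;
}
```

For the input one-two---three--four with "-" as the separator, this returns one\0two\0--three\0-four. buf.size() stays the same as text.size(), although the buffer now contains embedded null characters.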
So my solution is very simple:
std::string str("some-text-to-split");
char seps[] = "-";
char *token;

token = strtok( &str[0], seps );
while( token != NULL )
{
    /* Do your thing */
    token = strtok( NULL, seps );
}
Read the discussion on http://www.archivum.info/comp.lang.c++/2008-05/02889/does_std::string_have_something_like_CString::GetBuffer
Solution 5
With C++17, std::string gains a data() overload that returns a pointer to a modifiable buffer, so the string can be passed to strtok directly without any hacks:
#include <string>
#include <iostream>
#include <cstring>
#include <cstdlib>

int main()
{
    ::std::string text{"pop dop rop"};
    char const * const psz_delimiter{" "};
    char * psz_token{::std::strtok(text.data(), psz_delimiter)};
    while(nullptr != psz_token)
    {
        ::std::cout << psz_token << ::std::endl;
        psz_token = ::std::strtok(nullptr, psz_delimiter);
    }
    return EXIT_SUCCESS;
}
output
pop
dop
rop
Admin
Updated on July 09, 2022
Comments
-
Admin almost 2 years
I have a string that I would like to tokenize. But the C strtok() function requires my string to be a char*. How can I do this simply?
I tried:
token = strtok(str.c_str(), " ");
which fails because it turns it into a const char*, not a char*
-
SinisterMJ over 15 years
Isn't the better question, why use strtok when the language in question has better native options?
-
PhiLho over 15 years
If the asker is a newbie, you should warn against doing free() before using token... :-)
-
SinisterMJ over 15 years
I am dubious that using a more robust native tokenizer is ever less safe than inserting new code that calls a library that inserts nulls into the block of memory passed to it... that's why I did not think it a good idea to answer the question as asked.
-
Martin York over 15 years
Casting away the const does not help. It is const for a reason.
-
philant over 15 years
@Martin York, @Sherm Pendley: did you read the conclusion or only the code snippet? I edited my answer to clarify what I wanted to show here. Rgds.
-
Sherm Pendley over 15 years
@Philippe - Yes, I only read the code. A lot of people will do that, and go straight to the code and skip the explanation. Perhaps putting the explanation in the code, as a comment, would be a good idea? Anyhow, I removed my down vote.
-
Colin D Bennett almost 9 years
Note that strtok() is not thread-safe or re-entrant. In a program with multiple tasks, it should be avoided.
-
SnakE over 8 years
-1. strtok() works on a null-terminated string while std::string's buffer is not required to be null-terminated. There is no way around c_str().
-
Leushenko over 7 years
@SnakE std::string's buffer is required to be null-terminated. data and c_str are required to be identical and data() + i == &operator[](i) for every i in [0, size()].
-
SnakE over 7 years
@Leushenko you're partially right. Null-termination is only guaranteed since C++11. I've added a note to the answer. I'll lift my -1 as soon as my edit is accepted.
-
orbitcowboy over 7 years
FYI: strtok will change the value of s. You should not use const_cast, since this simply hides an issue.
-
orbitcowboy over 7 years
Does anybody know a compiler (warning switch) or a static code analyzer that warns about issues like this?
-
FanaticD about 7 years
Also, while we are at it, we should note that strdup() comes from POSIX which is why it may be preferable not to use it.
-
dmitri almost 7 years
This hack is not worth it. This "elegant" solution wrecks the std::string object in a few ways.
std::cout << str << " " << str.size(); std::cout << str.c_str() << " " << strlen(str.c_str());
Before: some-text-to-split 18 some-text-to-split 18
After: sometexttosplit 18 some 4
-
Chandra Shekhar about 6 years
What is the use of "token = strtok( NULL, seps )" in the code above? Please answer, because I tried to search for this usage but could not find much.
-
thegreatcoder over 5 years
strtok() supports multiple delimiters while getline does not. Is there a simple way to circumvent that?
-
Chris Blackwell over 5 years
@thegreatcoder I believe you could use regex_token_iterator to tokenize with multiple delimiters. And thanks for the blast from the past, I answered the original question a loooooong time ago :)
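A minimal sketch of the regex_token_iterator suggestion, assuming C++11 <regex> is available. The helper name split_regex and the delimiter pattern in the usage note are made up for illustration and are not from the original thread.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` wherever `delim_pattern` matches and
// returns the non-empty pieces in between.
std::vector<std::string> split_regex(const std::string& text,
                                     const std::string& delim_pattern)
{
    const std::regex delims(delim_pattern);

    // Submatch index -1 selects the text *between* matches, i.e. the tokens.
    std::sregex_token_iterator it(text.begin(), text.end(), delims, -1);
    const std::sregex_token_iterator end;

    std::vector<std::string> tokens;
    for (; it != end; ++it) {
        if (it->length() != 0) {  // a leading delimiter produces an empty piece; skip it
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```

For example, split_regex("one, two;three  four", "[ ,;]+") should give one, two, three and four. The character class plays the role of strtok's delimiter set, and the original string is not modified.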
-
M.M over 5 years
This causes undefined behaviour by using the result of c_str() to modify the string
-
maximus about 5 years
@M.M added more clarification on how the strtok function works. Hope it will help people understand when to use it
-
definelicht over 4 years
The hand-rolled link is broken
-
user233009 about 4 years
Note: the original std::string will not hold the same value anymore, as strtok replaces the delimiter it found with a null terminator in place, instead of returning you a copy of the string. If you want to keep the original string, create a copy of the string and pass that into strtok.
-
user7860670 about 4 years
@user233009 note: if strtok handles only a single delimiter, then the original value of the string may be preserved by putting the delimiter back in place of the null terminator on each iteration.
-
Vivian De Smedt over 3 years
This will not work. strtok will modify the internals of str. I suppose that is a side effect the user doesn't want. The solution is to create a char buffer and first copy the str string into the buffer.