Using strtok with a std::string

138,888

Solution 1

#include <iostream>
#include <string>
#include <sstream>
int main(){
    std::string myText("some-text-to-tokenize");
    std::istringstream iss(myText);
    std::string token;
    while (std::getline(iss, token, '-'))
    {
        std::cout << token << std::endl;
    }
    return 0;
}

Or, as mentioned, use boost for more flexibility.

Solution 2

  1. If boost is available on your system (I think it's standard on most Linux distros these days), it has a Tokenizer class you can use.

  2. If not, then a quick Google turns up a hand-rolled tokenizer for std::string that you can probably just copy and paste. It's very short.

  3. And, if you don't like either of those, then here's a split() function I wrote to make my life easier. It'll break a string into pieces using any of the chars in "delim" as separators. Pieces are appended to the "parts" vector:

    // Requires <string> and <vector>.
    void split(const std::string& str, const std::string& delim, std::vector<std::string>& parts) {
      size_t start, end = 0;
      while (end < str.size()) {
        start = end;
        while (start < str.size() && (delim.find(str[start]) != std::string::npos)) {
          start++;  // skip leading delimiters
        }
        end = start;
        while (end < str.size() && (delim.find(str[end]) == std::string::npos)) {
          end++; // advance to the end of the token
        }
        if (end-start != 0) {  // just ignore zero-length strings.
          parts.push_back(std::string(str, start, end-start));
        }
      }
    }
    

Solution 3

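Duplicate the string, tokenize the copy, then free it. Use the tokens before calling free(), because each token points into the duplicate.

char *dup = strdup(str.c_str());  // strdup() comes from POSIX
for (char *token = strtok(dup, " "); token != NULL; token = strtok(NULL, " ")) {
    /* use token here, while dup is still allocated */
}
free(dup);  // every token pointer is invalid after this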

Solution 4

There is a more elegant solution.

With std::string you can use resize() to allocate a suitably large buffer, and &s[0] to get a pointer to the internal buffer.

At this point many fine folks will jump and yell at the screen. But this is a fact. About two years ago, the library working group decided (at the Lillehammer meeting) that, just like std::vector, std::string should formally, not just in practice, have a guaranteed contiguous buffer. That guarantee, together with null termination of the buffer, is part of the standard since C++11.

The other concern is whether strtok() increases the size of the string. The MSDN documentation says:

Each call to strtok modifies strToken by inserting a null character after the token returned by that call.

But this wording is misleading. The function does not insert anything: it overwrites the first separator character after each token with \0, so the size of the string does not change. If we have this string:

one-two---three--four

we will end up with

one\0two\0--three\0-four

So my solution is very simple:


std::string str("some-text-to-split");
char seps[] = "-";
char *token;

token = strtok( &str[0], seps );
while( token != NULL )
{
   /* Do your thing */
   token = strtok( NULL, seps );
}

Read the discussion on http://www.archivum.info/comp.lang.c++/2008-05/02889/does_std::string_have_something_like_CString::GetBuffer

Solution 5

With C++17, std::string gains a non-const data() overload that returns a pointer to a modifiable buffer, so the string can be passed to strtok directly without any hacks. Note that strtok still overwrites the delimiters inside the string:

#include <string>
#include <iostream>
#include <cstring>
#include <cstdlib>

int main()
{
    ::std::string text{"pop dop rop"};
    char const * const psz_delimiter{" "};
    char * psz_token{::std::strtok(text.data(), psz_delimiter)};
    while(nullptr != psz_token)
    {
        ::std::cout << psz_token << ::std::endl;
        psz_token = std::strtok(nullptr, psz_delimiter);
    }
    return EXIT_SUCCESS;
}

output

pop
dop
rop

Admin
Author by

Admin

Updated on July 09, 2022

Comments

  • Admin
    Admin almost 2 years

    I have a string that I would like to tokenize. But the C strtok() function requires my string to be a char*. How can I do this simply?

    I tried:

    token = strtok(str.c_str(), " "); 
    

    which fails because it turns it into a const char*, not a char*

  • SinisterMJ
    SinisterMJ over 15 years
    Isn't the better question, why use strtok when the language in question has better native options?
  • PhiLho
    PhiLho over 15 years
    If the asker is a newbie, you should warn against doing free() before using token... :-)
  • SinisterMJ
    SinisterMJ over 15 years
    I am dubious that using a more robust native tokenizer is ever less safe than inserting new code that calls a library that inserts nulls into the block of memory passed to it... that's why I did not think it a good idea to answer the question as asked.
  • Martin York
    Martin York over 15 years
    casting away the const does not help. It is const for a reason.
  • philant
    philant over 15 years
    @Martin York, @Sherm Pendley : did you read the conclusion or only the code snippet ? I edited my answer to clarify what I wanted to show here. Rgds.
  • Sherm Pendley
    Sherm Pendley over 15 years
    @Philippe - Yes, I only read the code. A lot of people will do that, and go straight to the code and skip the explanation. Perhaps putting the explanation in the code, as a comment, would be a good idea? Anyhow, I removed my down vote.
  • Colin D Bennett
    Colin D Bennett almost 9 years
    Note that strtok() is not thread-safe or re-entrant. In a program with multiple tasks, it should be avoided.
  • SnakE
    SnakE over 8 years
    -1. strtok() works on a null-terminated string while std::string's buffer is not required to be null-terminated. There is no way around c_str().
  • Leushenko
    Leushenko over 7 years
    @SnakE std::string's buffer is required to be null-terminated. data and c_str are required to be identical and data() + i == &operator[](i) for every i in [0, size()].
  • SnakE
    SnakE over 7 years
    @Leushenko you're partially right. Null-termination is only guaranteed since C++11. I've added a note to the answer. I'll lift my -1 as soon as my edit is accepted.
  • orbitcowboy
    orbitcowboy over 7 years
    FYI: strtok will change the value of s. You should not use const_cast, since this simply hides an issue.
  • orbitcowboy
    orbitcowboy over 7 years
    Does anybody know a compiler (Warning-switch) or a static code analyzer that warns about issues like this?
  • FanaticD
    FanaticD about 7 years
    Also, while we are at it, we should note that strdup() comes from POSIX which is why it may be preferable not to use it.
  • dmitri
    dmitri almost 7 years
    This hack is not worth it. This "elegant" solution wrecks std::string object in a few ways. std::cout << str << " " << str.size(); std::cout << str.c_str()<< " " << strlen(str.c_str()); Before: some-text-to-split 18 some-text-to-split 18 After: sometexttosplit 18 some 4.
  • Chandra Shekhar
    Chandra Shekhar about 6 years
    What is the use of "token = strtok( NULL, seps )" in the code above? Please answer, because I tried to search for this use but could not find much.
  • thegreatcoder
    thegreatcoder over 5 years
    strtok() supports multiple delimiters while getline does not. Is there a simple way to circumvent that?
  • Chris Blackwell
    Chris Blackwell over 5 years
    @thegreatcoder I believe you could use regex_token_iterator to tokenize with multiple delimiters. And thanks for the blast from the past, I answered the original question a loooooong time ago :)
  • M.M
    M.M over 5 years
    This causes undefined behaviour by using the result of c_str() to modify the string
  • maximus
    maximus about 5 years
    @M.M added more clarification on how the strtok function works. Hope it will help people understand when to use it
  • definelicht
    definelicht over 4 years
    The hand-rolled link is broken
  • user233009
    user233009 about 4 years
    note: the original std::string will not hold the same value anymore, as strtok replaces the delimiter it found with a null terminator in place, instead of returning you a copy of the string. if you want to keep the original string, create a copy of the string and pass that into strtok.
  • user7860670
    user7860670 about 4 years
    @user233009 note: if strtok handles only a single delimiter then the original value of the string may be preserved by putting back delimiter replacing null terminator on each iteration.
  • Vivian De Smedt
    Vivian De Smedt over 3 years
    This will not work. strtok will modify the internals of str. I suppose that is a side effect the user doesn't want. The solution is to create a char buffer and first copy the str string into the buffer.