How does strtok() split the string into tokens in C?

Solution 1

strtok() divides the string into tokens. Each call first skips any leading delimiter characters; the token then runs up to the next delimiter, which is overwritten with '\0'. In your case, the first call skips "-" and " ", so the first token starts at "T" and ends at ",". Here you get "This" as output. The next call skips the space after the comma and returns "a". The rest of the string is split the same way, with the last token ending at ".".
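
To see why a watch on str shows only "- This" after the first call, it can help to dump the buffer once tokenizing is done. The following is a minimal sketch using the string from the question; show_bytes and tokenize_and_show are hypothetical helper names made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Write a printable view of buf[0..len-1] into out: every '\0' byte is
   shown as the two characters \ and 0. out must hold 2*len + 1 bytes. */
void show_bytes(const char *buf, size_t len, char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\0') {
            out[j++] = '\\';
            out[j++] = '0';
        } else {
            out[j++] = buf[i];
        }
    }
    out[j] = '\0';
}

/* Tokenize the question's string, then describe what the buffer holds. */
void tokenize_and_show(char *out)
{
    char str[] = "- This, a sample string.";
    size_t len = sizeof str - 1;   /* length before strtok modifies str */

    char *tok = strtok(str, " ,.-");
    while (tok != NULL)
        tok = strtok(NULL, " ,.-");

    show_bytes(str, len, out);
}
```

With a 64-byte out buffer, the result is the text - This\0 a\0sample\0string\0. The leading "- " and the space after the comma are skipped rather than overwritten; only the delimiter that ends each token becomes '\0'.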

Solution 2

The strtok runtime function works like this.

The first time you call strtok, you provide the string that you want to tokenize:

char s[] = "this is a string";

In the above string, a space is a good delimiter between words, so let's use that:

char* p = strtok(s, " ");

What happens now is that 's' is searched until the space character is found. That space is overwritten with '\0', the first token ('this') is returned, and p points to that token (a string).

To get the next token and continue with the same string, pass NULL as the first argument, since strtok maintains a static pointer to the string you passed previously:

p = strtok(NULL," ");

p now points to 'is'.

And so on until no more spaces can be found; then the rest of the string is returned as the last token, 'string'.

More conveniently, you could write it like this to print out all tokens:

for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
  puts(p);
}

EDIT:

If you want to store the values returned by strtok, keep in mind that each token is a pointer into the original string, which strtok modifies between iterations (it overwrites the delimiter after each token with '\0') in order to return the token. If the tokens must outlive that buffer, or the buffer will be reused, copy each token to another buffer, e.g. with strdup(p).
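
A minimal sketch of storing the tokens as independent copies might look like the following. copy_token and collect_tokens are hypothetical names; copy_token does by hand what strdup would do, so that the sketch does not depend on strdup being declared.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocated copy of one token. */
char *copy_token(const char *tok)
{
    size_t n = strlen(tok) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, tok, n);
    return copy;
}

/* Split line on spaces and store up to max copies in out.
   Returns the number stored; the caller must free each entry. */
size_t collect_tokens(char *line, char **out, size_t max)
{
    size_t count = 0;
    for (char *p = strtok(line, " "); p != NULL && count < max;
         p = strtok(NULL, " ")) {
        char *copy = copy_token(p);
        if (copy == NULL)
            break;                 /* out of memory: stop early */
        out[count++] = copy;
    }
    return count;
}
```

Because each entry is its own allocation, the copies stay valid even if the original buffer is later overwritten or goes out of scope.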

Solution 3

strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference.

This is the reason strtok isn't re-entrant; as soon as you pass it a new pointer, that old internal reference gets clobbered.
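
As a rough illustration of that internal reference, here is a simplified sketch of how a strtok-like function could be written. my_strtok is a hypothetical name, and real library implementations may differ in detail; the point is only the static pointer and the way a non-NULL argument overwrites it.

```c
#include <stddef.h>
#include <string.h>

/* Simplified strtok-like function (illustrative, not the library's code). */
char *my_strtok(char *str, const char *delim)
{
    static char *next;             /* saved position between calls */

    if (str != NULL)
        next = str;                /* new string: old position is clobbered */
    if (next == NULL)
        return NULL;

    next += strspn(next, delim);   /* skip leading delimiters */
    if (*next == '\0') {
        next = NULL;               /* nothing left to tokenize */
        return NULL;
    }

    char *token = next;
    next += strcspn(next, delim);  /* advance to the end of the token */
    if (*next != '\0')
        *next++ = '\0';            /* terminate token, step past delimiter */
    else
        next = NULL;               /* token ran to the end of the string */
    return token;
}
```

Since there is only one static next, starting a second tokenizing session before the first has finished loses the first session's position.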

Solution 4

strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance that pointer it has kept however it needs to perform its operations.)

From the POSIX strtok page:

This function uses static storage to keep track of the current string position between calls.

There is a thread-safe variant (strtok_r) that doesn't do this type of magic.
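
For example, strtok_r makes it possible to tokenize at two levels at once, because the caller supplies the state pointer for each level. The sketch below assumes a POSIX system (strtok_r is not part of ISO C); count_words and the ';' record separator are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L    /* ask for POSIX declarations such as strtok_r */
#include <string.h>

/* Count the space-separated words in all ';'-separated records of text. */
int count_words(char *text)
{
    int words = 0;
    char *outer_state;             /* position within the list of records */
    char *inner_state;             /* position within the current record */

    for (char *rec = strtok_r(text, ";", &outer_state); rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        for (char *w = strtok_r(rec, " ", &inner_state); w != NULL;
             w = strtok_r(NULL, " ", &inner_state)) {
            words++;
        }
    }
    return words;
}
```

With plain strtok, the inner loop would overwrite the single static position that the outer loop relies on, so the outer loop would not resume where it left off.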

Solution 5

strtok will tokenize a string i.e. convert it into a series of substrings.

It does that by searching for delimiters that separate these tokens (or substrings). And you specify the delimiters. In your case, you want ' ' or ',' or '.' or '-' to be the delimiter.

The programming model for extracting these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok returns the next token it finds, until it reaches the end of the main string, at which point it returns NULL. Another rule is that you pass the string in only the first time, and NULL on subsequent calls. This is how you tell strtok whether you are starting a new tokenizing session with a new string or retrieving tokens from the previous session. Note that strtok remembers its state for the tokenizing session, and for this reason it is not reentrant or thread safe (you should be using strtok_r instead). Another thing to know is that it actually modifies the original string: it writes '\0' over the delimiter that ends each token.

One way to invoke strtok, succinctly, is as follows:

char str[] = "this, is the string - I want to parse";
char delim[] = " ,-";
char* token;

for (token = strtok(str, delim); token; token = strtok(NULL, delim))
{
    printf("%s\n", token);
}

Result:

this
is
the
string
I
want
to
parse
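
Note that in the result above, the run " - " between "string" and "I" does not produce any empty tokens: strtok treats a run of consecutive delimiters as a single separator. A small sketch that makes this visible by counting tokens (count_tokens is a hypothetical helper name):

```c
#include <stddef.h>
#include <string.h>

/* Count the tokens strtok finds in line for the given delimiter set. */
int count_tokens(char *line, const char *delim)
{
    int n = 0;
    for (char *tok = strtok(line, delim); tok != NULL;
         tok = strtok(NULL, delim)) {
        n++;
    }
    return n;
}
```

Under this sketch, "a,,b" with the delimiter "," yields two tokens, and a string made up only of delimiters yields none. If empty fields matter, strtok is not a good fit for the job.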
Author by

Admin

Updated on May 01, 2021

Comments

  • Admin
    Admin about 3 years

    Please explain to me the working of strtok() function. The manual says it breaks the string into tokens. I am unable to understand from the manual what it actually does.

    I added watches on str and *pch to check its working when the first while loop occurred, the contents of str were only "this". How did the output shown below printed on the screen?

    /* strtok example */
    #include <stdio.h>
    #include <string.h>
    
    int main ()
    {
      char str[] ="- This, a sample string.";
      char * pch;
      printf ("Splitting string \"%s\" into tokens:\n",str);
      pch = strtok (str," ,.-");
      while (pch != NULL)
      {
        printf ("%s\n",pch);
        pch = strtok (NULL, " ,.-");
      }
      return 0;
    }
    

    Output:

    Splitting string "- This, a sample string." into tokens:
    This
    a
    sample
    string
    
  • Admin
    Admin over 13 years
    Does the ending condition for one token become the start of the next token? Also, is a NUL character placed at the ending condition?
  • Sachin Shanbhag
    Sachin Shanbhag over 13 years
    @fahad - Yes, all the delimiters you have will be replaced by a NUL character, as other people have also suggested.
  • Admin
    Admin over 13 years
    So it does not actually place a NUL character in the string? Why does my watch show that the string is left with only "THIS"?
  • Admin
    Admin over 13 years
    If all the delimiters are replaced by NUL, then why does the string contain "-this"? It should contain "\0".
  • Sachin Shanbhag
    Sachin Shanbhag over 13 years
    @fahad - It only replaces the delimiter characters with NUL, not all the characters between delimiters. It's kind of splitting the string into multiple tokens. You get "This" because it's between two specified delimiters, and not "-this".
  • Admin
    Admin over 13 years
    So a NUL is placed where the second delimiter was?
  • Sachin Shanbhag
    Sachin Shanbhag over 13 years
    @Fahad - Yes, absolutely. All spaces, "," and "-" are replaced by NUL because you have specified these as delimiters, as far as I understand.
  • Admin
    Admin over 13 years
    it does indeed replace the ' ' it found with '\0'. And, it does not restore ' ' later, so your string is ruined for good.
  • Admin
    Admin over 13 years
    I observed str[0] and str[1]. str[1] should be '\0' as you said, because str[0] is '-', but there was a space there.
  • Mat
    Mat about 12 years
    Well, the C library functions date from way back when threading wasn't in the picture at all (it only started existing in 2011 as far as the C standard is concerned), so re-entrancy wasn't really important (I guess). That static local makes the function "easy to use" (for some definition of "easy"). It's like ctime returning a static string: practical (no one needs to wonder who should free it), but not re-entrant, and it trips you up if you're not very aware of it.
  • IEatBagels
    IEatBagels almost 10 years
    +1 for static buffer, this is what I didn't understand
  • ylun.ca
    ylun.ca over 8 years
    What do you mean by the old internal reference 'getting clobbered'. Do you mean 'overwritten'?
  • John Bode
    John Bode over 8 years
    @ylun.ca: yes, that's what I mean.
  • MarredCheese
    MarredCheese almost 6 years
    This is wrong: "strtok doesn't change the parameter itself (str)." puts(str); prints "- This" since strtok modified str.
  • Mat
    Mat almost 6 years
    @MarredCheese: read again. It does not modify the pointer. It modifies the data the pointer points to (i.e. the string data)
  • MarredCheese
    MarredCheese almost 6 years
    Oh ok, I didn't realize that's what you were getting at. Agreed.
  • tr_abhishek
    tr_abhishek over 4 years
    i think explaining via an example is much better than referring to some doc.
  • Groo
    Groo almost 4 years
    A very important detail, missing from the line "the first token is returned and p points to that token", is that strtok needs to mutate the original string by placing a null character in place of a delimiter (otherwise other string functions wouldn't know where the token ends). And it also keeps track of the state using a static variable.
  • AndersK
    AndersK almost 4 years
    @Groo I think I already added that in the Edit that I did in 2017, but you are right.
  • Floris
    Floris over 3 years
    @AndersK you still never explicitly mention that the delimiter is replaced by \0 which is necessary. You just say the string is modified.