How does the strtok function in C work?

Solution 1

Two things to know about strtok. As was mentioned, it "maintains internal state". It also modifies the string you pass to it: it writes a '\0' over the delimiter that ends each token and returns a pointer to the start of that token. Internally it remembers where the last token ended, and the next time you call it (with NULL), it continues from there.

The important corollary is that you cannot use strtok on a string literal such as "hello world", or on any other read-only (const char*) string. Modifying such a string is undefined behavior, and in practice you will often get an access violation.

The "good" thing about strtok is that it doesn't actually copy strings - so you don't need to manage additional memory allocation etc. But unless you understand the above, you will have trouble using it correctly.

Example: if you have "this,is,a,string", successive calls to strtok will return pointers as follows (the ^ marks the value returned). Note that a '\0' is written where each delimiter is found; this means the source string is modified:

t  h  i  s  ,  i  s  ,  a  ,  s  t  r  i  n  g \0         this,is,a,string

t  h  i  s  \0 i  s  ,  a  ,  s  t  r  i  n  g \0         this
^
t  h  i  s  \0 i  s  \0 a  ,  s  t  r  i  n  g \0         is
               ^
t  h  i  s  \0 i  s  \0 a  \0 s  t  r  i  n  g \0         a
                        ^
t  h  i  s  \0 i  s  \0 a  \0 s  t  r  i  n  g \0         string
                              ^

Hope it makes sense.

Solution 2

strtok maintains internal state. When you call it with a non-NULL string, it re-initializes itself to use the string you supply. When you call it with NULL, it uses the saved string and whatever other state it currently holds to return the next token.

Because of the way strtok works, you need to ensure that you link with a multithreaded version of the C runtime if you're writing a multithreaded application. This will ensure that each thread gets its own internal state for strtok.

Solution 3

The strtok() function stores data between calls. It uses that data when you call it with a NULL pointer.

From http://www.cplusplus.com/reference/cstring/strtok/ :

The point where the last token was found is kept internally by the function to be used on the next call (particular library implementations are not required to avoid data races).

Solution 4

The strtok function stores data in an internal static variable which is shared among all threads.

For thread safety you should use strtok_r (POSIX), which keeps its position in a pointer supplied by the caller.

From http://www.opensource.apple.com/source/Libc/Libc-167/string.subproj/strtok.c

Take a look at static char *last;

char *
strtok(s, delim)
    register char *s;
    register const char *delim;
{
    register char *spanp;
    register int c, sc;
    char *tok;
    static char *last;


    if (s == NULL && (s = last) == NULL)
        return (NULL);

    /*
     * Skip (span) leading delimiters (s += strspn(s, delim), sort of).
     */
cont:
    c = *s++;
    for (spanp = (char *)delim; (sc = *spanp++) != 0;) {
        if (c == sc)
            goto cont;
    }

    if (c == 0) {       /* no non-delimiter characters */
        last = NULL;
        return (NULL);
    }
    tok = s - 1;

    /*
     * Scan token (scan for delimiters: s += strcspn(s, delim), sort of).
     * Note that delim must have one NUL; we stop if we see that, too.
     */
    for (;;) {
        c = *s++;
        spanp = (char *)delim;
        do {
            if ((sc = *spanp++) == c) {
                if (c == 0)
                    s = NULL;
                else
                    s[-1] = 0;
                last = s;
                return (tok);
            }
        } while (sc != 0);
    }
    /* NOTREACHED */
}

Author: user2426316

Updated on July 09, 2022

Comments

  • user2426316 (almost 2 years ago)

    I found this sample program which explains the strtok function:

    #include <stdio.h>
    #include <string.h>
    
    int main ()
    {
        char str[] ="- This, a sample string.";
        char * pch;
        printf ("Splitting string \"%s\" into tokens:\n",str);
        pch = strtok (str," ,.-");
        while (pch != NULL)
        {
            printf ("%s\n",pch);
            pch = strtok (NULL, " ,.-");
        }
        return 0;
    }
    

    However, I don't see how this can work.

    How is it possible that pch = strtok (NULL, " ,.-"); returns a new token? I mean, we are calling strtok with NULL. This doesn't make a lot of sense to me.

    • alk (over 10 years ago)
      "I found this sample program which explains the strtok function": it's not the example that explains, but the documentation, so you might like to read it here: man7.org/linux/man-pages/man3/strtok.3.html
    • vrdhn (over 10 years ago)
      And it doesn't make sense to anyone... so strtok_r() was created.
    • alk (over 10 years ago)
      OT: It's int main (void) by the way.
  • David Heffernan (over 10 years ago)
    Most modern runtimes store the state in thread-local storage, which means that it is thread-safe but not safe when used re-entrantly.
  • Andy Thomas (over 10 years ago)
    Thanks for the correction.
  • David Heffernan (over 10 years ago)
    strtok_s on Windows