Tokenizing strings in C

151,696

Solution 1

Do it like this:

char s[256];
strcpy(s, "one two three");
char* token = strtok(s, " ");
while (token) {
    printf("token: %s\n", token);
    token = strtok(NULL, " ");
}

Note: strtok modifies the string it is tokenising, so the string must be writable; it cannot be a string literal or any other const char* data.
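A minimal sketch of one way around that restriction: copy the text into a writable buffer and tokenize the copy. The helper name print_tokens and the 256-byte buffer size are assumptions made for illustration, not part of the answer above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each token of `input` and returns the count.
   `input` may be a string literal because strtok only ever sees the copy. */
int print_tokens(const char *input, const char *delims) {
    char copy[256];
    int count = 0;

    /* Refuse inputs that would not fit rather than truncating silently. */
    if (strlen(input) >= sizeof copy) return -1;
    strcpy(copy, input);

    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims)) {
        printf("token: %s\n", tok);
        count++;
    }
    return count;
}
```

Calling print_tokens("one two three", " ") is safe even though the argument is a literal, because only the local copy is written to.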

Solution 2

Here's an example of strtok usage. Keep in mind that strtok is destructive of its input string (and therefore can't ever be used on a string constant).

char str[] = "a space delimited string";  /* example input; must be a writable array */
char *p = strtok(str, " ");
while(p != NULL) {
    printf("%s\n", p);
    p = strtok(NULL, " ");
}

Basically the thing to note is that passing a NULL as the first parameter to strtok tells it to get the next token from the string it was previously tokenizing.
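To make the first-call/NULL-call pattern and its effect on the buffer concrete, here is a small sketch that records where each token starts. The helper token_offsets and its parameters are hypothetical names chosen for this example.

```c
#include <string.h>

/* Hypothetical helper: tokenizes buf in place and stores each token's
   offset from the start of buf. Returns the number of tokens stored. */
int token_offsets(char *buf, const char *delims, int *offsets, int max) {
    int n = 0;
    char *p = strtok(buf, delims);      /* first call: pass the buffer */

    while (p != NULL && n < max) {
        offsets[n++] = (int)(p - buf);  /* each token points into buf itself */
        p = strtok(NULL, delims);       /* later calls: NULL means "continue" */
    }
    return n;
}
```

Run on a buffer holding "hello world", this should report offsets 0 and 6, and the space at index 5 is overwritten with '\0', which is why the original string is no longer intact afterwards.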

Solution 3

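strtok can be very dangerous. It is not reentrant and is not guaranteed to be thread safe. Its intended use is to be called over and over in a loop: the first call passes the string, and every later call passes NULL to continue from where the previous call stopped. To do that, strtok keeps an internal variable that stores the state of the tokenization. The C standard does not require this state to be unique to each thread, and traditionally it is global. If any other code calls strtok while you are part-way through a string, whether in another thread or inside your own loop, you get problems. Not the kind of problems you want to track down either!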
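Where POSIX is available, strtok_r (mentioned in the comments on this page) avoids the hidden state by having the caller hold it. The sketch below nests two tokenizing loops, which plain strtok cannot do reliably. The helper name print_pairs and the "key=value;key=value" input format are assumptions for illustration, and strtok_r is POSIX rather than ISO C.

```c
#define _POSIX_C_SOURCE 200809L  /* asks for the POSIX declaration of strtok_r */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits buf on ';' into pairs, then each pair on '='.
   Each loop keeps its position in its own caller-owned pointer, so the
   inner loop cannot clobber the outer one. Returns the pairs printed. */
int print_pairs(char *buf) {
    char *outer_state, *inner_state;
    int pairs = 0;

    for (char *pair = strtok_r(buf, ";", &outer_state); pair != NULL;
         pair = strtok_r(NULL, ";", &outer_state)) {
        char *key = strtok_r(pair, "=", &inner_state);
        char *value = strtok_r(NULL, "=", &inner_state);
        if (key != NULL && value != NULL) {
            printf("%s -> %s\n", key, value);
            pairs++;
        }
    }
    return pairs;
}
```

With plain strtok, the inner calls would overwrite the saved position of the outer loop, so the outer loop would stop after the first pair.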

I'd recommend looking for a regex implementation, or using sscanf to pull apart the string.

Try this:

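char strprint[256];
char text[256];
const char *p = text;
int n;
strcpy(text, "My string to test");
/* %255s caps the token length; %n stores how many characters were consumed */
while ( sscanf( p, "%255s%n", strprint, &n) == 1 ) {
   printf("token: %s\n", strprint);
   p += n;
}

Note: Unlike strtok, this leaves the 'text' string intact. %s splits on any whitespace, so it only suits whitespace-delimited input.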
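When the input has to stay intact (a string literal, for example) and the delimiters are not just whitespace, one option is to walk the string with strspn and strcspn. This is only a sketch: the helper name print_tokens_readonly is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every token in `text` without writing to it.
   Returns the number of tokens found. */
int print_tokens_readonly(const char *text, const char *delims) {
    int count = 0;
    const char *p = text;

    for (;;) {
        p += strspn(p, delims);            /* skip any leading delimiters */
        if (*p == '\0') break;             /* nothing left but delimiters */
        size_t len = strcspn(p, delims);   /* length of the token itself */
        printf("token: %.*s\n", (int)len, p);
        p += len;
        count++;
    }
    return count;
}
```

Like strtok, this treats a run of consecutive delimiters as a single separator, so "a,,b" split on "," yields two tokens.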

Solution 4

I've made some string functions to split values, using as few pointers as I could because this code is intended to run on PIC18F processors. Those processors do not handle pointers well when little free RAM is available:

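#include <stdio.h>
#include <string.h>

char POSTREQ[255] = "pwd=123456&apply=Apply&d1=88&d2=100&pwr=1&mpx=Internal&stmo=Stereo&proc=Processor&cmp=Compressor&ip1=192&ip2=168&ip3=10&ip4=131&gw1=192&gw2=168&gw3=10&gw4=192&pt=80&lic=&A=A";

/* Return the index of the first C found after position Start, or -1.
   The character at Start itself is not examined. */
int findchar(const char *string, int Start, char C) {
    while (string[Start] != 0) {
        Start++;
        if (string[Start] == C) return Start;
    }
    return -1;
}

/* Return the index of the Times-th occurrence of C, or -1 if there are fewer. */
int findcharn(const char *string, int Times, char C) {
    int i, pos = 0, fnd = 0;

    for (i = 0; i < Times; i++) {
        fnd = findchar(string, pos, C);
        if (fnd < 0) return -1;
        pos = fnd;
    }
    return fnd;
}

/* Copy in[start+1 .. end] into out and terminate it.
   out must have room for (end - start) characters plus the terminator. */
void mid(const char *in, char *out, int start, int end) {
    int i;
    int size = end - start;

    for (i = 0; i < size; i++) {
        out[i] = in[start + i + 1];
    }
    out[size] = 0;
}

/* Copy the value of the index-th key=value pair (counting from 1) into out. */
void getvalue(char *out, int index) {
    int start = findcharn(POSTREQ, index, '=');
    int end = findcharn(POSTREQ, index, '&');

    if (start < 0) { out[0] = 0; return; }       /* no such pair */
    if (end < 0) end = (int)strlen(POSTREQ);     /* the last pair has no trailing '&' */
    mid(POSTREQ, out, start, end - 1);
}

int main(void) {
    char n_pwd[7];   /* "123456" plus the terminating 0 */
    char n_d1[7];

    getvalue(n_pwd, 1);   /* pair 1 is pwd=123456 */
    getvalue(n_d1, 3);    /* pair 3 is d1=88 */

    printf("pwd: %s\n", n_pwd);
    printf("d1: %s\n", n_d1);
    return 0;
}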
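On a platform where pointers are not a concern, the same kind of lookup could be done by key name instead of by position. The following is a rough sketch under that assumption, not a drop-in replacement for the PIC18F code above: get_by_key and its parameters are hypothetical, and it relies on the standard strcspn, strncmp and memcpy.

```c
#include <string.h>

/* Hypothetical helper: copies the value for `key` from a query string of the
   form k1=v1&k2=v2 into out. Returns 0 on success, or -1 if the key is
   missing or the value does not fit in outsize bytes. */
int get_by_key(const char *query, const char *key, char *out, size_t outsize) {
    size_t keylen = strlen(key);
    const char *p = query;

    while (*p != '\0') {
        size_t pairlen = strcspn(p, "&");   /* length of this key=value pair */
        if (pairlen > keylen && strncmp(p, key, keylen) == 0 && p[keylen] == '=') {
            size_t vallen = pairlen - keylen - 1;
            if (vallen >= outsize) return -1;   /* value plus terminator must fit */
            memcpy(out, p + keylen + 1, vallen);
            out[vallen] = '\0';
            return 0;
        }
        p += pairlen;
        if (*p == '&') p++;                 /* step over the separator */
    }
    return -1;
}
```

The explicit outsize check means a value longer than the destination is rejected instead of overflowing it, which the index-based version leaves to the caller.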

Solution 5

You can simplify the code by introducing an extra variable.

#include <string.h>
#include <stdio.h>

int main()
{
    char str[100], *s = str, *t = NULL;

    strcpy(str, "a space delimited string");
    while ((t = strtok(s, " ")) != NULL) {
        s = NULL;
        printf(":%s:\n", t);
    }
    return 0;
}
Author: kombo

Updated on July 09, 2022

Comments

  • kombo
    kombo almost 2 years

    I have been trying to tokenize a string using SPACE as the delimiter but it doesn't work. Does anyone have a suggestion as to why it doesn't work?

    Edit: tokenizing using:

    strtok(string, " ");
    

    The code is like the following

    pch = strtok (str," ");
    while (pch != NULL)
    {
      printf ("%s\n",pch);
      pch = strtok (NULL, " ");
    }
    
    • Edward Kmett
      Edward Kmett over 15 years
      Are you using strtok or something you grew yourself? cplusplus.com/reference/clibrary/cstring/strtok.html If you are using strtok are you trying to do it on a constant string?
    • dmckee --- ex-moderator kitten
      dmckee --- ex-moderator kitten over 15 years
      OK. Now we're getting somewhere. What behavior do you expect that you are not getting?
    • dmckee --- ex-moderator kitten
      dmckee --- ex-moderator kitten over 15 years
      BTW, kombo. Many people who work help desks or teach see the phrase "it doesn't work" as marking a user who hasn't read the furnished manual, or doesn't know what they actually want, or is deeply confused. The form you want is "I'm doing X, and I expected Y, but I got Z. What's wrong?"
    • Jonathan Leffler
      Jonathan Leffler over 15 years
      @dmckee: good point. Canonical x-ref: catb.org/~esr/faqs/smart-questions.html
  • Will Dean
    Will Dean over 15 years
    In fact, if you look at modern strtok implementations, they tend to use thread-local storage (MSVC has certainly done this for years and years), so they are thread-safe. It's still an archaic function which I would avoid, though...
  • Jason
    Jason over 12 years
    strtok has an internal state variable tracking the string being tokenized. When you pass NULL to it, strtok will continue to use this state variable. When you pass a non-null value, the state variable is reset. So in other words: passing NULL means "continue tokenizing the same string".
  • Jason
    Jason over 12 years
    You're right, that's why many implementations offer strtok_r, which at the very least offers a way to use it in a thread-safe way.
  • Massimo Fazzolari
    Massimo Fazzolari almost 12 years
    strtok_r is a thread-safe version of strtok pubs.opengroup.org/onlinepubs/009695399/functions/strtok.html
  • Nilo Paim
    Nilo Paim over 10 years
    Nice, @jitsceait, but what happens if I have two delimiters together on input? I'll change a little your code.
  • jitsceait
    jitsceait over 10 years
    I think i have added a test case for consecutive delimiters and it was working. Could you please highlight the code you have changed?
  • Jason
    Jason over 9 years
    @Gnuey, p will point to characters in the string being tokenized. Additionally, strtok replaces the delimiter found with a '\0' character so that p will effectively be a valid NUL-terminated string. So if you were to run it on char s[] = "hello world"; the first call would return a pointer to the h character and the buffer would then contain "hello\0world".
  • S.S. Anne
    S.S. Anne over 4 years
    I agree with the first paragraph but the sentence after that is terrible. scanf is hard to use properly as shown in your example; you forget to pass a size (%255s).
  • rje
    rje almost 3 years
    strtok() is fine for non-threaded legacy systems, though. Archaic code for retro systems.