Split string with delimiters in C


Solution 1

You can use the strtok() function to split a string on a delimiter you specify. Note that strtok() modifies the string passed to it, overwriting each delimiter it finds with a null byte. If the original string is needed elsewhere, make a copy and pass the copy to strtok().

EDIT:

Example (note it does not handle consecutive delimiters, "JAN,,,FEB,MAR" for example):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

char** str_split(char* a_str, const char a_delim)
{
    char** result    = 0;
    size_t count     = 0;
    char* tmp        = a_str;
    char* last_comma = 0;
    char delim[2];
    delim[0] = a_delim;
    delim[1] = 0;

    /* Count how many elements will be extracted. */
    while (*tmp)
    {
        if (a_delim == *tmp)
        {
            count++;
            last_comma = tmp;
        }
        tmp++;
    }

    /* Add space for the trailing token, unless the string is empty
       or ends with a delimiter. */
    if (*a_str && last_comma != tmp - 1)
        count++;

    /* Add space for terminating null string so caller
       knows where the list of returned strings ends. */
    count++;

    result = malloc(sizeof(char*) * count);

    if (result)
    {
        size_t idx  = 0;
        char* token = strtok(a_str, delim);

        while (token)
        {
            assert(idx < count);
            *(result + idx++) = strdup(token);
            token = strtok(0, delim);
        }
        assert(idx == count - 1);
        *(result + idx) = 0;
    }

    return result;
}

int main()
{
    char months[] = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
    char** tokens;

    printf("months=[%s]\n\n", months);

    tokens = str_split(months, ',');

    if (tokens)
    {
        int i;
        for (i = 0; *(tokens + i); i++)
        {
            printf("month=[%s]\n", *(tokens + i));
            free(*(tokens + i));
        }
        printf("\n");
        free(tokens);
    }

    return 0;
}

Output:

$ ./main.exe
months=[JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC]

month=[JAN]
month=[FEB]
month=[MAR]
month=[APR]
month=[MAY]
month=[JUN]
month=[JUL]
month=[AUG]
month=[SEP]
month=[OCT]
month=[NOV]
month=[DEC]
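
The example above cannot return empty fields, because strtok() treats a run of delimiters as a single one. As a rough sketch of one alternative (not the code from this answer), the version below keeps empty fields and leaves the input untouched, using only standard C. The names split_fields and free_fields are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees an array returned by split_fields(). */
void free_fields(char **fields)
{
    if (fields)
    {
        for (size_t i = 0; fields[i]; i++)
            free(fields[i]);
        free(fields);
    }
}

/* Hypothetical helper: splits src on delim without modifying src.
   Empty fields are kept, so "a,,b" yields "a", "" and "b".
   Returns a NULL-terminated array, or NULL if an allocation fails. */
char **split_fields(const char *src, char delim)
{
    size_t count = 1;   /* n delimiters always give n + 1 fields */
    for (const char *p = src; *p; p++)
        if (*p == delim)
            count++;

    /* One extra slot for the NULL end marker. */
    char **fields = malloc((count + 1) * sizeof *fields);
    if (!fields)
        return NULL;
    for (size_t i = 0; i <= count; i++)
        fields[i] = NULL;

    const char *start = src;
    for (size_t i = 0; i < count; i++)
    {
        const char *end = strchr(start, delim);
        size_t len = end ? (size_t)(end - start) : strlen(start);

        fields[i] = malloc(len + 1);
        if (!fields[i])
        {
            free_fields(fields);   /* entries after i are still NULL */
            return NULL;
        }
        memcpy(fields[i], start, len);
        fields[i][len] = '\0';

        start += len + 1;   /* step over the delimiter */
    }
    return fields;
}
```

With this sketch, split_fields("JAN,,FEB,", ',') yields four fields: "JAN", "", "FEB" and a trailing "". The caller owns the result and releases it with free_fields().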

Solution 2

I think strsep is still the best tool for this:

while ((token = strsep(&str, ","))) my_fn(token);

That is literally one line that splits a string.

The extra parentheses are a stylistic element to indicate that we're intentionally testing the result of an assignment, not an equality operator ==.

For that pattern to work, token and str both have type char *. If you started with a string literal, then you'd want to make a copy of it first:

// More general pattern:
const char *my_str_literal = "JAN,FEB,MAR";
char *token, *str, *tofree;

tofree = str = strdup(my_str_literal);  // We own str's memory now.
while ((token = strsep(&str, ","))) my_fn(token);
free(tofree);

If two delimiters appear together in str, you'll get a token value that's the empty string. The value of str is modified in that each delimiter encountered is overwritten with a zero byte - another good reason to copy the string being parsed first.

In a comment, someone suggested that strtok is better than strsep because strtok is more portable. Ubuntu and Mac OS X have strsep; it's safe to guess that other unixy systems do as well. Windows lacks strsep, but it has strpbrk, which enables this short and sweet strsep replacement:

char *strsep(char **stringp, const char *delim) {
  if (*stringp == NULL) { return NULL; }
  char *token_start = *stringp;
  *stringp = strpbrk(token_start, delim);
  if (*stringp) {
    **stringp = '\0';
    (*stringp)++;
  }
  return token_start;
}

Here is a good explanation of strsep vs strtok. The pros and cons may be judged subjectively; however, I think it's a telling sign that strsep was designed as a replacement for strtok.
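
To make the empty-token behaviour and the copy-then-free pattern concrete, here is a small sketch under a few assumptions. next_field repeats the strpbrk-based logic from the replacement above under a made-up name, so it cannot clash with a system strsep(). count_fields is also a hypothetical helper. It works on a private copy, so the caller's string is never modified.

```c
#include <stdlib.h>
#include <string.h>

/* Same logic as the strpbrk-based replacement above, under a
   hypothetical name to avoid clashing with a system strsep(). */
char *next_field(char **stringp, const char *delims)
{
    char *start = *stringp;
    if (start == NULL)
        return NULL;
    *stringp = strpbrk(start, delims);
    if (*stringp)
    {
        **stringp = '\0';   /* cut the field at the delimiter */
        (*stringp)++;       /* resume just past it next time */
    }
    return start;
}

/* Hypothetical helper: returns the number of fields in text and stores
   the number of empty ones in *empty. Returns 0 if the copy cannot be
   allocated. */
size_t count_fields(const char *text, const char *delims, size_t *empty)
{
    size_t len = strlen(text);
    size_t total = 0;
    char *copy = malloc(len + 1);

    *empty = 0;
    if (!copy)
        return 0;
    memcpy(copy, text, len + 1);

    char *cursor = copy;   /* next_field() advances this pointer */
    char *field;
    while ((field = next_field(&cursor, delims)))
    {
        total++;
        if (*field == '\0')
            (*empty)++;
    }

    free(copy);   /* free the saved start, not cursor (now NULL) */
    return total;
}
```

For "JAN,,FEB" with "," as the delimiter set, this reports 3 fields, one of them empty. By the time the loop ends, cursor is NULL, which is why the original pointer has to be kept around for free().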

Solution 3

Here is a string tokenizer; this code should point you in the right direction.

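#include <stdio.h>
#include <string.h>

int main(void) {
  char st[] = "Where there is will, there is a way.";
  char *ch;
  /* Split on spaces and commas; strtok() overwrites the delimiters in st. */
  ch = strtok(st, " ,");
  while (ch != NULL) {
    printf("%s\n", ch);
    ch = strtok(NULL, " ,");
  }
  return 0;
}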
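
If the original text must stay intact, the same loop can run on a private copy. The sketch below is only an illustration and count_tokens is a made-up name. It also shows that strtok() collapses runs of delimiters, so empty fields are never reported. Like any plain strtok() code it is not reentrant or thread safe.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() finds in text.
   It tokenizes a private copy, so text itself is never modified.
   Returns 0 if the copy cannot be allocated. */
size_t count_tokens(const char *text, const char *delims)
{
    size_t len = strlen(text);
    size_t n = 0;
    char *copy = malloc(len + 1);

    if (!copy)
        return 0;
    memcpy(copy, text, len + 1);   /* includes the terminating '\0' */

    /* strtok() keeps hidden state between calls: the first call takes
       the string, later calls take NULL to continue from there. */
    for (char *tok = strtok(copy, delims); tok; tok = strtok(NULL, delims))
        n++;

    free(copy);
    return n;
}
```

Here count_tokens("JAN,,,FEB,MAR", ",") returns 3, not 5, because the three adjacent commas count as one separator.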

Solution 4

The method below does all the work (memory allocation, counting the length) for you. More information and a description can be found here - Implementation of Java String.split() method to split C string

#include <stdio.h>
#include <stdlib.h>

int split (const char *str, char c, char ***arr)
{
    int count = 1;
    int token_len = 1;
    int i = 0;
    const char *p;  /* read-only cursor over str */
    char *t;

    p = str;
    while (*p != '\0')
    {
        if (*p == c)
            count++;
        p++;
    }

    *arr = (char**) malloc(sizeof(char*) * count);
    if (*arr == NULL)
        exit(1);

    p = str;
    while (*p != '\0')
    {
        if (*p == c)
        {
            (*arr)[i] = (char*) malloc( sizeof(char) * token_len );
            if ((*arr)[i] == NULL)
                exit(1);

            token_len = 0;
            i++;
        }
        p++;
        token_len++;
    }
    (*arr)[i] = (char*) malloc( sizeof(char) * token_len );
    if ((*arr)[i] == NULL)
        exit(1);

    i = 0;
    p = str;
    t = ((*arr)[i]);
    while (*p != '\0')
    {
        if (*p != c)
        {
            *t = *p;
            t++;
        }
        else
        {
            *t = '\0';
            i++;
            t = ((*arr)[i]);
        }
        p++;
    }
    /* Terminate the last token; the loop stops before writing its '\0'. */
    *t = '\0';

    return count;
}

How to use it:

int main (int argc, char ** argv)
{
    int i;
    const char *s = "Hello, this is a test module for the string splitting.";
    int c = 0;
    char **arr = NULL;

    c = split(s, ' ', &arr);

    printf("found %d tokens.\n", c);

    for (i = 0; i < c; i++)
    {
        printf("string #%d: %s\n", i, arr[i]);
        free(arr[i]);   /* each token was allocated by split() */
    }
    free(arr);

    return 0;
}

Solution 5

Here is my two cents:

int split (const char *txt, char delim, char ***tokens)
{
    int *tklen, *t, count = 1;
    char **arr, *p = (char *) txt;

    while (*p != '\0') if (*p++ == delim) count += 1;
    t = tklen = calloc (count, sizeof (int));
    for (p = (char *) txt; *p != '\0'; p++)
    {
        if (*p == delim) t++;   /* move on to the next token's length */
        else (*t)++;
    }
    *tokens = arr = malloc (count * sizeof (char *));
    t = tklen;
    p = *arr++ = calloc (*(t++) + 1, sizeof (char));
    while (*txt != '\0')
    {
        if (*txt == delim)
        {
            p = *arr++ = calloc (*(t++) + 1, sizeof (char));
            txt++;
        }
        else *p++ = *txt++;
    }
    free (tklen);
    return count;
}

Usage:

char **tokens;
int count, i;
const char *str = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";

count = split (str, ',', &tokens);
for (i = 0; i < count; i++) printf ("%s\n", tokens[i]);

/* freeing tokens */
for (i = 0; i < count; i++) free (tokens[i]);
free (tokens);

Author by

namco

Updated on February 27, 2022

Comments

  • namco
    namco over 2 years

    How do I write a function to split and return an array for a string with delimiters in the C programming language?

    char* str = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
    str_split(str,',');
    
    • Daniel Kamil Kozar
      Daniel Kamil Kozar over 12 years
      You can use the strtok function from the standard library to achieve the same thing.
    • fnisi
      fnisi over 8 years
      A comment... the key point for a strtok() family function is understanding static variables in C, i.e. how they behave between successive calls of the function in which they are used. See my code below
    • chqrlie
      chqrlie over 2 years
      strtok is not a solution for this problem for multiple reasons: it modifies the source string, it has a hidden static state that makes it non reentrant, it will handle sequences of delimiters as a single delimiter, which seems incorrect for ,, and as a consequence will not split empty strings at the start middle nor end of ,X,,Y,. Don't use strtok.
  • SteveP
    SteveP over 10 years
    Hi. I think the function has hard coded "," as the separator: char* token = strtok(a_str, ",");
  • hmjd
    hmjd over 10 years
    @SteveP, well spotted. That code has been there for months and nobody has noticed. Will fix it shortly.
  • Peter Mortensen
    Peter Mortensen over 10 years
    As this may be the canonical question/answer on Stack Overflow for this, aren't there some caveats with respect to multi-threading using strtok?
  • Admin
    Admin almost 10 years
    @osgx According to that page, strsep is a replacement for strtok, but strtok is preferred for portability. So, unless you need support for empty fields or splitting multiple strings at once, strtok is a better choice.
  • Dojo
    Dojo over 9 years
    Possibly a stupid question but how does strtok(0, delim); know the source string?
  • Jonathan Leffler
    Jonathan Leffler about 9 years
    @Dojo: It remembers it; that's one of the reasons it is problematic. It would be better to use strtok_s() (Microsoft, C11 Annex K, optional) or strtok_r() (POSIX) than plain strtok(). Plain strtok() is evil in a library function. No function calling the library function may be using strtok() at the time, and no function called by the library function may call strtok().
  • Martin
    Martin about 9 years
    if (last_comma < (a_str + strlen(a_str) - 1) ) count++; seems to be more readable than /* Add space for trailing token. */ count += last_comma < (a_str + strlen(a_str) - 1);
  • metalcrash
    metalcrash about 9 years
    This method is wrong. I was about to delete this post, but then I realized it may be interesting for some of you.
  • Aymon Fournier
    Aymon Fournier about 9 years
    How do I call this from main? I don't know what to pass to buffer.
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com almost 9 years
    @osgx I don't see the obsolete note on strsep and strtok man pages. And as man strsep says, it is not POSIX.
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com almost 9 years
    More precisely on portability: it is not POSIX 7, but BSD derived, and implemented on glibc.
  • Sean W
    Sean W almost 9 years
    Just a note that strtok() is not thread safe (for the reasons @JonathanLeffler mentioned) and therefore this whole function is not thread safe. If you try to use this in a threaded environment, you'll get erratic and unpredictable results. Replacing strtok() with strtok_r() fixes this issue.
  • Hafiz Temuri
    Hafiz Temuri about 8 years
    oh boi, three pointers! I am already scared of using it lol its just me, I am not very good with pointers in c.
  • Michi
    Michi about 8 years
    Huh Three star Programmer :)) This sounds interesting.
  • chqrlie
    chqrlie almost 8 years
    Scanning for separators twice is probably more advisable than allocating a potentially large array of token.
  • rdtsc
    rdtsc over 7 years
    I was just about to ask... Pelle's C has strdup(), but no strsep().
  • Alex
    Alex over 7 years
    Allocation logic is wrong. realloc() returns new pointer and you discard returned value. No proper way to return new memory pointer - function prototype should be changed to accept size of allocated buffer and leave allocation to caller, process max size elements.
  • Lux
    Lux about 7 years
    @Alex Fixed, completely rewritten, and tested. Note: not sure whether this'll work for non-ASCII or not.
  • Jan
    Jan about 7 years
    Just as a reminder: calling malloc within a function without freeing it in the same scope, is considered not to be a good practice. You should supply the function with a buffer that is big enough to write the char** array into. Otherwise the caller has to take care about a freeing of an allocation, that he might not know about.
  • minus one
    minus one almost 7 years
    Be aware that the strtok function changes the string 'str' it is applied to!
  • minus one
    minus one almost 7 years
    Be aware that the strtok function changes the string it is applied to! This implementation can cause serious trouble!
  • apaderno
    apaderno almost 7 years
    @osgx It is not marked as obsolete. The strsep() function is intended as a replacement for the strtok() function. While the strtok() function should be preferred for portability reasons (it conforms to ISO/IEC 9899:1990) it is unable to handle empty fields, i.e., detect fields delimited by two adjacent delimiter characters, or to be used for more than a single string at a time. The strsep() function first appeared in 4.4BSD.
  • Gianmarco Biscini
    Gianmarco Biscini over 6 years
    the call to malloc needs a cast to char** for the code to work, so it is result = (char**)malloc(sizeof(char*) * count);
  • Sdlion
    Sdlion about 6 years
    why tofree is the one free'd and not str?
  • Tyler
    Tyler about 6 years
    You can't free str because its value can be changed by calls to strsep(). The value of tofree consistently points to the start of the memory you want to free.
  • Juanmi Taboada
    Juanmi Taboada almost 6 years
    This solution fails for empty strings, when counting elements last_comma should be updated to tmp if no characters are in the string.
  • KeizerHarm
    KeizerHarm over 5 years
    When I do this, it either adds too much to the last token, or allocates it too much memory. This is the output: found 10 tokens. string #0: Hello, string #1: this string #2: is string #3: a string #4: test string #5: module string #6: for string #7: the string #8: string string #9: splitting.¢
  • hmmftg
    hmmftg over 5 years
    Thanks man, none of the strtok answers above worked in my case even after a lot of effort, and your code works like a charm!
  • Jorma Rebane
    Jorma Rebane over 5 years
    This example is dangerous: beginners who need help googling this will probably forget to free the elements. This is not a C-like solution; it feels like it was written by a Java programmer who doesn't understand how to write safe C. This kind of teaching is detrimental to everyone. The accepted C approach is to use strsep or strtok_r. If you can't modify the original string, use strdup before tokenizing.
  • Jorma Rebane
    Jorma Rebane over 5 years
    This example has multiple memory leaks. For anyone reading this, do not use this approach. Prefer strtok or strsep tokenization approaches instead.
  • Kamiccolo
    Kamiccolo almost 5 years
    For starters, this is not C code. And why would You pass pointers by actual reference in C++?
  • Lux
    Lux almost 5 years
    @Kamiccolo I'm sorry, how exactly is this not C code? Also, why is passing pointers by reference a problem here?
  • Kamiccolo
    Kamiccolo almost 5 years
    C does not have references.
  • Subin
    Subin almost 4 years
    Works well, but you need to add a +1 to count in the loop to get the part after the delimiter.
  • VimNing
    VimNing over 3 years
    If it's wrong then you should delete/update it.
  • chqrlie
    chqrlie over 2 years
    I would upvote your answer if you please replace the call to strncpy with one to memcpy. Please do not advise newbies to use this poorly understood and error prone function.
  • H.S.
    H.S. over 2 years
    @chqrlie Included your suggestion. Yes I do agree with you, better to use memcpy. Thanks.