Split string with delimiters in C
Solution 1
You can use the strtok() function to split a string (and specify the delimiter to use). Note that strtok() will modify the string passed into it. If the original string is required elsewhere, make a copy of it and pass the copy to strtok().
EDIT:
Example (note it does not handle consecutive or leading delimiters, "JAN,,,FEB,MAR" for example):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

char** str_split(char* a_str, const char a_delim)
{
    char** result    = 0;
    size_t count     = 0;
    char* tmp        = a_str;
    char* last_comma = 0;
    char delim[2];
    delim[0] = a_delim;
    delim[1] = 0;

    /* Count how many elements will be extracted. */
    while (*tmp)
    {
        if (a_delim == *tmp)
        {
            count++;
            last_comma = tmp;
        }
        tmp++;
    }

    /* Add space for trailing token (none if the string is empty
       or ends with the delimiter). */
    if (*a_str && (!last_comma || *(last_comma + 1) != '\0'))
        count++;

    /* Add space for terminating null string so caller
       knows where the list of returned strings ends. */
    count++;

    result = malloc(sizeof(char*) * count);

    if (result)
    {
        size_t idx  = 0;
        char* token = strtok(a_str, delim);

        while (token)
        {
            assert(idx < count);
            *(result + idx++) = strdup(token);
            token = strtok(0, delim);
        }
        assert(idx == count - 1);
        *(result + idx) = 0;
    }

    return result;
}

int main()
{
    char months[] = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
    char** tokens;

    printf("months=[%s]\n\n", months);

    tokens = str_split(months, ',');

    if (tokens)
    {
        int i;
        for (i = 0; *(tokens + i); i++)
        {
            printf("month=[%s]\n", *(tokens + i));
            free(*(tokens + i));
        }
        printf("\n");
        free(tokens);
    }

    return 0;
}
Output:
$ ./main.exe
months=[JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC]
month=[JAN]
month=[FEB]
month=[MAR]
month=[APR]
month=[MAY]
month=[JUN]
month=[JUL]
month=[AUG]
month=[SEP]
month=[OCT]
month=[NOV]
month=[DEC]
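The example above relies on strtok(), which collapses runs of delimiters into one. As a minimal sketch of an alternative, assuming empty fields should be preserved, the following scans with strchr() and leaves the input string untouched. The names split_keep_empty and free_tokens are hypothetical and not part of the original answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits `s` on `delim`, keeping empty fields.
   Returns a NULL-terminated array of heap-allocated strings, or NULL
   if an allocation fails. The input string is not modified. */
char **split_keep_empty(const char *s, char delim, size_t *out_count)
{
    assert(s != NULL);

    /* n delimiters always produce n + 1 fields. */
    size_t count = 1;
    for (const char *p = s; *p; p++)
        if (*p == delim)
            count++;

    char **result = malloc((count + 1) * sizeof *result);
    if (!result)
        return NULL;

    const char *start = s;
    for (size_t i = 0; i < count; i++) {
        const char *end = strchr(start, delim);
        size_t len = end ? (size_t)(end - start) : strlen(start);

        result[i] = malloc(len + 1);
        if (!result[i]) {
            /* Undo partial work so the caller never sees a half-built array. */
            while (i > 0)
                free(result[--i]);
            free(result);
            return NULL;
        }
        memcpy(result[i], start, len);
        result[i][len] = '\0';
        start = end ? end + 1 : start + len;
    }
    result[count] = NULL; /* terminator, as in str_split() */

    if (out_count)
        *out_count = count;
    return result;
}

/* Hypothetical helper: releases an array returned by split_keep_empty(). */
void free_tokens(char **tokens)
{
    if (!tokens)
        return;
    for (char **p = tokens; *p; p++)
        free(*p);
    free(tokens);
}
```

Called as split_keep_empty("JAN,,,FEB,MAR", ',', &n), this yields five fields, two of them empty. Whether empty fields are wanted depends on the data format, so treat this as one possible design rather than a drop-in replacement.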
Solution 2
I think strsep is still the best tool for this:
while ((token = strsep(&str, ","))) my_fn(token);
That is literally one line that splits a string.
The extra parentheses are a stylistic element to indicate that we're intentionally testing the result of an assignment, not an equality operator ==.
For that pattern to work, token and str both have type char *. If you started with a string literal, then you'd want to make a copy of it first:
// More general pattern:
const char *my_str_literal = "JAN,FEB,MAR";
char *token, *str, *tofree;
tofree = str = strdup(my_str_literal); // We own str's memory now.
while ((token = strsep(&str, ","))) my_fn(token);
free(tofree);
If two delimiters appear together in str, you'll get a token value that's the empty string. The value of str is modified in that each delimiter encountered is overwritten with a zero byte - another good reason to copy the string being parsed first.
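To make the empty-token behavior concrete, here is a small sketch that counts the tokens strsep() returns and how many of them are empty. count_fields is a hypothetical helper name, and the feature-test macro assumes glibc; on BSD and macOS strsep() is declared in <string.h> without it.

```c
#define _DEFAULT_SOURCE /* assumption: glibc; exposes strsep() and strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the number of tokens strsep() produces for
   `text`, and stores the number of empty tokens in `*empty` (if non-NULL).
   Works on a private copy, so `text` may be a string literal. */
size_t count_fields(const char *text, const char *delims, size_t *empty)
{
    assert(text != NULL && delims != NULL);

    size_t total = 0, blanks = 0;
    char *copy = strdup(text); /* strsep() overwrites delimiters in place */
    if (copy) {
        char *cursor = copy;   /* strsep() advances this, so keep `copy` for free() */
        char *token;
        while ((token = strsep(&cursor, delims)) != NULL) {
            total++;
            if (*token == '\0')
                blanks++;      /* adjacent, leading or trailing delimiter */
        }
        free(copy);
    }
    if (empty)
        *empty = blanks;
    return total;              /* 0 only if the copy could not be allocated */
}
```

For the input "JAN,,FEB" with delimiter "," this reports three tokens, one of them empty, which is the case strtok() silently skips.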
In a comment, someone suggested that strtok is better than strsep because strtok is more portable. Ubuntu and Mac OS X have strsep; it's safe to guess that other unixy systems do as well. Windows lacks strsep, but it has strpbrk, which enables this short and sweet strsep replacement:
char *strsep(char **stringp, const char *delim) {
    if (*stringp == NULL) { return NULL; }
    char *token_start = *stringp;
    *stringp = strpbrk(token_start, delim);
    if (*stringp) {
        **stringp = '\0';
        (*stringp)++;
    }
    return token_start;
}
Here is a good explanation of strsep vs strtok. The pros and cons may be judged subjectively; however, I think it's a telling sign that strsep was designed as a replacement for strtok.
Solution 3
String tokenizer: this code should point you in the right direction.
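#include <stdio.h>
#include <string.h>

int main(void) {
    char st[] = "Where there is will, there is a way.";
    char *ch;

    /* Use the same delimiter set (space and comma) on every call. */
    ch = strtok(st, " ,");
    while (ch != NULL) {
        printf("%s\n", ch);
        ch = strtok(NULL, " ,");
    }
    /* The non-standard getch() pause from <conio.h> is omitted for portability. */
    return 0;
}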
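Because strtok() keeps hidden static state, the same loop can be written with strtok_r(), which stores its position in a caller-supplied pointer instead. This is a sketch assuming a POSIX system (Microsoft's C runtime offers strtok_s() instead); tokenize_r is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L /* assumption: POSIX system; exposes strtok_r() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `buf` in place and stores up to `max` token
   pointers in `out`. Returns the number of tokens stored. Like strtok(),
   it treats a run of delimiters as a single separator. */
size_t tokenize_r(char *buf, const char *delims, char **out, size_t max)
{
    assert(buf != NULL && delims != NULL && out != NULL);

    char *save = NULL; /* per-call state, so nested or threaded use is safe */
    size_t n = 0;

    for (char *tok = strtok_r(buf, delims, &save);
         tok != NULL && n < max;
         tok = strtok_r(NULL, delims, &save)) {
        out[n++] = tok; /* points into `buf`; no allocation needed */
    }
    return n;
}
```

The returned pointers refer to the caller's buffer, so the buffer must be writable and must outlive the tokens.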
Solution 4
The function below does all the work (memory allocation, counting the length) for you. More information and a description can be found here: Implementation of Java String.split() method to split C string
int split (const char *str, char c, char ***arr)
{
    int count = 1;
    int token_len = 1;
    int i = 0;
    const char *p;
    char *t;

    /* First pass: count the tokens. */
    p = str;
    while (*p != '\0')
    {
        if (*p == c)
            count++;
        p++;
    }

    *arr = (char**) malloc(sizeof(char*) * count);
    if (*arr == NULL)
        exit(1);

    /* Second pass: allocate each token, including room for its terminator. */
    p = str;
    while (*p != '\0')
    {
        if (*p == c)
        {
            (*arr)[i] = (char*) malloc( sizeof(char) * token_len );
            if ((*arr)[i] == NULL)
                exit(1);

            token_len = 0;
            i++;
        }
        p++;
        token_len++;
    }
    (*arr)[i] = (char*) malloc( sizeof(char) * token_len );
    if ((*arr)[i] == NULL)
        exit(1);

    /* Third pass: copy the characters. */
    i = 0;
    p = str;
    t = ((*arr)[i]);
    while (*p != '\0')
    {
        if (*p != c)
        {
            *t = *p;
            t++;
        }
        else
        {
            *t = '\0';
            i++;
            t = ((*arr)[i]);
        }
        p++;
    }
    /* Terminate the last token; the loop exits before writing its NUL. */
    *t = '\0';

    return count;
}
How to use it:
int main (int argc, char ** argv)
{
    int i;
    const char *s = "Hello, this is a test module for the string splitting.";
    int c = 0;
    char **arr = NULL;

    c = split(s, ' ', &arr);

    printf("found %d tokens.\n", c);

    for (i = 0; i < c; i++)
        printf("string #%d: %s\n", i, arr[i]);

    /* Release each token, then the array itself. */
    for (i = 0; i < c; i++)
        free(arr[i]);
    free(arr);

    return 0;
}
Solution 5
Here is my two cents:
int split (const char *txt, char delim, char ***tokens)
{
    int *tklen, *t, count = 1;
    char **arr, *p = (char *) txt;

    /* Count the tokens: one more than the number of delimiters. */
    while (*p != '\0') if (*p++ == delim) count += 1;
    /* Record the length of each token. */
    t = tklen = calloc (count, sizeof (int));
    for (p = (char *) txt; *p != '\0'; p++)
    {
        if (*p == delim) t++;
        else (*t)++;
    }
    *tokens = arr = malloc (count * sizeof (char *));
    t = tklen;
    /* calloc zero-fills, so every token is already NUL-terminated. */
    p = *arr++ = calloc (*(t++) + 1, sizeof (char));
    while (*txt != '\0')
    {
        if (*txt == delim)
        {
            p = *arr++ = calloc (*(t++) + 1, sizeof (char));
            txt++;
        }
        else *p++ = *txt++;
    }
    free (tklen);
    return count;
}
Usage:
char **tokens;
int count, i;
const char *str = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
count = split (str, ',', &tokens);
for (i = 0; i < count; i++) printf ("%s\n", tokens[i]);
/* freeing tokens */
for (i = 0; i < count; i++) free (tokens[i]);
free (tokens);
namco
Updated on February 27, 2022
Comments
-
namco over 2 years
How do I write a function to split and return an array for a string with delimiters in the C programming language?
char* str = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"; str_split(str,',');
-
Daniel Kamil Kozar over 12 years
You can use the strtok function from the standard library to achieve the same thing.
-
BLUEPIXY over 12 years
-
fnisi over 8 years
A comment... the key point for a strtok() family function is understanding static variables in C, i.e. how they behave between successive function calls in which they are used. See my code below
-
chqrlie over 2 years
strtok is not a solution for this problem for multiple reasons: it modifies the source string, it has a hidden static state that makes it non reentrant, it will handle sequences of delimiters as a single delimiter, which seems incorrect for ",", and as a consequence will not split empty strings at the start, middle nor end of ",X,,Y,". Don't use strtok.
-
-
SteveP over 10 years
Hi. I think the function has hard coded "," as the separator: char* token = strtok(a_str, ",");
-
hmjd over 10 years
@SteveP, well spotted. That code has been there for months and nobody has noticed. Will fix it shortly.
-
Peter Mortensen over 10 years
As this may be the canonical question/answer on Stack Overflow for this, aren't there some caveats with respect to multi-threading using strtok?
-
Admin almost 10 years
@osgx According to that page, strsep is a replacement for strtok, but strtok is preferred for portability. So, unless you need support for empty fields or splitting multiple strings at once, strtok is a better choice.
-
Dojo over 9 years
Possibly a stupid question but how does strtok(0, delim); know the source string?
-
Jonathan Leffler about 9 years
@Dojo: It remembers it; that's one of the reasons it is problematic. It would be better to use strtok_s() (Microsoft, C11 Annex K, optional) or strtok_r() (POSIX) than plain strtok(). Plain strtok() is evil in a library function. No function calling the library function may be using strtok() at the time, and no function called by the library function may call strtok().
-
Martin about 9 years
if (last_comma < (a_str + strlen(a_str) - 1) ) count++; seems to be more readable than /* Add space for trailing token. */ count += last_comma < (a_str + strlen(a_str) - 1);
-
metalcrash about 9 years
This method is wrong. I was about to delete this post, but then I realized it may be interesting for some of you.
-
Aymon Fournier about 9 years
How do I call this from main? I don't know what to pass to buffer.
-
Ciro Santilli OurBigBook.com almost 9 years
@osgx I don't see the obsolete note on the strsep and strtok man pages. And as man strsep says, it is not POSIX.
-
Ciro Santilli OurBigBook.com almost 9 years
More precisely on portability: it is not POSIX 7, but BSD derived, and implemented on glibc.
-
Sean W almost 9 years
Just a note that strtok() is not thread safe (for the reasons @JonathanLeffler mentioned) and therefore this whole function is not thread safe. If you try to use this in a threaded environment, you'll get erratic and unpredictable results. Replacing strtok() with strtok_r() fixes this issue.
-
Hafiz Temuri about 8 years
oh boi, three pointers! I am already scared of using it lol its just me, I am not very good with pointers in c.
-
Michi about 8 years
Huh Three star Programmer :)) This sounds interesting.
-
chqrlie almost 8 years
Scanning for separators twice is probably more advisable than allocating a potentially large array of token.
-
rdtsc over 7 years
I was just about to ask... Pelle's C has strdup(), but no strsep().
-
Alex over 7 years
Allocation logic is wrong. realloc() returns new pointer and you discard returned value. No proper way to return new memory pointer - function prototype should be changed to accept size of allocated buffer and leave allocation to caller, process max size elements.
-
Lux about 7 years
@Alex Fixed, completely rewritten, and tested. Note: not sure whether this'll work for non-ASCII or not.
-
Jan about 7 years
Just as a reminder: calling malloc within a function without freeing it in the same scope is considered not to be a good practice. You should supply the function with a buffer that is big enough to write the char** array into. Otherwise the caller has to take care of freeing an allocation that he might not know about.
-
minus one almost 7 years
Be aware that the strtok function changes the string 'str' it was applied to!
-
minus one almost 7 years
Be aware that the strtok function changes the string it was applied to! This implementation can make serious trouble!
-
apaderno almost 7 years
@osgx It is not marked as obsolete. The strsep() function is intended as a replacement for the strtok() function. While the strtok() function should be preferred for portability reasons (it conforms to ISO/IEC 9899:1990) it is unable to handle empty fields, i.e., detect fields delimited by two adjacent delimiter characters, or to be used for more than a single string at a time. The strsep() function first appeared in 4.4BSD.
-
Gianmarco Biscini over 6 years
the call to malloc needs a cast to char** for the code to work, so it is result = (char**)malloc(sizeof(char*) * count);
-
Sdlion about 6 years
why tofree is the one free'd and not str?
-
Tyler about 6 years
You can't free str because its value can be changed by calls to strsep(). The value of tofree consistently points to the start of the memory you want to free.
-
Juanmi Taboada almost 6 years
This solution fails for empty strings, when counting elements last_comma should be updated to tmp if no characters are in the string.
-
KeizerHarm over 5 years
When I do this, it either adds too much to the last token, or allocates it too much memory. This is the output:
found 10 tokens. string #0: Hello, string #1: this string #2: is string #3: a string #4: test string #5: module string #6: for string #7: the string #8: string string #9: splitting.¢
-
hmmftg over 5 years
Thanks man, all the strtok answers above did not work in my case even after a lot of effort, and your code works like a charm!
-
Jorma Rebane over 5 years
This example is dangerous -- beginners who need help googling this will probably forget freeing the elements. This is not a C-like solution, it feels like written by a Java programmer who doesn't understand how to write safe C. This kind of teaching is detrimental to everyone. The accepted C approach is to use strsep or strtok_r. If you can't modify original string, use strdup before tokenizing.
-
Jorma Rebane over 5 years
This example has multiple memory leaks. For anyone reading this, do not use this approach. Prefer strtok or strsep tokenization approaches instead.
-
Kamiccolo almost 5 years
For starters, this is not C code. And why would You pass pointers by actual reference in C++?
-
Lux almost 5 years
@Kamiccolo I'm sorry, how exactly is this not C code? Also, why is passing pointers by reference a problem here?
-
Kamiccolo almost 5 years
C does not have references.
-
Subin almost 4 years
works good, need to add a +1 to count in the loop though, to get the part after delimiter.
-
VimNing over 3 years
If it's wrong then you should delete/update it.
-
chqrlie over 2 years
I would upvote your answer if you please replace the call to strncpy with one to memcpy. Please do not advise newbies to use this poorly understood and error prone function.
-
H.S. over 2 years
@chqrlie Included your suggestion. Yes I do agree with you, better to use memcpy. Thanks.