Parse string into argv/argc
Solution 1
If the glib solution is overkill for your case, you may consider coding one yourself.
Then you can:
- scan the string and count how many arguments there are (and you get your argc)
- allocate an array of char * (for your argv)
- rescan the string, assign the pointers in the allocated array and replace spaces with '\0' (if you can't modify the string containing the arguments, you should duplicate it).
- don't forget to free what you have allocated!
The diagram below should (hopefully) clarify:
aa bbb ccc "dd d" ee   <- original string
aa0bbb0ccc00dd d00ee0  <- transformed string (each 0 stands for '\0')
|  |   |    |     |
|  |   |    |     +- argv[4]
|  |   |    +------- argv[3]
|  |   +------------ argv[2]
|  +---------------- argv[1]
+------------------- argv[0]
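The proposed code later in this answer skips quote handling. As a minimal sketch of the in-place transformation the diagram shows, including the double-quoted "dd d" case, something like the following could work. `split_in_place` is a hypothetical name, and the sketch assumes double quotes only group words: there are no escapes, and text directly after a closing quote starts a new argument instead of being concatenated.

```c
#include <ctype.h>

/* Hypothetical helper (not from the answer): splits s in place as in the
   diagram. Whitespace and the surrounding double quotes are overwritten
   with '\0', so a quoted run such as "dd d" becomes one argument.
   Assumption: no escape sequences, no quote concatenation.
   Stores at most max pointers in argv and returns how many were stored. */
static int split_in_place(char *s, char **argv, int max)
{
    int argc = 0;
    char *p = s;

    while (*p && argc < max) {
        /* Turn separators into terminators. */
        while (isspace((unsigned char)*p))
            *p++ = '\0';
        if (*p == '\0')
            break;

        if (*p == '"') {
            *p++ = '\0';              /* opening quote becomes '\0' */
            argv[argc++] = p;
            while (*p && *p != '"')
                p++;
            if (*p)
                *p++ = '\0';          /* closing quote becomes '\0' */
        } else {
            argv[argc++] = p;
            while (*p && !isspace((unsigned char)*p))
                p++;
        }
    }
    return argc;
}
```

With a writable buffer holding `aa bbb ccc "dd d" ee`, this yields five arguments and leaves the bytes exactly as in the transformed line of the diagram.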
A possible API could be:
char **parseargs(char *arguments, int *argc);
void freeparsedargs(char **argv);
You will need additional considerations to implement freeparsedargs() safely.
If your string is very long and you don't want to scan it twice, you may consider alternatives like allocating more elements for the argv array up front (and reallocating if needed).
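A minimal sketch of that single-scan alternative, assuming plain whitespace splitting with no quoting. `split_grow` and `free_args` are hypothetical names. The pointer array starts small and doubles with realloc() whenever it runs out of room, always keeping one slot free for a terminating NULL.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-pass splitter: grows argv with realloc() instead of
   counting first. Works on a private copy of line (no quoting support).
   Leading whitespace is skipped before copying, so argv[0] is the start
   of the copy and free_args() can release it. */
static char **split_grow(const char *line, int *argc)
{
    size_t cap = 8, n = 0;

    while (isspace((unsigned char)*line))
        line++;
    char *buf = strdup(line);
    char **argv = malloc(cap * sizeof *argv);
    if (!buf || !argv) {
        free(buf);
        free(argv);
        return NULL;
    }
    for (char *p = buf; *p; ) {
        while (isspace((unsigned char)*p))
            *p++ = '\0';
        if (!*p)
            break;
        if (n + 1 >= cap) {                  /* keep room for the final NULL */
            char **tmp = realloc(argv, 2 * cap * sizeof *argv);
            if (!tmp) {
                free(buf);
                free(argv);
                return NULL;
            }
            argv = tmp;
            cap *= 2;
        }
        argv[n++] = p;
        while (*p && !isspace((unsigned char)*p))
            p++;
    }
    argv[n] = NULL;
    if (n == 0)
        free(buf);                           /* nothing points into it */
    *argc = (int)n;
    return argv;
}

static void free_args(char **argv)
{
    if (argv) {
        free(argv[0]);   /* the shared copy (NULL when argc == 0) */
        free(argv);
    }
}
```

The doubling keeps the number of realloc() calls logarithmic in the argument count, at the cost of up to half the array being unused.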
EDIT: Proposed solution (doesn't handle quoted arguments).
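#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Splits args in place on whitespace and returns the number of arguments.
   With argv == NULL it only counts and leaves args untouched. */
static int setargs(char *args, char **argv)
{
    int count = 0;

    while (isspace((unsigned char)*args)) ++args;
    while (*args) {
        if (argv) argv[count] = args;
        while (*args && !isspace((unsigned char)*args)) ++args;
        if (argv && *args) *args++ = '\0';
        while (isspace((unsigned char)*args)) ++args;
        count++;
    }
    return count;
}

/* Returns a NULL-terminated argv, or NULL if there are no arguments.
   The private copy of args is kept in the hidden slot argv[-1]. */
char **parseargs(char *args, int *argc)
{
    char **argv = NULL;
    char *copy = NULL;   /* separate variable: never free the caller's string */
    int argn = 0;

    if (args && *args
        && (copy = strdup(args))
        && (argn = setargs(copy, NULL))
        && (argv = malloc((argn + 2) * sizeof(char *)))) {
        *argv++ = copy;
        argn = setargs(copy, argv);
        argv[argn] = NULL;
    }
    if (copy && !argv) free(copy);
    *argc = argv ? argn : 0;
    return argv;
}

void freeparsedargs(char **argv)
{
    if (argv) {
        free(argv[-1]);   /* the copy of the string */
        free(argv - 1);   /* the pointer array itself */
    }
}

int main(int argc, char *argv[])
{
    int i;
    char **av;
    int ac;
    char *as = NULL;

    if (argc > 1) as = argv[1];
    av = parseargs(as, &ac);
    printf("== %d\n", ac);
    for (i = 0; i < ac; i++)
        printf("[%s]\n", av[i]);
    freeparsedargs(av);
    return 0;
}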
Solution 2
I'm surprised nobody has provided the simplest answer using standard POSIX functionality:
http://www.opengroup.org/onlinepubs/9699919799/functions/wordexp.html
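For reference, a minimal sketch of calling wordexp(), declared in <wordexp.h> on POSIX systems. `print_wordexp` is a hypothetical helper name. Note that wordexp() does more than split: it removes quotes and, as discussed in the comments, also expands variables and globs; the WRDE_NOCMD flag only makes command substitution an error instead of running it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <wordexp.h>

/* Hypothetical helper: splits line with POSIX wordexp() and prints the
   resulting words. WRDE_NOCMD rejects $(...) and `...`, but $VAR and
   globs such as * are still expanded. Returns the word count, or -1 if
   wordexp() fails. */
static int print_wordexp(const char *line)
{
    wordexp_t p;
    int rc = wordexp(line, &p, WRDE_NOCMD);

    if (rc != 0) {
        if (rc == WRDE_NOSPACE)
            wordfree(&p);        /* partial results may have been allocated */
        return -1;
    }
    for (size_t i = 0; i < p.we_wordc; i++)
        printf("argv[%zu] = [%s]\n", i, p.we_wordv[i]);

    int n = (int)p.we_wordc;
    wordfree(&p);                /* releases we_wordv and the strings */
    return n;
}
```

Calling `print_wordexp("foo 'bar baz' qux")` should report three words, with the quotes around `bar baz` removed.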
Solution 3
Here's my contribution. It's nice and short, but things to be wary of are:
- The use of strtok modifies the original "commandLine" string, replacing the spaces with \0 end-of-string delimiters
- argv[] ends up pointing into "commandLine", so don't modify it until you're finished with argv[].
The code:
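#include <string.h>

enum { kMaxArgs = 64 };
int argc = 0;
char *argv[kMaxArgs];

/* commandLine must be a writable char buffer, not a string literal. */
char *p2 = strtok(commandLine, " ");
while (p2 && argc < kMaxArgs-1)
{
    argv[argc++] = p2;
    p2 = strtok(0, " ");
}
argv[argc] = 0;   /* NULL-terminate, like the argv passed to main() */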
You can now use argc and argv, or pass them to other functions declared like "foo(int argc, char **argv)".
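As the comments point out, strtok() keeps hidden state between calls, so it is not re-entrant. A minimal sketch of the same loop with POSIX strtok_r(), which stores its position in a caller-supplied pointer instead, might look like this. `split_args_r` is a hypothetical name, and splitting on tabs as well as spaces is an added assumption. The input is still modified in place and argv still points into it.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r() */
#include <string.h>

enum { kMaxArgs = 64 };

/* Hypothetical re-entrant variant of the snippet above: strtok_r() keeps
   its position in `save` rather than in static state. Splits on spaces
   and tabs (an assumption). commandLine is modified in place, and argv
   (room for kMaxArgs entries) ends up pointing into it. */
static int split_args_r(char *commandLine, char *argv[kMaxArgs])
{
    int argc = 0;
    char *save = NULL;
    char *tok = strtok_r(commandLine, " \t", &save);

    while (tok && argc < kMaxArgs - 1) {
        argv[argc++] = tok;
        tok = strtok_r(NULL, " \t", &save);
    }
    argv[argc] = NULL;   /* argv[argc] == NULL, as for main() */
    return argc;
}
```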
Solution 4
The always-wonderful glib has g_shell_parse_argv(), which sounds like what you're after.
If you don't even need quoting, this might be overkill. All you need to do is tokenize, using whitespace as the delimiter. Writing a simple routine to do that shouldn't take long, really.
If you're not super-stingy on memory, doing it in one pass without reallocations should be easy: just assume the worst case of every second character being a space, so a string of n characters contains at most (n + 1) / 2 arguments and (of course) at most n bytes of argument text (excluding terminators).
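A minimal sketch of that one-pass idea, assuming plain whitespace splitting with no quoting. `split_once` is a hypothetical name. Rather than reserving a separate n-byte text area, it copies the string once into the same allocation and splits the copy in place, so the separators double as terminators and n + 1 bytes are enough. Because everything lives in one malloc() block, a single free() releases it.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-pass splitter: a string of n characters has at most
   (n + 1) / 2 words, so (n + 1) / 2 + 1 pointer slots (one for the
   terminating NULL) always suffice. The pointer array and a copy of the
   text share one allocation; free(argv) releases both. No quoting. */
static char **split_once(const char *line, int *argc)
{
    size_t n = strlen(line);
    size_t slots = (n + 1) / 2 + 1;
    char **argv = malloc(slots * sizeof *argv + n + 1);
    if (!argv)
        return NULL;

    char *p = (char *)(argv + slots);   /* text copy lives after the array */
    memcpy(p, line, n + 1);

    int count = 0;
    while (*p) {
        while (isspace((unsigned char)*p))
            *p++ = '\0';
        if (!*p)
            break;
        argv[count++] = p;
        while (*p && !isspace((unsigned char)*p))
            p++;
    }
    argv[count] = NULL;
    *argc = count;
    return argv;
}
```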
Solution 5
Here's a solution for both Windows and Unix (tested on Linux, OSX and Windows). Tested with Valgrind and Dr. Memory.
It uses wordexp for POSIX systems, and CommandLineToArgvW for Windows.
Note that for the Windows solution, most of the code is converting between char ** and wchar_t ** with the beautiful Win32 API, since there is no CommandLineToArgvA available (ANSI version).
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* strdup(), wordexp() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  /* CommandLineToArgvW */
#else
#include <wordexp.h>
#endif

char **split_commandline(const char *cmdline, int *argc)
{
    int i;
    char **argv = NULL;
    assert(argc);

    if (!cmdline)
    {
        return NULL;
    }

    // Posix.
#ifndef _WIN32
    {
        wordexp_t p;

        // Note! This expands shell variables (and globs).
        // Passing WRDE_NOCMD instead of 0 refuses command substitution.
        if (wordexp(cmdline, &p, 0))
        {
            return NULL;
        }

        *argc = (int)p.we_wordc;

        // One extra slot keeps argv NULL-terminated.
        if (!(argv = calloc(*argc + 1, sizeof(char *))))
        {
            goto fail;
        }

        for (i = 0; i < *argc; i++)
        {
            if (!(argv[i] = strdup(p.we_wordv[i])))
            {
                goto fail;
            }
        }

        wordfree(&p);
        return argv;

    fail:
        wordfree(&p);
    }
#else // WIN32
    {
        wchar_t **wargs = NULL;
        size_t needed = 0;
        wchar_t *cmdlinew = NULL;
        size_t len = strlen(cmdline) + 1;

        if (!(cmdlinew = calloc(len, sizeof(wchar_t))))
            goto fail;

        if (!MultiByteToWideChar(CP_ACP, 0, cmdline, -1, cmdlinew, (int)len))
            goto fail;

        if (!(wargs = CommandLineToArgvW(cmdlinew, argc)))
            goto fail;

        // One extra slot keeps argv NULL-terminated.
        if (!(argv = calloc(*argc + 1, sizeof(char *))))
            goto fail;

        // Convert from wchar_t * to ANSI char *
        for (i = 0; i < *argc; i++)
        {
            // Get the size needed for the target buffer.
            // CP_ACP = Ansi Codepage.
            needed = WideCharToMultiByte(CP_ACP, 0, wargs[i], -1,
                                         NULL, 0, NULL, NULL);
            if (!needed || !(argv[i] = malloc(needed)))
                goto fail;

            // Do the conversion.
            needed = WideCharToMultiByte(CP_ACP, 0, wargs[i], -1,
                                         argv[i], (int)needed, NULL, NULL);
        }

        if (wargs) LocalFree(wargs);
        if (cmdlinew) free(cmdlinew);
        return argv;

    fail:
        if (wargs) LocalFree(wargs);
        if (cmdlinew) free(cmdlinew);
    }
#endif // WIN32

    // Error path: release whatever was allocated.
    if (argv)
    {
        for (i = 0; i < *argc; i++)
        {
            if (argv[i])
            {
                free(argv[i]);
            }
        }

        free(argv);
    }

    return NULL;
}
codebox
Updated on July 27, 2022
Comments
-
codebox almost 2 years
Is there a way in C to parse a piece of text and obtain values for argv and argc, as if the text had been passed to an application on the command line?
This doesn't have to work on Windows, just Linux - I also don't care about quoting of arguments.
-
Remo.D over 14 years: With the small problem that it is C++ and not C :)
-
bua over 14 years: You're right. I posted it because when I looked at the sources some time ago, I remembered it as generic, OOD-free code that looked almost like C. But I think it's worth keeping here.
-
Michael Burr over 14 years: Rename the file to argcargv.c and it's C. Literally.
-
Remo.D over 14 years: I like the brevity of your solution, but I'm not a big fan of strtok() or strdupa(). I'm also not very clear on what the strdup("test") is for. The major drawback to me seems to be that you have many strdup() calls and, hence, will have to do many free() calls when done. I posted an alternative version in my answer, just in case it may be useful for somebody.
-
Remo.D over 14 years: Because getopt does a different job. It takes an array of arguments and looks for options in it. This question is about splitting a string of "arguments" into an array of char *, which is something getopt is not able to do.
-
Admin about 12 years: If you transform the input string like that, you can't do string concatenation with quotes" like "this' or 'this. See my answer for a full-featured solution.
-
Exectron over 11 years: That may do more than you want. It performs shell word expansions, including environment variable substitution, e.g. substituting $PATH with the current path.
-
R.. GitHub STOP HELPING ICE over 11 years: I guess it depends on what you mean by parse into argc/argv; certainly that involves some of what the shell does (processing quoting), but variable expansion and other things are more questionable. BTW, wordexp does have an option to disable command expansion.
-
Exectron over 11 years: If you mean WRDE_NOCMD, that doesn't seem to prevent expansion of $PATH, nor expansion of * to the names of files in the current directory.
-
R.. GitHub STOP HELPING ICE over 11 years: I didn't say it prevented variable expansion, just that one other thing you might want to turn off, command expansion, can be turned off.
-
Steve Valliere about 11 years: Mr Peitrek's library appears to be very weak when compared to Microsoft's actual rules for separating a command line into argc/argv (see msdn.microsoft.com/en-us/library/17w5ykft.aspx for their rules). He doesn't appear to handle embedded quoted strings, multiple backslashes or even escaped quote characters. Not a problem if that's not needed, of course, but folks should be sure they get what they need!
-
jrr almost 11 years: Thanks, that saved some time. To anyone else using this: "char* p1" (though your compiler would have told you =] )
-
Telemachus over 10 years: @Remo.D I know it's a long time ago, but I was working on this same general problem myself and about to use strtok. It seems designed for just such a case. So, I'm curious: why are you "not a big fan of strtok()"?
-
Max Truxa over 9 years: (nit-picking ahead) Note that there is one small thing missing to be compliant with the standard argc/argv layout: the entry after the last valid one in argv is always set to NULL ("foo bar": argv[0] -> "foo", argv[1] -> "bar", argv[2] -> NULL).
-
Joakim over 9 years: Also, it's totally unnecessary, since Microsoft doesn't just give you the specification of how they parse the command line; they also provide an API for this: CommandLineToArgvW
-
Jesse Chisholm over 8 years: To be a bit closer to the standard argv, add an extra position at the end with NULL. This is done in case a programmer ignores argc and just loops with while(process(*++argv)); until they hit that NULL. There would, of course, need to be more to handle quoted arguments (and escaped quotes).
-
Jesse Chisholm over 8 years: @Telemachus - strtok 1: modifies the buffer it parses, 2: remembers your buffer across calls, which makes it 3: not thread-safe, as it is not re-entrant. It is not the designed purpose of strtok but the designed-in side effects that are annoying. :) :) :)
-
domsson over 6 years: This is exactly what I was looking for and seems to work very well. I needed it to pass a user-defined command to posix_spawn, not knowing whether there would be additional arguments. However, a short code example would make this answer so much better. Yeah, even now, more than seven years later. :-)
-
Zibri over 5 years: Nice, but it handles neither single nor double quotes.
-
Zibri over 5 years: Note: it destroys the source string (because it uses strtok); keep that in mind and add a strdup if needed.
-
Lê Quang Duy about 4 years: I ran your code, but it fails if I input "1 2 3 \'3 4\"567\' \"bol\'obala\" 2x2=\"foo\""
-
liuyang1 about 4 years: Your buf is strdup'ed from args and never freed, so it leaks memory.
-
Zibri almost 4 years: @LêQuangDuy feel free to modify it to suit your needs and post your better solution here ;)
-
Greedo over 2 years: Does this account for escaped args like "some/long path/to file.txt"?
-
sstteevvee over 2 years: No, you'd have to look for and handle quotes yourself afterwards. If your code is running on a "real" OS, I'd recommend seeing what it offers, for example the glib one suggested in another solution, which should give you those sorts of features.