Parse string into argv/argc

Solution 1

If the glib solution is overkill for your case, you may consider coding one yourself.

Then you can:

  • scan the string and count how many arguments there are (and you get your argc)
  • allocate an array of char * (for your argv)
  • rescan the string, assign the pointers in the allocated array and replace spaces with '\0' (if you can't modify the string containing the arguments, you should duplicate it).
  • don't forget to free what you have allocated!

The diagram below should clarify (hopefully):

             aa bbb ccc "dd d" ee         <- original string

             aa0bbb0ccc00dd d00ee0        <- transformed string
             |  |   |    |     |
   argv[0] __/  /   /    /     /
   argv[1] ____/   /    /     /
   argv[2] _______/    /     /
   argv[3] ___________/     /
   argv[4] ________________/ 

A possible API could be:

    char **parsedargs(char *arguments, int *argc);
    void   freeparsedargs(char **argv);

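Implementing freeparsedargs() safely takes some care: argv[] points into a duplicated copy of the string, so the function needs a way to find that copy and free it along with the array.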

If your string is very long and you don't want to scan it twice, you may consider alternatives such as allocating more elements than needed for the argv array (and reallocating if that turns out not to be enough).
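
A minimal sketch of that single-scan alternative, assuming the string may be modified in place. The name split_growing and the initial capacity of 8 are made up for illustration and are not part of the proposed API:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: split `s` in place on whitespace in a single pass,
 * growing argv with realloc() as needed. Returns a NULL-terminated array
 * (release it with free()) or NULL on allocation failure. */
char **split_growing(char *s, int *argc)
{
    size_t cap = 8;               /* initial guess; doubled when full */
    size_t n = 0;
    char **argv;

    assert(s != NULL && argc != NULL);
    argv = malloc(cap * sizeof *argv);
    if (!argv) return NULL;

    for (;;) {
        while (isspace((unsigned char)*s)) ++s;     /* skip separators */
        if (!*s) break;

        if (n + 1 >= cap) {       /* always keep one slot for the final NULL */
            char **tmp = realloc(argv, 2 * cap * sizeof *tmp);
            if (!tmp) { free(argv); return NULL; }
            argv = tmp;
            cap *= 2;
        }
        argv[n++] = s;            /* an argument starts here */

        while (*s && !isspace((unsigned char)*s)) ++s;
        if (*s) *s++ = '\0';      /* terminate the argument in place */
    }

    argv[n] = NULL;
    *argc = (int)n;
    return argv;
}
```

The pointers refer to the caller's buffer, so only the array itself has to be freed.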

EDIT: Proposed solution (doesn't handle quoted arguments).

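    #include <ctype.h>   /* isspace */
    #include <stdio.h>   /* printf */
    #include <stdlib.h>  /* malloc, free, exit */
    #include <string.h>  /* strdup */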

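    /* Split args in place on whitespace. With argv == NULL it only counts
       the arguments and leaves the string untouched. */
    static int setargs(char *args, char **argv)
    {
       int count = 0;

       while (isspace((unsigned char)*args)) ++args;
       while (*args) {
         if (argv) argv[count] = args;
         while (*args && !isspace((unsigned char)*args)) ++args;
         if (argv && *args) *args++ = '\0';
         while (isspace((unsigned char)*args)) ++args;
         count++;
       }
       return count;
    }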

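    char **parsedargs(char *args, int *argc)
    {
       char **argv = NULL;
       char  *copy = NULL;
       int    argn = 0;

       /* Work on a private copy so the caller's string is never modified. */
       if (args && *args
        && (copy = strdup(args))
        && (argn = setargs(copy,NULL))
        && (argv = malloc((argn+2) * sizeof(char *)))) {
          *argv++ = copy;        /* hidden slot argv[-1] remembers the copy */
          argn = setargs(copy,argv);
          argv[argn] = NULL;     /* terminate the array like a real argv */
       }

       /* On failure free only our own copy, never the caller's string. */
       if (!argv) {
          free(copy);
          argn = 0;
       }

       *argc = argn;
       return argv;
    }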

    void freeparsedargs(char **argv)
    {
      if (argv) {
        free(argv[-1]);
        free(argv-1);
      } 
    }

    int main(int argc, char *argv[])
    {
      int i;
      char **av;
      int ac;
      char *as = NULL;

      if (argc > 1) as = argv[1];

      av = parsedargs(as,&ac);
      printf("== %d\n",ac);
      for (i = 0; i < ac; i++)
        printf("[%s]\n",av[i]);

      freeparsedargs(av);
      exit(0);
    }
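
The proposed code above splits on whitespace only, so the quoted "dd d" from the diagram would come out as two arguments. Below is a minimal sketch of a quote-aware variant. The name split_quoted and the caller-supplied array with a max parameter are assumptions for illustration. It drops the quote characters, does not handle backslash escapes, and always modifies the string, so it cannot be used for a separate counting pass:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: split `s` in place, treating text inside '...' or
 * "..." as part of the current argument. Stores at most `max` pointers in
 * argv and returns how many were stored. */
int split_quoted(char *s, char **argv, int max)
{
    int count = 0;

    assert(s != NULL && argv != NULL);
    while (isspace((unsigned char)*s)) ++s;
    while (*s && count < max) {
        char *out = s;            /* write cursor; never overtakes s */
        char quote = '\0';        /* currently open quote character, if any */
        int more;

        argv[count++] = out;
        while (*s && (quote || !isspace((unsigned char)*s))) {
            if (quote && *s == quote)
                quote = '\0';                 /* closing quote: drop it */
            else if (!quote && (*s == '"' || *s == '\''))
                quote = *s;                   /* opening quote: drop it */
            else
                *out++ = *s;                  /* ordinary character: keep it */
            ++s;
        }
        more = (*s != '\0');      /* check before *out may overwrite *s */
        *out = '\0';
        if (more) ++s;
        while (isspace((unsigned char)*s)) ++s;
    }
    return count;
}
```

Called on a writable copy of the diagram's string, aa bbb ccc "dd d" ee, with room for at least five pointers, this yields five arguments with "dd d" kept as a single one.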

Solution 2

I'm surprised nobody has provided the simplest answer using standard POSIX functionality:

http://www.opengroup.org/onlinepubs/9699919799/functions/wordexp.html
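
A minimal sketch of how wordexp() might be used for this. The wrapper name split_with_wordexp is made up for illustration. Note that wordexp() performs shell-style expansion: WRDE_NOCMD only rejects command substitution, while variables such as $PATH and wildcards such as * are still expanded.

```c
#include <assert.h>
#include <stddef.h>
#include <wordexp.h>

/* Hypothetical wrapper: returns the number of words, or -1 on error.
 * On success the words are in p->we_wordv[0 .. n-1] and the caller
 * must release them with wordfree(p). */
int split_with_wordexp(const char *line, wordexp_t *p)
{
    assert(line != NULL && p != NULL);

    /* WRDE_NOCMD makes wordexp() fail instead of running $(...) or `...` */
    if (wordexp(line, p, WRDE_NOCMD) != 0)
        return -1;

    return (int)p->we_wordc;
}
```

Unlike the hand-written splitters, this handles quoting, so "dd d" stays one word.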

Solution 3

Here's my contribution. It's nice and short, but things to be wary of are:

  • The use of strtok modifies the original "commandLine" string, replacing the spaces with \0 end-of-string delimiters
  • argv[] ends up pointing into "commandLine", so don't modify it until you're finished with argv[].

The code:

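enum { kMaxArgs = 64 };
int argc = 0;
char *argv[kMaxArgs];

/* commandLine must be a writable char array; strtok() writes '\0' into it */
char *p2 = strtok(commandLine, " ");
while (p2 && argc < kMaxArgs-1)   /* leave room for the terminating NULL */
  {
    argv[argc++] = p2;
    p2 = strtok(NULL, " ");
  }
argv[argc] = NULL;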

You can now use argc and argv, or pass them to other functions declared like "foo(int argc, char **argv)".
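
If the two caveats above are a problem, one possible variation is to tokenize a private copy and use the POSIX strtok_r() in place of strtok(), since it keeps its state in a caller-supplied pointer. This is a sketch under those assumptions, not the code from this answer. The name tokenize_copy is made up, and tabs and newlines are treated as separators in addition to spaces:

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() and strtok_r() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { kMaxArgs = 64 };

/* Hypothetical wrapper: tokenizes a private copy of commandLine, so the
 * caller's string is left intact. argv must have room for kMaxArgs entries.
 * Returns the copy (release it with free() once argv is no longer needed)
 * or NULL if strdup() fails. */
char *tokenize_copy(const char *commandLine, char *argv[], int *argc)
{
    char *save = NULL;            /* strtok_r keeps its position here */
    char *copy;
    char *tok;

    assert(commandLine != NULL && argv != NULL && argc != NULL);
    *argc = 0;
    argv[0] = NULL;

    copy = strdup(commandLine);
    if (!copy) return NULL;

    for (tok = strtok_r(copy, " \t\n", &save);
         tok && *argc < kMaxArgs - 1;
         tok = strtok_r(NULL, " \t\n", &save))
        argv[(*argc)++] = tok;

    argv[*argc] = NULL;           /* terminate like a real argv */
    return copy;
}
```

The argv[] entries point into the returned copy, so free it only after you are done with them.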

Solution 4

The always-wonderful glib has g_shell_parse_argv(), which sounds like what you're after.

If you're not even interested in quoting, this might be overkill. All you need to do is tokenize, using whitespace as the delimiter. Writing a simple routine to do that shouldn't take long, really.

If you're not super-stingy on memory, doing it in one pass without reallocations should be easy; just assume a worst-case of every second character being a space, thus assuming a string of n characters contains at most (n + 1) / 2 arguments, and (of course) at most n bytes of argument text (excluding terminators).
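
A minimal sketch of that worst-case sizing. The name split_worst_case is made up, and putting the pointer array and the text copy into one allocation is just one possible layout:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: one pass, one allocation. The block holds the argv
 * array (worst case (n + 1) / 2 entries plus a NULL) followed by a copy of
 * the text, so a single free(argv) releases everything. */
char **split_worst_case(const char *s, int *argc)
{
    size_t n, max_args;
    char **argv;
    char *p;
    int count = 0;

    assert(s != NULL && argc != NULL);
    n = strlen(s);
    max_args = (n + 1) / 2;       /* every second character a separator */

    argv = malloc((max_args + 1) * sizeof *argv + n + 1);
    if (!argv) return NULL;

    p = (char *)(argv + max_args + 1);   /* text lives right after the pointers */
    memcpy(p, s, n + 1);

    while (isspace((unsigned char)*p)) ++p;
    while (*p) {
        argv[count++] = p;
        while (*p && !isspace((unsigned char)*p)) ++p;
        if (*p) *p++ = '\0';
        while (isspace((unsigned char)*p)) ++p;
    }

    argv[count] = NULL;
    *argc = count;
    return argv;
}
```

The bound holds because k arguments need at least k characters plus k - 1 separators, so k can never exceed (n + 1) / 2.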

Solution 5

Here's a solution for both Windows and Unix (tested on Linux, OSX and Windows). Tested with Valgrind and Dr. Memory.

It uses wordexp for POSIX systems, and CommandLineToArgvW for Windows.

Note that for the Windows solution, most of the code is converting between char ** and wchar_t ** with the beautiful Win32 API, since there is no CommandLineToArgvA available (ANSI-version).

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <wordexp.h>
#endif

char **split_commandline(const char *cmdline, int *argc)
{
    int i;
    char **argv = NULL;
    assert(argc);

    if (!cmdline)
    {
        return NULL;
    }

    // Posix.
    #ifndef _WIN32
    {
        wordexp_t p;

        // Note! This expands shell variables.
        if (wordexp(cmdline, &p, 0))
        {
            return NULL;
        }

        *argc = p.we_wordc;

        // One extra zeroed slot keeps argv NULL-terminated, like a real argv.
        if (!(argv = calloc(*argc + 1, sizeof(char *))))
        {
            goto fail;
        }

        for (i = 0; i < p.we_wordc; i++)
        {
            if (!(argv[i] = strdup(p.we_wordv[i])))
            {
                goto fail;
            }
        }

        wordfree(&p);

        return argv;
    fail:
        wordfree(&p);
    }
    #else // WIN32
    {
        wchar_t **wargs = NULL;
        size_t needed = 0;
        wchar_t *cmdlinew = NULL;
        size_t len = strlen(cmdline) + 1;

        if (!(cmdlinew = calloc(len, sizeof(wchar_t))))
            goto fail;

        if (!MultiByteToWideChar(CP_ACP, 0, cmdline, -1, cmdlinew, len))
            goto fail;

        if (!(wargs = CommandLineToArgvW(cmdlinew, argc)))
            goto fail;

        // One extra zeroed slot keeps argv NULL-terminated, like a real argv.
        if (!(argv = calloc(*argc + 1, sizeof(char *))))
            goto fail;

        // Convert from wchar_t * to ANSI char *
        for (i = 0; i < *argc; i++)
        {
            // Get the size needed for the target buffer.
            // CP_ACP = Ansi Codepage.
            needed = WideCharToMultiByte(CP_ACP, 0, wargs[i], -1,
                                        NULL, 0, NULL, NULL);

            if (!(argv[i] = malloc(needed)))
                goto fail;

            // Do the conversion.
            needed = WideCharToMultiByte(CP_ACP, 0, wargs[i], -1,
                                        argv[i], needed, NULL, NULL);
        }

        if (wargs) LocalFree(wargs);
        if (cmdlinew) free(cmdlinew);
        return argv;

    fail:
        if (wargs) LocalFree(wargs);
        if (cmdlinew) free(cmdlinew);
    }
    #endif // WIN32

    if (argv)
    {
        for (i = 0; i < *argc; i++)
        {
            if (argv[i])
            {
                free(argv[i]);
            }
        }

        free(argv);
    }

    return NULL;
}
Author by

codebox

Updated on July 27, 2022

Comments

  • codebox
    codebox almost 2 years

    Is there a way in C to parse a piece of text and obtain values for argv and argc, as if the text had been passed to an application on the command line?

    This doesn't have to work on Windows, just Linux - I also don't care about quoting of arguments.

  • Remo.D
    Remo.D over 14 years
    With the small problem that is C++ and not C :)
  • bua
    bua over 14 years
    You're right. I posted it because, when I was looking at the sources some time ago, I remember it was generic, OOD-free code that looked almost like C. But I think it's worth keeping here.
  • Michael Burr
    Michael Burr over 14 years
    Rename the file to argcargv.c and it's C. Literally.
  • Remo.D
    Remo.D over 14 years
    I like the brevity of your solution but I'm not a big fan of strtok() or strdupa(). I'm also not very clear on what the strdup("test") is for. The major drawback to me seems the fact that you have many strdup and, hence, you will have to do many free() when done. I posted an alternative version in my answer, just in case it may be useful for somebody.
  • Remo.D
    Remo.D over 14 years
    Because getopt does a different job. It takes an array of arguments and looks for options in it. This question is about splitting a string of "arguments" into an array of char *, which is something getopt is not able to do.
  • Admin
    Admin about 12 years
    If you transform input string like that you can't do string concatenation with quotes" like "this' or 'this. See my answer for a full featured solution.
  • Exectron
    Exectron over 11 years
    That may do more than you want. For example, it performs shell word expansions, including environment variable substitution such as replacing $PATH with the current path.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 11 years
    I guess it depends on what you mean by parse into argc/argv; certainly that involves some of what the shell does (processing quoting), but variable expansion and other things are more questionable. BTW wordexp does have an option to disable command expansion.
  • Exectron
    Exectron over 11 years
    If you mean WRDE_NOCMD, that doesn't seem to prevent expansion of $PATH, nor expanding * to the names of files in the current directory.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 11 years
    I didn't say it prevented variable expansion, just that one other thing you might want to turn off, command expansion, can be turned off.
  • Steve Valliere
    Steve Valliere about 11 years
    Mr Peitrek's library appears to be very weak when compared to Microsoft's actual rules for separating a command line into argc/argv (see msdn.microsoft.com/en-us/library/17w5ykft.aspx for their rules.) He doesn't appear to handle embedded quoted strings, multiple backslashes or even escaped quote characters. Not a problem if that's not needed, of course, but folks should be sure they get what they need!
  • jrr
    jrr almost 11 years
    Thanks, that saved some time. To anyone else using this: "char* p1" (though your compiler would have told you =] )
  • Telemachus
    Telemachus over 10 years
    @Remo.D I know it's a long time ago, but I was working on this same general problem myself and about to use strtok. It seems designed for just such a case. So, I'm curious: Why are you "not a big fan of strtok()"?
  • Max Truxa
    Max Truxa over 9 years
    (nit-picking ahead) Note that there is one small thing missing to be compliant with the standard argc/argv layout: The entry behind the last valid one in argv is always set to NULL ("foo bar": argv[0] -> "foo", argv[1] -> "bar", argv[2] -> NULL).
  • Joakim
    Joakim over 9 years
    Also, it's totally unnecessary since Microsoft doesn't just give you the specification how they parse the command line, they also provide an API for this: CommandLineToArgvW
  • Jesse Chisholm
    Jesse Chisholm over 8 years
    To be a bit closer to the standard argv, add an extra position at the end with NULL. This is done in case a programmer ignores argc and just while(process(*++argv)); until they hit that NULL. There would, of course, need to be more to handle quoted arguments (and escaped quotes).
  • Jesse Chisholm
    Jesse Chisholm over 8 years
    @Telemachus - strtok 1: modifies the buffer it parses, 2: remembers your buffer across calls, which makes it 3: not thread safe as it is not re-entrant. It is not the designed purpose of strtok but the designed in side effects that are annoying. :) :) :)
  • domsson
    domsson over 6 years
    This is exactly what I was looking for and seems to work very well. I needed it to pass a user-defined command to posix_spawn, not knowing whether there would be additional arguments. However, a short code example would make this answer so much better. Yeah, even now, more than seven years later. :-)
  • Zibri
    Zibri over 5 years
    nice but it does not handle quotes nor double quotes
  • Zibri
    Zibri over 5 years
    Note: it destroys the source string (because it uses strtok) keep it in mind and if needed add a strdup
  • Lê Quang Duy
    Lê Quang Duy about 4 years
    I ran your code, but it fails if I input "1 2 3 \'3 4\"567\' \"bol\'obala\" 2x2=\"foo\""
  • liuyang1
    liuyang1 about 4 years
    your buf is strdup from args. It's memory leaked.
  • Zibri
    Zibri almost 4 years
    @LêQuangDuy feel free to modify it to suit your needs and post here your better solution ;)
  • Greedo
    Greedo over 2 years
    Does this account for escaped args like "some/long path/to file.txt"?
  • sstteevvee
    sstteevvee over 2 years
    No, you'd have to look for and handle quotes yourself after. If your code is running on a "real" OS, then I'd recommend seeing what it offers. For example, the glib one as suggested in another solution, which should give you those sorts of features.