Why does argv include the program name?


Solution 1

To begin with, note that argv[0] is not necessarily the program name. It is what the caller puts into argv[0] of the execve system call (e.g. see this question on Stack Overflow). (All other variants of exec are not system calls but interfaces to execve.)

Suppose, for instance, the following (using execl):

execl("/var/tmp/mybackdoor", "top", (char *)NULL);

/var/tmp/mybackdoor is what is executed but argv[0] is set to top, and this is what ps or (the real) top would display. See this answer on U&L SE for more on this.

Setting all of this aside: Before the advent of fancy filesystems like /proc, argv[0] was the only way for a process to learn about its own name. What would that be good for?

  • Several programs customize their behavior depending on the name by which they were called (usually by symbolic or hard links, for example BusyBox's utilities; several more examples are provided in other answers to this question).
  • Moreover, services, daemons and other programs that log through syslog often prepend their name to the log entries; without this, event tracking would become next to infeasible.

Solution 2

Plenty:

  • Bash runs in POSIX mode when argv[0] is sh. It runs as a login shell when argv[0] begins with -.
  • Vim behaves differently when run as vi, view, evim, eview, ex, vimdiff, etc.
  • Busybox, as already mentioned.
  • In systems with systemd as init, shutdown, reboot, etc. are symlinks to systemctl.
  • and so on.

Solution 3

Historically, argv is just an array of pointers to the "words" of the commandline, so it makes sense to start with the first "word", which happens to be the name of the program.

And there are quite a few programs that behave differently according to which name is used to call them, so you can just create different links to them and get different "commands". The most extreme example I can think of is busybox, which acts like several dozen different "commands" depending on how it is called.

Edit: References for Unix 1st edition, as requested

One can see, e.g. from the main function of cc, that argc and argv were already used. The shell copies arguments to the parbuf inside the newarg part of the loop, while treating the command itself in the same way as the arguments. (Of course, later on it executes only the first argument, which is the name of the command.) It looks like execv and relatives didn't exist then.

Solution 4

In addition to programs altering their behaviour depending on how they were called, I find argv[0] useful in printing the usage of a program, like so:

printf("Usage: %s [arguments]\n", argv[0]);

This causes the usage message to always use the name through which it was called. If the program is renamed, its usage message changes with it. It even includes the path name it was called with:

# cat foo.c 
#include <stdio.h>
int main(int argc, char **argv) { printf("Usage: %s [arguments]\n", argv[0]); }
# gcc -Wall -o foo foo.c
# mv foo /usr/bin 
# cd /usr/bin 
# ln -s foo bar
# foo
Usage: foo [arguments]
# bar
Usage: bar [arguments]
# ./foo
Usage: ./foo [arguments]
# /usr/bin/foo
Usage: /usr/bin/foo [arguments]

It's a nice touch, especially for small special-purpose tools/scripts that might live all over the place.

This seems common practice in GNU tools as well, see ls for example:

% ls --qq
ls: unrecognized option '--qq'
Try 'ls --help' for more information.
% /bin/ls --qq
/bin/ls: unrecognized option '--qq'
Try '/bin/ls --help' for more information.

Solution 5

Use cases:

You can use the program name to change the program behavior.

For example you could create some symlinks to the actual binary.

One famous example where this technique is used is the busybox project which installs only one single binary and many symlinks to it. (ls, cp, mv, etc). They are doing it to save storage space because their targets are small embedded devices.

This is also used in setarch from util-linux:

$ ls -l /usr/bin/ | grep setarch
lrwxrwxrwx 1 root root           7 2015-11-05 02:15 i386 -> setarch
lrwxrwxrwx 1 root root           7 2015-11-05 02:15 linux32 -> setarch
lrwxrwxrwx 1 root root           7 2015-11-05 02:15 linux64 -> setarch
-rwxr-xr-x 1 root root       14680 2015-10-22 16:54 setarch
lrwxrwxrwx 1 root root           7 2015-11-05 02:15 x86_64 -> setarch

Here they are using this technique basically to avoid many duplicate source files or just to keep the sources more readable.

Another use case is a program that needs to load modules or data at runtime. Having the program path lets you load them from a location relative to the program itself (provided argv[0] actually contains a path, which is not the case when the program was found through PATH).

Moreover, many programs print error messages that include the program name.

Why:

  1. Because it's POSIX convention (man 3p execve):

argv is an array of argument strings passed to the new program. By convention, the first of these strings should contain the filename associated with the file being executed.

  2. It's in the C standard (at least C99 and C11):

If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment.

Note the C Standard says "program name" not "filename".

Author by

Shrikant Giridhar

Updated on September 18, 2022

Comments

  • Shrikant Giridhar
    Shrikant Giridhar almost 2 years

    Typical Unix/Linux programs accept the command line inputs as an argument count (int argc) and an argument vector (char *argv[]). The first element of argv is the program name - followed by the actual arguments.

    Why is the program name passed to the executable as an argument? Are there any examples of programs using their own name (maybe some kind of exec situation)?

    • Арсений Черенков
      Арсений Черенков over 7 years
      like mv and cp ?
    • Motte001
      Motte001 over 7 years
      On Debian, sh is a symlink to dash. It behaves differently when called as sh than when called as dash.
    • Alexej Magura
      Alexej Magura over 7 years
      @Archemar I don't think that mv and cp are symlinks, at least on CentOS 6.4 they aren't.
    • Baard Kopperud
      Baard Kopperud over 7 years
      @AlexejMagura If you use something like busybox (common on rescue-discs and such), then pretty much everything (cp, mv, rm, ls, ...) is a symbolic link to busybox.
    • wizzwizz4
      wizzwizz4 over 7 years
      I'm finding this really hard to ignore, so I'll say it: you probably mean "GNU" programs (gcc, bash, gunzip, most of the rest of the OS...), as Linux is just the kernel.
    • Sam Hobbs
      Sam Hobbs over 7 years
      You need to ask Dennis Ritchie, the original designer of Unix and C (Unix was designed by others as well but the C language was designed by Dennis Ritchie). In the beginning, the design of C was not rigidly determined and sometimes is the result of Dennis Ritchie's personal preferences.
    • drHogan
      drHogan over 7 years
      @wizzwizz4 What's wrong with "Typical Unix/Linux programs"? I read it like "Typical programs running on Unix/Linux". That's much better than your restriction to certain GNU programs. Dennis Ritchie was certainly not using any GNU programs. BTW the Hurd kernel is an example of a GNU program which does not have a main function...
    • OJFord
      OJFord over 7 years
      That's incredibly pedantic @wizzwizz4 - people say "Windows programs" or "macOS apps" all the time; nobody assumes they're referring only to first-party bundlewares.
    • wizzwizz4
      wizzwizz4 over 7 years
      @rudimeier Linux is one program. GNU is the collective name of the majority of the rest of the programs. If Linux hadn't come along before Hurd was finished, you probably wouldn't even have heard of Linux. Android is based on Linux, but you don't see that on this website because it's not Unix-like. GNU is Unix-like, but Linux can't even be compared to it. I'll stop this argument now, else it will go on for months. If you want to carry on we should take it to chat, but I recommend we don't.
    • wizzwizz4
      wizzwizz4 over 7 years
      @OllieFord But people don't say "XNU apps", do they? People also don't attribute the creation of the majority of Windows to the creators of ntoskrnl, and don't assume that they could replace explorer, cmd, dwm, dasHost , lsass, wininit etc. and still have Windows, so long as they kept ntoskrnl.
    • Lucas
      Lucas over 7 years
      "Why does argv include the program name?" Why shouldn't it?
    • Lucas
      Lucas over 7 years
      @wizzwizz4 Right, Linux is a kernel, not an OS. Though most people mean GNU/Linux when they say "Linux" in the context of OS. It's a colloquialism, and we're all aware of Stallman's meme
    • wizzwizz4
      wizzwizz4 over 7 years
      @BradenBest Unfortunately, we are not all aware that "Linux" is often an abbreviation of GNU/Linux, especially in the context of OS. I personally think that "*nix" and "Unix-like" are better phrases than "Unix/Linux", not only because Linux isn't anything like Unix but also because it entirely discounts FreeBSD and other Unix-like operating systems. But each to their own, I suppose.
    • Lucas
      Lucas over 7 years
      @wizzwizz4 I agree, you should inform people when it's necessary, as a user ought to have a good understanding of the software they're using. But you shouldn't force them to use terminology they don't want to use. Nobody calls Windows 7 "NT6.1/ntoskrnl", or OS X Yosemite "OSX10.10/XNU", just as not many people call Linux "GNU/Linux". Together, GNU and Linux make up a nifty OS that is greater than the sum of its parts. We just happen to call this combination "Linux". In effect, there's "Linux" the kernel, and "Linux" the family of OS. As long as that's understood, there shouldn't be problems.
    • Lucas
      Lucas over 7 years
      Basically, and this is in counter to the point you made about "XNU apps" and "ntoskrnl explorer", "Linux" isn't meant as the name of the kernel (instead we say "Linux kernel"); it's meant as the name of the OS that is formed when you combine Linux and GNU. Just as Windows is the name of the OS that is combined when you combine the Windows user-land software with ntoskrnl.exe, and OS X is the OS you get when you combine the Mac OS user-land with XNU. They could have called OS X "Apple XNU", and Microsoft could have called Windows 7 "Microsoft NT 6.1"... but they didn't.
    • Lucas
      Lucas over 7 years
      @wizzwizz4 "Don't flood the comments", I agree. Some final words: I agree that *nix is a good name for all unix-like systems and I even use it myself sometimes. I also wish Linux (the OS family) had a different name, as I recognize how strange it is to name the OS after one or two of its parts (both Linux and GNU/Linux are idiosyncratic). Something like, I dunno, Penguix, Tuxix, GNuxix, PenGNUx, PenGNUix. "My favorite system is Arch Pengnuix". Eh, that's really hard to pronounce, I keep slurring it as "Pain-wicks"...huh
    • wizzwizz4
      wizzwizz4 over 7 years
      @BradenBest They were thinking of LiGNUx, but both parties agreed that it sounded rather rubbish. Deleting earlier comment to make room for this one.
  • user541686
    user541686 over 7 years
    Doesn't this break if you reach the symlink from another symlink?
  • drHogan
    drHogan over 7 years
    @Mehrdad, Yes that's the downside and can be confusing for the user.
  • Ruslan
    Ruslan over 7 years
    Examples of such programs are bunzip2, bzcat and bzip2, of which the first two are symlinks to the third.
  • drHogan
    drHogan over 7 years
    @Ruslan Interestingly zcat is not a symlink. They seem to avoid the downsides of this technique using a shell script instead. But they fail to print a complete --help output because somebody who added options to gzip forgot to maintain zcat too.
  • Shadur
    Shadur over 7 years
    Another one is sendmail and mail. Every single unix MTA comes with a symlink for those two commands, and is designed to emulate the original's behaviour when called as such, meaning that any unix program that needs to send mail knows exactly how they can do so.
  • Lesmana
    Lesmana over 7 years
    please add references that back this up.
  • Admin
    Admin over 7 years
    For as long as I can remember, the GNU coding standards have discouraged the use of argv[0] to change program behavior (section "Standards for Interfaces Generally" in the current version). gunzip is a historical exception.
  • Pepijn Schmitz
    Pepijn Schmitz over 7 years
    busybox is another excellent example. It can be called by 308 different names to invoke different commands: busybox.net/downloads/BusyBox.html#commands
  • Giacomo Catenazzi
    Giacomo Catenazzi over 7 years
    Another common case: test and [: when you call the former, it reports an error if the last argument is ]. (On current Debian stable these commands are two different programs, but previous versions and macOS still use the same program.) And tex, latex and so on: the binary is the same, but depending on how it was called, it chooses the proper configuration file. init is similar.
  • The Vee
    The Vee over 7 years
    +1. I was going to suggest the same. Strange that so many people focus on changing behaviour and fail to mention probably the most obvious and much more widespread usage.
  • chepner
    chepner over 7 years
    Related, [ considers it an error if the last argument is not ].
  • ninjalj
    ninjalj over 7 years
    From a quick skimming, exec takes the name of the command to execute and a zero-terminated array of char pointers (best seen at minnie.tuhs.org/cgi-bin/utree.pl?file=V1/u0.s, where exec takes references to label 2 and label 1, and at label 2: appears etc/init\0, and at label 1: appears a reference to label 2, and a terminating zero), which is basically what execve does today minus envp.
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' over 7 years
    execv and execl have existed "forever" (i.e., since the early to mid 1970s) — execv was a system call and execl was a library function that called it.   execve didn't exist then because the environment didn't exist then.   The other members of the family were added later.
  • dirkt
    dirkt over 7 years
    @G-Man Can you point me to execv in the v1 source I linked? Just curious.
  • einpoklum
    einpoklum over 7 years
    @rudimeier: Your 'Why' items are not really reasons, they're just a "homunculus", i.e. they just beg the question of why the standards require this to be the case.
  • spectras
    spectras over 7 years
    Many, many more programs also inject their argv[0] in their usage/help output instead of hard-coding their name. Some in full, some just the basename.
  • drHogan
    drHogan over 7 years
    @einpoklum OP's question was: Why is the program name passed to the executable? I answered: Because the POSIX and C standards tell us to do so. Why do you think that's not really a reason? If the docs I've quoted did not exist, then probably many programs would not pass the program name.
  • moopet
    moopet over 7 years
    That's not a particularly good explanation - there's no reason we couldn't have standardised on something like (char *path_to_program, char **argv, int argc) for example
  • Larry Hosken
    Larry Hosken over 7 years
    Afaik, most programs pull configuration from a standard location (~/.<program>, /etc/<program>, $XDG_CONFIG_HOME) and either take a parameter to change it or have a compile-time option that bakes a constant into the binary.
  • mckenzm
    mckenzm over 7 years
    more importantly, programs can behave differently depending upon the name they were invoked under. wdel and wput. Busybox is a good example, and it has been a past (possibly unwise ) usage to provide security backdoors with a high argc use, and a particular name.
  • Joey
    Joey over 7 years
    I guess this answers the second question, but not the first. I very much doubt some OS designer sat down and said »Hey, it would be cool if I had the same program doing different things just based on its executable name. I guess I'll include the name in its argument array, then.«
  • muru
    muru over 7 years
    @Joey Yes, the wording is intended to convey that (Q: "Are there any ...?" A: "Plenty: ...")