Using getopts to process long and short command line options

483,225

Solution 1

There are three implementations that may be considered:

  • Bash builtin getopts. This does not support long option names with the double-dash prefix. It only supports single-character options.

  • BSD UNIX implementation of standalone getopt command (which is what MacOS uses). This does not support long options either.

  • GNU implementation of standalone getopt. GNU getopt(3) (used by the command-line getopt(1) on Linux) supports parsing long options.


Some other answers show a solution for using the bash builtin getopts to mimic long options. That solution actually makes a short option whose character is "-". So you get "--" as the flag. Then anything following that becomes OPTARG, and you test the OPTARG with a nested case.

This is clever, but it comes with caveats:

  • getopts can't enforce the opt spec. It can't return errors if the user supplies an invalid option. You have to do your own error-checking as you parse OPTARG.
  • OPTARG is used for the long option name, which complicates usage when your long option itself has an argument. You end up having to code that yourself as an additional case.

So while it is possible to write more code to work around the lack of support for long options, this is a lot more work and partially defeats the purpose of using a getopt parser to simplify your code.

Solution 2

getopt and getopts are different beasts, and people seem to have a bit of misunderstanding of what they do. getopts is a built-in command to bash to process command-line options in a loop and assign each found option and value in turn to built-in variables, so you can further process them. getopt, however, is an external utility program, and it doesn't actually process your options for you the way that e.g. bash getopts, the Perl Getopt module or the Python optparse/argparse modules do. All that getopt does is canonicalize the options that are passed in — i.e. convert them to a more standard form, so that it's easier for a shell script to process them. For example, an application of getopt might convert the following:

myscript -ab infile.txt -ooutfile.txt

into this:

myscript -a -b -o outfile.txt infile.txt

You have to do the actual processing yourself. You don't have to use getopt at all if you make various restrictions on the way you can specify options:

  • only put one option per argument;
  • all options go before any positional parameters (i.e. non-option arguments);
  • for options with values (e.g. -o above), the value has to go as a separate argument (after a space).

Why use getopt instead of getopts? The basic reason is that only GNU getopt gives you support for long-named command-line options.1 (GNU getopt is the default on Linux. Mac OS X and FreeBSD come with a basic and not-very-useful getopt, but the GNU version can be installed; see below.)

For example, here's an example of using GNU getopt, from a script of mine called javawrap:

# NOTE: This requires GNU getopt.  On Mac OS X and FreeBSD, you have to install this
# separately; see below.
TEMP=$(getopt -o vdm: --long verbose,debug,memory:,debugfile:,minheap:,maxheap: \
              -n 'javawrap' -- "$@")

if [ $? != 0 ] ; then echo "Terminating..." >&2 ; exit 1 ; fi

# Note the quotes around '$TEMP': they are essential!
eval set -- "$TEMP"

VERBOSE=false
DEBUG=false
MEMORY=
DEBUGFILE=
JAVA_MISC_OPT=
while true; do
  case "$1" in
    -v | --verbose ) VERBOSE=true; shift ;;
    -d | --debug ) DEBUG=true; shift ;;
    -m | --memory ) MEMORY="$2"; shift 2 ;;
    --debugfile ) DEBUGFILE="$2"; shift 2 ;;
    --minheap )
      JAVA_MISC_OPT="$JAVA_MISC_OPT -XX:MinHeapFreeRatio=$2"; shift 2 ;;
    --maxheap )
      JAVA_MISC_OPT="$JAVA_MISC_OPT -XX:MaxHeapFreeRatio=$2"; shift 2 ;;
    -- ) shift; break ;;
    * ) break ;;
  esac
done

This lets you specify options like --verbose -dm4096 --minh=20 --maxhe 40 --debugfi="/Users/John Johnson/debug.txt" or similar. The effect of the call to getopt is to canonicalize the options to --verbose -d -m 4096 --minheap 20 --maxheap 40 --debugfile "/Users/John Johnson/debug.txt" so that you can more easily process them. The quoting around "$1" and "$2" is important as it ensures that arguments with spaces in them get handled properly.

If you delete the first 9 lines (everything up through the eval set line), the code will still work! However, your code will be much pickier in what sorts of options it accepts: In particular, you'll have to specify all options in the "canonical" form described above. With the use of getopt, however, you can group single-letter options, use shorter non-ambiguous forms of long-options, use either the --file foo.txt or --file=foo.txt style, use either the -m 4096 or -m4096 style, mix options and non-options in any order, etc. getopt also outputs an error message if unrecognized or ambiguous options are found.

NOTE: There are actually two totally different versions of getopt, basic getopt and GNU getopt, with different features and different calling conventions.2 Basic getopt is quite broken: Not only does it not handle long options, it also can't even handle embedded spaces inside of arguments or empty arguments, whereas getopts does do this right. The above code will not work in basic getopt. GNU getopt is installed by default on Linux, but on Mac OS X and FreeBSD it needs to be installed separately. On Mac OS X, install MacPorts (http://www.macports.org) and then do sudo port install getopt to install GNU getopt (usually into /opt/local/bin), and make sure that /opt/local/bin is in your shell path ahead of /usr/bin. On FreeBSD, install misc/getopt.

A quick guide to modifying the example code for your own program: Of the first few lines, all is "boilerplate" that should stay the same, except the line that calls getopt. You should change the program name after -n, specify short options after -o, and long options after --long. Put a colon after options that take a value.

Finally, if you see code that has just set instead of eval set, it was written for BSD getopt. You should change it to use the eval set style, which works fine with both versions of getopt, while the plain set doesn't work right with GNU getopt.

1Actually, getopts in ksh93 supports long-named options, but this shell isn't used as often as bash. In zsh, use zparseopts to get this functionality.

2Technically, "GNU getopt" is a misnomer; this version was actually written for Linux rather than the GNU project. However, it follows all the GNU conventions, and the term "GNU getopt" is commonly used (e.g. on FreeBSD).

Solution 3

The Bash builtin getopts function can be used to parse long options by putting a dash character followed by a colon into the optspec:

#!/usr/bin/env bash 
optspec=":hv-:"
while getopts "$optspec" optchar; do
    case "${optchar}" in
        -)
            case "${OPTARG}" in
                loglevel)
                    val="${!OPTIND}"; OPTIND=$(( $OPTIND + 1 ))
                    echo "Parsing option: '--${OPTARG}', value: '${val}'" >&2;
                    ;;
                loglevel=*)
                    val=${OPTARG#*=}
                    opt=${OPTARG%=$val}
                    echo "Parsing option: '--${opt}', value: '${val}'" >&2
                    ;;
                *)
                    if [ "$OPTERR" = 1 ] && [ "${optspec:0:1}" != ":" ]; then
                        echo "Unknown option --${OPTARG}" >&2
                    fi
                    ;;
            esac;;
        h)
            echo "usage: $0 [-v] [--loglevel[=]<value>]" >&2
            exit 2
            ;;
        v)
            echo "Parsing option: '-${optchar}'" >&2
            ;;
        *)
            if [ "$OPTERR" != 1 ] || [ "${optspec:0:1}" = ":" ]; then
                echo "Non-option argument: '-${OPTARG}'" >&2
            fi
            ;;
    esac
done

After copying to executable file name=getopts_test.sh in the current working directory, one can produce output like

$ ./getopts_test.sh
$ ./getopts_test.sh -f
Non-option argument: '-f'
$ ./getopts_test.sh -h
usage: code/getopts_test.sh [-v] [--loglevel[=]<value>]
$ ./getopts_test.sh --help
$ ./getopts_test.sh -v
Parsing option: '-v'
$ ./getopts_test.sh --very-bad
$ ./getopts_test.sh --loglevel
Parsing option: '--loglevel', value: ''
$ ./getopts_test.sh --loglevel 11
Parsing option: '--loglevel', value: '11'
$ ./getopts_test.sh --loglevel=11
Parsing option: '--loglevel', value: '11'

Obviously getopts neither performs OPTERR checking nor option-argument parsing for the long options. The script fragment above shows how this may be done manually. The basic principle also works in the Debian Almquist shell ("dash"). Note the special case:

getopts -- "-:"  ## without the option terminator "-- " bash complains about "-:"
getopts "-:"     ## this works in the Debian Almquist shell ("dash")

Note that, as GreyCat from over at http://mywiki.wooledge.org/BashFAQ points out, this trick exploits a non-standard behaviour of the shell which permits the option-argument (i.e. the filename in "-f filename") to be concatenated to the option (as in "-ffilename"). The POSIX standard says there must be a space between them, which in the case of "-- longoption" would terminate the option-parsing and turn all longoptions into non-option arguments.

Solution 4

The built-in getopts command is still, AFAIK, limited to single-character options only.

There is (or used to be) an external program getopt that would reorganize a set of options such that it was easier to parse. You could adapt that design to handle long options too. Example usage:

aflag=no
bflag=no
flist=""
set -- $(getopt abf: "$@")
while [ $# -gt 0 ]
do
    case "$1" in
    (-a) aflag=yes;;
    (-b) bflag=yes;;
    (-f) flist="$flist $2"; shift;;
    (--) shift; break;;
    (-*) echo "$0: error - unrecognized option $1" 1>&2; exit 1;;
    (*)  break;;
    esac
    shift
done

# Process remaining non-option arguments
...

You could use a similar scheme with a getoptlong command.

Note that the fundamental weakness with the external getopt program is the difficulty of handling arguments with spaces in them, and in preserving those spaces accurately. This is why the built-in getopts is superior, albeit limited by the fact it only handles single-letter options.

Solution 5

Long options can be parsed by the standard getopts builtin as “arguments” to the - “option”

This is portable and native POSIX shell – no external programs or bashisms are needed.

This guide implements long options as arguments to the - option, so --alpha is seen by getopts as - with argument alpha and --bravo=foo is seen as - with argument bravo=foo. The true argument can be harvested with a simple replacement: ${OPTARG#*=}.

In this example, -b and -c (and their long forms, --bravo and --charlie) have mandatory arguments. Arguments to long options come after equals signs, e.g. --bravo=foo (space delimiters for long options would be hard to implement, see below).

Because this uses the getopts builtin, this solution supports usage like cmd --bravo=foo -ac FILE (which has combined options -a and -c and interleaves long options with standard options) while most other answers here either struggle or fail to do that.

die() { echo "$*" >&2; exit 2; }  # complain to STDERR and exit with error
needs_arg() { if [ -z "$OPTARG" ]; then die "No arg for --$OPT option"; fi; }

while getopts ab:c:-: OPT; do
  # support long options: https://stackoverflow.com/a/28466267/519360
  if [ "$OPT" = "-" ]; then   # long option: reformulate OPT and OPTARG
    OPT="${OPTARG%%=*}"       # extract long option name
    OPTARG="${OPTARG#$OPT}"   # extract long option argument (may be empty)
    OPTARG="${OPTARG#=}"      # if long option argument, remove assigning `=`
  fi
  case "$OPT" in
    a | alpha )    alpha=true ;;
    b | bravo )    needs_arg; bravo="$OPTARG" ;;
    c | charlie )  needs_arg; charlie="$OPTARG" ;;
    ??* )          die "Illegal option --$OPT" ;;  # bad long option
    ? )            exit 2 ;;  # bad short option (error reported via getopts)
  esac
done
shift $((OPTIND-1)) # remove parsed options and args from $@ list

When the option is a dash (-), it is a long option. getopts will have parsed the actual long option into $OPTARG, e.g. --bravo=foo originally sets OPT='-' and OPTARG='bravo=foo'. The if stanza sets $OPT to the contents of $OPTARG before the first equals sign (bravo in our example) and then removes that from the beginning of $OPTARG (yielding =foo in this step, or an empty string if there is no =). Finally, we strip the argument's leading =. At this point, $OPT is either a short option (one character) or a long option (2+ characters).

The case then matches either short or long options. For short options, getopts automatically complains about options and missing arguments, so we have to replicate those manually using the needs_arg function, which fatally exits when $OPTARG is empty. The ??* condition will match any remaining long option (? matches a single character and * matches zero or more, so ??* matches 2+ characters), allowing us to issue the "Illegal option" error before exiting.


Minor bug: If somebody gives an invalid single-character long option (and it's not also a short option), this will exit with an error but without a message. This is because this implementation assumes it was a short option. You could track that with an extra variable in the long option stanza preceding the case and then test for it in the final case condition, but I consider that too much of a corner case to bother.

Uppercase variable names: Generally, the advice is to reserve all-uppercase variables for system use. I'm keeping $OPT as all-uppercase to keep it in line with $OPTARG, but this does break that convention. I think it fits because this is something the system should have done, and it should be safe because there are no standards (afaik) that use such a variable.)

To complain about unexpected arguments to long options: Mimic needs_arg with a flipped test to complain about an argument when one isn't expected:

no_arg() { if [ -n "$OPTARG" ]; then die "No arg allowed for --$OPT option"; fi; }

To accept long options with space-delimited arguments: You can pull in the next argument with eval "ARG_B=\"\$$OPTIND\"" (or with bash's indirect expansion, ARG_B="${!OPTIND}") and then increment $OPTIND as noted in an older version of this answer, but it's unreliable; getopts can prematurely terminate on the assumption that the argument was beyond its scope and some implementations don't take well to manual manipulations of $OPTIND.

Share:
483,225
gagneet
Author by

gagneet

software geek, with a penchant for breaking code, working on internet advertisement development and deployment domain. recently shifted to aussie, working on automation of testing scenarios.

Updated on November 16, 2021

Comments

  • gagneet
    gagneet over 2 years

    I wish to have long and short forms of command line options invoked using my shell script.

    I know that getopts can be used, but like in Perl, I have not been able to do the same with shell.

    Any ideas on how this can be done, so that I can use options like:

    ./shell.sh --copyfile abc.pl /tmp/
    ./shell.sh -c abc.pl /tmp/
    

    In the above, both the commands mean the same thing to my shell, but using getopts, I have not been able to implement these?