Matching numbers with regex in case statement

Solution 1

case does not use regexes, it uses patterns

For "1 or more digits", do this:

shopt -s extglob
...
    case ${!i} in
        +([[:digit:]]) )
            n=${!i}
            ;;
    ...
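
A complete version of the fragment above might look like the following sketch. It assumes bash (extglob is a bash option), and the function name is_whole_number_extglob and the sample arguments are hypothetical, not part of the answer. Note that shopt -s extglob has to take effect before bash parses the case pattern, so it sits on its own line ahead of the function definition.

```shell
#!/usr/bin/env bash
# Sketch only: is_whole_number_extglob is a hypothetical helper name.
# extglob must already be enabled when the function below is parsed.
shopt -s extglob

is_whole_number_extglob() {
    case $1 in
        +([[:digit:]]) ) return 0 ;;   # one or more digits, nothing else
        * )              return 1 ;;   # anything else, including the empty string
    esac
}

# Sample arguments chosen for illustration.
for arg in 64 007 "" 6x4 -5; do
    if is_whole_number_extglob "$arg"; then
        echo "'$arg' is a whole number"
    else
        echo "'$arg' is invalid"
    fi
done
```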

If you want to use regular expressions, use the =~ operator within [[...]]

if [[ ${!i} =~ ^[[:digit:]]+$ ]]; then
    n=${!i}
else
    echo "Invalid"
fi
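
The same test can be applied to every argument in turn. The following is only a sketch, assuming bash; the names validate_args and re are illustrative and do not come from the answer. Keeping the regex in a variable and expanding it unquoted on the right-hand side of =~ is a common way to avoid quoting surprises.

```shell
#!/usr/bin/env bash
# Sketch only: validate_args and re are hypothetical names.
re='^[[:digit:]]+$'

validate_args() {
    local arg status=0
    for arg in "$@"; do
        if [[ $arg =~ $re ]]; then
            echo "ok: $arg"
        else
            echo "Invalid: $arg" >&2
            status=1               # remember that at least one argument failed
        fi
    done
    return "$status"
}

validate_args 64 0 999
validate_args 64 -x abc || echo "at least one argument was rejected"
```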

Solution 2

As glenn says, “case does not use regexes, it uses patterns”.  As bash(1) says,

case word in [ [(] pattern [ | pattern ] ... ) list ;; ] ... esac

        A case command first expands word, and tries to match it against each pattern in turn, using the same matching rules as for pathname expansion (see Pathname Expansion below).

Similarly, the POSIX specification says,

… each pattern … shall be compared against the expansion of word, according to the rules described in Pattern Matching Notation …

So the patterns are pathname expansion patterns, a.k.a. wildcards, a.k.a. globs, as in ls -l -- *.sh or rm -- *.bak.

Sure, shopt -s extglob and [[ … =~ … ]] are the neatest thing since sliced bread, but they aren’t POSIX, and it can be useful to know how to use the original tools.  For years, programmers checked whether a string was a number by ruling out every way in which it could fail to be one.  You’ve defined a number to be a string that consists (entirely) of one or more digits.  So a string is not a number if it is null, or if it contains a character that is not a digit.  We can test these conditions with a case statement as follows:

case "$1" in
    ("")
        # null
           ︙
        ;;
    (*[!0-9]*)
        # contains non-numeric character(s)
           ︙
        ;;
    (*)
        # is a whole number (non-negative integer)
           ︙
esac

where [!0-9] is the old-timey shell way of saying [^0-9], which, of course, means any character other than a digit.  ([!…] and [^…] both work in bash.  [!…] is required to work by POSIX; the result of [^…] is unspecified.)  If you don’t care which kind of non-number a string is, you can combine the non-number patterns:

case "$1" in
    ("" | *[!0-9]*)
        # not a number
           ︙
        ;;
    (*)
        # is a number
           ︙
esac
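
Wrapped in a function, the combined test might look like the sketch below. It uses only POSIX sh features; the function name is_whole_number and the sample arguments are hypothetical, chosen here for illustration.

```shell
#!/bin/sh
# Sketch only: is_whole_number is a hypothetical helper name.
is_whole_number() {
    case "$1" in
        ("" | *[!0-9]*) return 1 ;;   # null, or contains a non-digit
        (*)             return 0 ;;   # one or more digits, nothing else
    esac
}

# Sample arguments chosen for illustration.
for arg in 64 0 "" 3.1416 -5 0x11; do
    if is_whole_number "$arg"; then
        echo "'$arg': whole number"
    else
        echo "'$arg': not a whole number"
    fi
done
```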

As an exercise, here’s a case statement to handle any kind of real number — to be precise, a string of one or more digits, with optionally a period (.) somewhere, and optionally a minus sign (-) at the beginning.

case "$1" in
    (*[!-.0-9]*)
        # contains non-numeric character(s)
        ;;
    (*?-*)
        # contains '-' somewhere other than the first position
        ;;
    (*.*.*)
        # contains multiple decimal points
        ;;
    (*)
        case "$1" in
            (*[0-9]*)
                # is a real number
                ;;
            (*)
                # not a number
        esac
esac

I added the case-within-a-case to verify that the string does, indeed, contain at least one digit.  That wasn’t necessary in the integer example because I tested whether the string was null; a test which I have removed from this statement.  Without the second case, a single - or a single . — or even -. — would qualify as a number.  Of course we could add patterns to handle those exceptions, but that can get complex.  (For example, I almost posted this answer without realizing that -. was one of the exceptions.)  I believe that the above approach is more flexible and robust.

Of course the non-number patterns can be combined here, too: (*[!-.0-9]* | *?-* | *.*.*).
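
As a sketch of how the combined form could be packaged, the function below uses that single rejection pattern list. Because case tries its patterns in turn, the inner case is folded in here as a second pattern that runs only when nothing was rejected; this is a variation on the answer's nested layout, not the answer's own code, and the name is_real_number is hypothetical.

```shell
#!/bin/sh
# Sketch only: is_real_number is a hypothetical helper name.
is_real_number() {
    case "$1" in
        (*[!-.0-9]* | *?-* | *.*.*)
            return 1 ;;   # bad character, '-' past the first position, or two '.'
        (*[0-9]*)
            return 0 ;;   # nothing rejected above, and at least one digit
        (*)
            return 1 ;;   # no digit at all: "", "-", "." or "-."
    esac
}

# Sample arguments chosen for illustration.
for arg in 3.1416 -5 -.5 5. - . -. 1.2.3 5-3 abc ""; do
    if is_real_number "$arg"; then
        echo "'$arg': number"
    else
        echo "'$arg': not a number"
    fi
done
```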

Solution 3

To match numbers with a regexp in a case statement, you'd need a shell whose wildcards support regexps. The only one I know of is ksh93.

With ksh93 globs, you can do ~(E)^[0-9]+$ or ~(E:^[0-9]+$) to use an Extended regexp in a glob pattern, or ~(P)^\d+$ to use a perl-like regexp (also G for basic regexp, X for augmented regexp, V for SysV regexp).

So:

#! /bin/ksh93
for i do
  case $i in
    (~(E)^[0-9]+$)
      n=$i;;
    (*)
      echo >&2 'Invalid argument!'
      usage
  esac
done
Author: siery

Updated on September 18, 2022

Comments

  • siery
    siery almost 2 years

    I want to check whether an argument to a shell script is a whole number (i.e., a non-negative integer: 0, 1, 2, 3, …, 17, …, 42, …, etc, but not 3.1416 or −5) expressed in decimal (so nothing like 0x11 or 0x2A).  How can I write a case statement using regex as condition (to match numbers)? I tried a few different ways I came up with (e.g., [0-9]+ or ^[0-9][0-9]*$); none of them works. Like in the following example, valid numbers are falling through the numeric regex that's intended to catch them and are matching the * wildcard.

    i=1
    let arg_n=$#+1
    
    while (( $i < $arg_n )); do
        case ${!i} in
        [0-9]+)
            n=${!i}
            ;;
        *)
            echo 'Invalid argument!'
            ;;
        esac
        let i=$i+1
    done
    

    Output:

    $ ./cmd.sh 64
    Invalid argument!
    
    • siery
      siery over 6 years
      This variable indirection works just fine. I have more cases in the real script and it works. I'm trying to match any occurrence of real numbers in the program arguments. So 0 or 999 should match. Else if there is some invalid argument like '-x' or letters instead of numbers, the program shall match *, at least that's what I thought.
    • ilkkachu
      ilkkachu over 6 years
      @John1024, numbers aren't valid names for variables, but they're quite valid for the names of the positional parameters, and ${!i} works fine for those. e.g. set -- aa bb cc; i=2; echo ${!i} prints bb
    • ilkkachu
      ilkkachu over 6 years
      that said, the easier way to loop over the arguments to the script would be to just use for val in "$@"; do ... and use $val in the loop
    • ilkkachu
      ilkkachu over 6 years
      @John1024, and when it's run, i contains 1, so ${!i} is the same as $1: it expands to the value of the first argument, be it 64 or abc or whatever. What they have is just a convoluted way of looping over the positional parameters / command line arguments.
    • John1024
      John1024 over 6 years
      @ilkkachu Very good. My bad.
    • Stéphane Chazelas
      Stéphane Chazelas over 6 years
      The syntax to loop over the positional parameters is for i do something with "$i"; done
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' about 5 years
    @StéphaneChazelas: But isn’t there a risk that [[:digit:]] will also match more than [0123456789], like Eastern Arabic / Hindi digits (٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, and ٩), Japanese (Kanji) digits (e.g., 零 / 〇, 一, 二, 三, etc.), N’Ko digits (߀, ߁, ߂, ߃, etc.), and others that I haven’t even heard of?
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' about 5 years
    Fun fact: I composed the above comment in Microsoft Word, where I listed the Hindi and N’Ko digits in ascending (LTR) order, but when I pasted them into Internet Explorer, they switched into RTL order, even though I had LTR commas and spaces between them.   Is text direction ignored for punctuation?
  • Stéphane Chazelas
    Stéphane Chazelas about 5 years
    [[:digit:]] is more or less required to match on 0123456789 only as it's meant to match what C isdigit() matches and that's 0123456789 only. See austingroupbugs.net/view.php?id=1078. In practice, I've not come across standard utilities whose [[:digit:]] matches anything else, but I've seen many where [0-9] matches hundreds of characters (including some Eastern Arabic ones (0-8 generally)).
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' about 5 years
    The more I look at this, the more my head hurts. The POSIX specification for isdigit() says “The isdigit() and isdigit_l() functions shall test whether c is a character of class digit in the current locale, …”. I just don’t grok the point in having locales if ٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, and ٩ aren’t going to be treated as digits in an Eastern Arabic / Hindi locale.
  • Pysis
    Pysis about 4 years
    Since I could not use regexes in a case statement, nor similarly advanced pattern syntax for character repetition (such as multiple digits in numeric arguments), I just spelled out the cases explicitly, separated by lots of pipe characters, with a backslash at the end of each line for formatting purposes.