Matching numbers with regex in case statement
Solution 1
case does not use regexes, it uses patterns.
For "1 or more digits", do this:
shopt -s extglob
...
case ${!i} in
    +([[:digit:]]) )
        n=${!i}
        ;;
...
If you want to use regular expressions, use the =~ operator within [[...]]:
if [[ ${!i} =~ ^[[:digit:]]+$ ]]; then
    n=${!i}
else
    echo "Invalid"
fi
Solution 2
As glenn says, “case does not use regexes, it uses patterns”.
As bash(1) says,
case word in [ [(] pattern [ | pattern ] ... ) list ;; ] ... esac
A case command first expands word, and tries to match it against each pattern in turn, using the same matching rules as for pathname expansion (see Pathname Expansion below).
Similarly, the POSIX specification says,
… each pattern … shall be compared against the expansion of word, according to the rules described in Pattern Matching Notation …
So the patterns are pathname expansion patterns,
a.k.a. wildcards, a.k.a. globs, as in ls -l -- *.sh or rm -- *.bak.
Sure, shopt -s extglob and [[ … =~ … ]]
are the neatest thing since sliced bread,
but they aren’t POSIX,
and it can be useful to know how to use the original tools.
For years, programmers checked, for example,
whether a string was a number
by checking whether it was not not a number.
You’ve defined a number to be a string that consists
(entirely) of one or more digits.
So a string is not a number if it is null,
or if it contains a character that is not a digit.
We can test these conditions with a case statement as follows:
case "$1" in
    ("")
        # null
        ︙
        ;;
    (*[!0-9]*)
        # contains non-numeric character(s)
        ︙
        ;;
    (*)
        # is a whole number (non-negative integer)
        ︙
esac
where [!0-9] is the old-timey shell way of saying [^0-9],
which, of course, means any character other than a digit.
([!…] and [^…] both work in bash.
[!…] is required to work by POSIX; the result of [^…] is unspecified.)
If you don’t care which kind of non-number a string is,
you can combine the non-number patterns:
case "$1" in
    ("" | *[!0-9]*)
        # not a number
        ︙
        ;;
    (*)
        # is a number
        ︙
esac
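If you need the check in more than one place, the combined form can be wrapped in a function. This is only a minimal sketch in POSIX sh; the name is_whole_number is a hypothetical one chosen for illustration, and the return codes stand in for the elided (︙) actions above.

```shell
# is_whole_number: hypothetical helper name, not part of the original answer.
# Succeeds (returns 0) only if $1 consists entirely of one or more digits 0-9.
is_whole_number() {
    case "$1" in
        ("" | *[!0-9]*)
            return 1    # null, or contains at least one non-digit character
            ;;
        (*)
            return 0    # one or more digits and nothing else
            ;;
    esac
}
```

A caller could then write something like is_whole_number "$1" || echo 'Invalid argument!' >&2, with no dependence on extglob or [[ … ]].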
As an exercise,
here’s a case statement to handle any kind of real number;
to be precise, a string of one or more digits,
with optionally a period (.) somewhere,
and optionally a minus sign (-) at the beginning.
case "$1" in
    (*[!-.0-9]*)
        # contains non-numeric character(s)
        ;;
    (*?-*)
        # contains '-' somewhere other than the first position
        ;;
    (*.*.*)
        # contains multiple decimal points
        ;;
    (*)
        case "$1" in
            (*[0-9]*)
                # is a real number
                ;;
            (*)
                # not a number
        esac
esac
I added the case-within-a-case
to verify that the string does, indeed,
contain at least one digit.
That wasn’t necessary in the integer example
because I tested whether the string was null;
a test which I have removed from this statement.
Without the second case, a single - or a single .
(or even -.) would qualify as a number.
Of course we could add patterns to handle those exceptions,
but that can get complex.
(For example, I almost posted this answer
without realizing that -. was one of the exceptions.)
I believe that the above approach is more flexible and robust.
Of course the non-number patterns can be combined here, too:
(*[!-.0-9]* | *?-* | *.*.*).
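Here is a sketch of how the combined form might look as a function in POSIX sh. The name is_real_number is hypothetical, chosen only for illustration. Note one deliberate change: instead of a case within a case, the "contains at least one digit" check is listed as a second pattern in the same case. That works because case stops at the first matching pattern, so the digit check is only reached by strings that passed the rejection patterns.

```shell
# is_real_number: hypothetical helper name, not part of the original answer.
# Succeeds (returns 0) if $1 is one or more digits, with at most one '.'
# anywhere and an optional '-' in the first position only.
is_real_number() {
    case "$1" in
        (*[!-.0-9]* | *?-* | *.*.*)
            return 1    # bad character, misplaced '-', or multiple '.'
            ;;
        (*[0-9]*)
            return 0    # passed the checks above and has at least one digit
            ;;
        (*)
            return 1    # no digit at all: "", "-", ".", or "-."
            ;;
    esac
}
```

By the definition above, strings such as 5. and .5 are accepted, since the period may appear anywhere; add further patterns if you want to reject those.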
Solution 3
To match numbers with regexp in case statements, you'd need a shell whose wildcards support regexps. I only know of ksh93 with those.
With ksh93 globs, you can do ~(E)^[0-9]+$ or ~(E:^[0-9]+$) to use an Extended regexp in a glob pattern, or ~(P)^\d+$ to use a perl-like regexp (also G for basic regexp, X for augmented regexp, V for SysV regexp).
So:
#! /bin/ksh93
for i do
    case $i in
        (~(E)^[0-9]+$)
            n=$i;;
        (*)
            echo >&2 'Invalid argument!'
            usage
    esac
done
siery
Updated on September 18, 2022
Comments
-
siery almost 2 years
I want to check whether an argument to a shell script is a whole number (i.e., a non-negative integer: 0, 1, 2, 3, …, 17, …, 42, …, etc, but not 3.1416 or −5) expressed in decimal (so nothing like 0x11 or 0x2A). How can I write a case statement using regex as condition (to match numbers)? I tried a few different ways I came up with (e.g., [0-9]+ or ^[0-9][0-9]*$); none of them works. Like in the following example, valid numbers are falling through the numeric regex that's intended to catch them and are matching the * wildcard.
i=1
let arg_n=$#+1
while (( $i < $arg_n )); do
    case ${!i} in
        [0-9]+)
            n=${!i}
            ;;
        *)
            echo 'Invalid argument!'
            ;;
    esac
    let i=$i+1
done
Output:
$ ./cmd.sh 64
Invalid argument!
-
siery over 6 years
This variable indirection works just fine. I have more cases in the real script and it works. I'm trying to match any occurrence of real numbers in the program arguments. So 0 or 999 should match. Else if there is some invalid argument like '-x' or letters instead of numbers, the program shall match *, at least that's what I thought.
-
ilkkachu over 6 years
@John1024, numbers aren't valid names for variables, but they're quite valid for the names of the positional parameters, and ${!i} works fine for those. e.g. set -- aa bb cc; i=2; echo ${!i} prints bb
-
ilkkachu over 6 years
That said, the easier way to loop over the arguments to the script would be to just use for val in "$@"; do ... and use $val in the loop
-
ilkkachu over 6 years
@John1024, and when it's run, i contains 1, so ${!i} is the same as $1: it expands to the value of the first argument, be it 64 or abc or whatever. What they have is just a convoluted way of looping over the positional parameters / command line arguments.
-
John1024 over 6 years
@ilkkachu Very good. My bad.
-
Stéphane Chazelas over 6 years
The syntax to loop over the positional parameters is for i do something with "$i"; done
-
G-Man Says 'Reinstate Monica' about 5 years
@StéphaneChazelas: But isn’t there a risk that [[:digit:]] will also match more than [0123456789], like Eastern Arabic / Hindi digits (٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, and ٩), Japanese (Kanji) digits (e.g., 零 / 〇, 一, 二, 三, etc.), N’Ko digits (߀, ߁, ߂, ߃, etc.), and others that I haven’t even heard of?
-
G-Man Says 'Reinstate Monica' about 5 years
Fun fact: I composed the above comment in Microsoft Word, where I listed the Hindi and N’Ko digits in ascending (LTR) order, but when I pasted them into Internet Explorer, they switched into RTL order, even though I had LTR commas and spaces between them. Is text direction ignored for punctuation?
-
Stéphane Chazelas about 5 years
[[:digit:]] is more or less required to match on 0123456789 only, as it's meant to match what C's isdigit() matches and that's 0123456789 only. See austingroupbugs.net/view.php?id=1078. In practice, I've not come across standard utilities whose [[:digit:]] matches anything else, but I've seen many where [0-9] matches hundreds of characters (including some Eastern Arabic ones (0-8 generally)).
-
G-Man Says 'Reinstate Monica' about 5 years
The more I look at this, the more my head hurts. The POSIX specification for isdigit() says “The isdigit() and isdigit_l() functions shall test whether c is a character of class digit in the current locale, …”. I just don’t grok the point in having locales if ٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, and ٩ aren’t going to be treated as digits in an Eastern Arabic / Hindi locale.
-
Pysis about 4 years
Since I could not use regexes in a case statement, nor similarly advanced pattern syntax for character repetition (such as multiple digits for number arguments), I just spelled out the cases more explicitly, using lots of pipe characters and a backslash at the end of each line for formatting purposes.