Checking a string to see if it contains a numeric character in UNIX

Solution 1

Yet another approach. Grep exits with 0 if a match is found, so you can test the exit code:

echo "${word}" | grep -q '[0-9]'
if [ $? = 0 ]; then
    echo 'Invalid input'
fi

This is /bin/sh compatible.


Incorporating Daenyth and John's suggestions, this becomes

if echo "${word}" | grep '[0-9]' >/dev/null; then
    echo 'Invalid input'
fi
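
If the check is needed in more than one place, it can be wrapped in a small function. This is only a sketch: has_digit is a made-up name, and it uses printf instead of echo so that input starting with a - or containing a backslash is passed through untouched (the echo pitfall raised in the comments).

```shell
# has_digit is a hypothetical helper name, not part of the original answer.
# It returns 0 (success) if its first argument contains an ASCII digit.
has_digit() {
    printf '%s\n' "$1" | grep '[0-9]' >/dev/null 2>&1
}

# Try it on a few sample inputs, including one that starts with a dash.
for candidate in hello h3llo -n; do
    if has_digit "$candidate"; then
        echo "$candidate: invalid input"
    else
        echo "$candidate: ok"
    fi
done
```

Because the function's exit status is grep's exit status, it can be used directly in an if or while condition without looking at $?.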

Solution 2

The double bracket operator is an extended version of the test command which supports regexes via the =~ operator:

#!/bin/bash

while true; do
    read -p "Please enter a word: " word
    if [[ $word =~ [0-9] ]]; then
        echo 'Invalid input!' >&2
    else
        break
    fi
done

This is a bash-specific feature. Bash is a newer shell that is not available on all flavors of UNIX--though by "newer" I mean "only recently developed in the post-vacuum tube era" and by "not all flavors of UNIX" I mean relics like old versions of Solaris and HP-UX.

In my opinion this is the simplest option and bash is plenty portable these days, but if being portable to old UNIXes is in fact important then you'll need to use the other posters' sh-compatible answers. sh is the most common and most widely supported shell, but the price you pay for portability is losing things like =~.

Solution 3

If you're trying to write portable shell code, your options for string manipulation are limited. You can use shell globbing patterns (which are a lot less expressive than regexps) in the case construct:

export LC_COLLATE=C
read word
while
  case "$word" in
    *[!A-Za-z]*) echo >&2 "Invalid input, please enter letters only"; true;;
    *) false;;
  esac
do
  read word
done

EDIT: setting LC_COLLATE is necessary because in most non-C locales, character ranges like A-Z don't have the “obvious” meaning. I assume you want only ASCII letters; if you also want letters with diacritics, don't change LC_COLLATE, and replace A-Za-z by [:alpha:] (so the whole pattern becomes *[![:alpha:]]*).
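
As a rough sketch, the same case pattern can live in a function so the prompt loop stays short. The name is_ascii_letters is made up for illustration. Two things here are choices of this sketch rather than part of the answer above: it sets LC_ALL instead of LC_COLLATE (LC_ALL takes precedence over LC_COLLATE when both are set), and it rejects the empty string.

```shell
# Force the C locale so that A-Z and a-z mean the ASCII ranges.
# Assumption of this sketch: LC_ALL is used because it overrides LC_COLLATE.
LC_ALL=C
export LC_ALL

# is_ascii_letters is a hypothetical helper name.
# It returns 0 if the argument is non-empty and contains only ASCII letters.
is_ascii_letters() {
    case "$1" in
        '')          return 1 ;;  # empty input rejected (added assumption)
        *[!A-Za-z]*) return 1 ;;  # some character is not an ASCII letter
        *)           return 0 ;;
    esac
}

# Usage in a prompt loop would look like this (left commented out because it reads stdin):
# read word
# until is_ascii_letters "$word"; do
#     echo >&2 "Invalid input, please enter letters only"
#     read word
# done
```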

For full regexps, see the expr command. EDIT: Note that expr, like several other basic shell tools, has pitfalls with some special strings; the z characters below prevent $word from being interpreted as reserved words by expr.

export LC_COLLATE=C
read word
until expr "z$word" : 'z[A-Za-z]*$' >/dev/null; do
  echo >&2 "Invalid input, please enter letters only"
  read word
done
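
To see why the z prefix matters, here is a small sketch with the expr test wrapped in a function (is_letters_expr is a made-up name). Inputs such as length or = could be mistaken for part of the expression if they were passed to expr bare; with the prefix they are just ordinary strings.

```shell
# is_letters_expr is a hypothetical helper name.
# It returns 0 if the argument consists only of ASCII letters.
# Note: like the pattern it is built on, it also accepts the empty string.
is_letters_expr() {
    # The leading z keeps words like "length" or "=" from being read as expr syntax.
    LC_ALL=C expr "z$1" : 'z[A-Za-z]*$' >/dev/null
}

for candidate in hello length abc1 =; do
    if is_letters_expr "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: invalid"
    fi
done
```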

If you only target recent enough versions of bash, there are other options, such as the =~ operator of [[ ... ]] conditional commands.

Note that your last line has a bug, the first command should be

grep -i "$word" "$1"

The quotes are needed because, somewhat counter-intuitively, "$foo" means “the value of the variable called foo” whereas plain $foo means “take the value of foo, split it into separate words where it contains whitespace, and treat each word as a globbing pattern and try to expand it”. (In fact, if you've already checked that $word contains only letters, leaving out the quotes won't do any harm, but it takes more time to think through these special cases than to just put the quotes in every time.)
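
A quick way to see the splitting for yourself, assuming a POSIX-style shell with the default IFS (count_args is a made-up helper that only reports how many arguments it received):

```shell
# count_args is a hypothetical helper: it prints the number of arguments it was given.
count_args() {
    echo "$#"
}

word="two words"

count_args $word      # unquoted: the value is split on whitespace, prints 2
count_args "$word"    # quoted: the value stays one argument, prints 1
```

With grep the same thing happens: unquoted, the second word would be treated as a file name rather than as part of the search string.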

Solution 4

Yet another (quite) portable way to do it ...

if test "$word" != "`printf "%s" "$word" | tr -dc '[:alpha:]'`"; then
   echo invalid
fi
Author by

electricsheep

Updated on August 02, 2022

Comments

  • electricsheep
    electricsheep almost 2 years

    I'm new to UNIX, having only started it at work today, but experienced with Java, and have the following code:

    #!/bin/bash
    echo "Please enter a word:"
    read word
    grep -i $word $1 | cut -d',' -f1,2 | tr "," "-"> output
    

    This works fine, but what I now need to do is check, when word is read, that it contains nothing but letters and, if it has numeric characters in it, print an "Invalid input!" message and ask the user to enter it again. I assumed regular expressions with an if statement would be the easy way to do this, but I cannot get my head around how to use them in UNIX, as I am used to the Java application of them. Any help with this would be greatly appreciated; I couldn't find help when searching, as all the regular expression solutions for Linux that I found only dealt with whether the input was all numeric or not.

  • MikeD
    MikeD almost 14 years
    Assuming bash kitty? That's cruel.
  • Daenyth
    Daenyth almost 14 years
    -q to grep is not portable outside of GNU. If you want full portability (The only reason to ever use sh), use >/dev/null 2>&1
  • Daenyth
    Daenyth almost 14 years
    The case you have listed fails for non-ascii non-numeric input.
  • Stephen P
    Stephen P almost 14 years
    @Daenyth you're absolutely right, and I even did that when I tried it out on my system, then added the -q when I posted my answer. Seems I've been GNU-only for too long.
  • John Kugelman
    John Kugelman almost 14 years
    You can tighten this up by changing cmd; if [ $? = 0 ]; then to if cmd; then.
  • Anders
    Anders almost 14 years
    @John, think you missed "I'm new to UNIX", explicitness before cleverness.
  • Daenyth
    Daenyth almost 14 years
    @Anders it's not about cleverness, it's the same as the difference between doing foo=returnTrue(); if (foo == true) and if (returnTrue())
  • Anders
    Anders almost 14 years
    @Daenyth, Yeah I know, should have used another expression. Readability perhaps would be better. For someone new I feel it would be better to use explicitness and also to display $?
  • Daenyth
    Daenyth almost 14 years
    @Anders: In that case you should comment what $? means, because for someone new to bash, it's completely meaningless :P
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 14 years
    @Daenyth: true, all the solutions that use A-Za-z assume an ASCII locale. So let me add a footnote: if you want to allow all the letters in your locale (including letters with diacritics), replace A-Za-z by [:alpha:] everywhere (case, expr, grep, ...) (yes, you'll have brackets within brackets). If you only want ASCII letters, put export LC_COLLATE=C near the beginning of your script.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 14 years
    @Stephen: The downside with using echo is that some shells treat \ as special, and some shells have special behavior if the first argument to echo begins with a -. A portable way of avoiding these pitfalls is to use printf "%s\n" "$word" (or printf %s "$word" if you don't want a newline at the end).
  • Daenyth
    Daenyth almost 14 years
    Simpler solution is just to invert it -- test that it contains [^0-9]. A whitelist is easier than a blacklist.
  • Stephen P
    Stephen P almost 14 years
    @Gilles: I recalled that after 'tom' posted his "(quite) portable" answer and was looking into the actual portability of printf. It didn't exist as man 1 printf when I started shell programming, only as man 3 printf (the C function), and I tend to forget about it. Q: Would printf "${word}" work as well, or is the format specifier needed to prevent the interpretation of the '\' as an escape?
  • Daenyth
    Daenyth almost 14 years
    printf is a bash builtin, so it's not portable.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 14 years
    @Daenyth: It's not a whitelist/blacklist matter: [^0-9] would accept non-alphanumerics, whereas I think locky28 means to reject them. A better objection would have been that I used expr without protecting its arguments (fixed in my latest edit).
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 14 years
    @Stephen: printf is not portable to unix systems from before the mid-1990s or so, but it's in POSIX. And printf "$word" would not work, any more than printf(mystring); would work in C: the first argument is a format, so % is special in it; in the shell an initial -, and a `\` anywhere, are also special.
  • Daenyth
    Daenyth almost 14 years
    I think you misunderstand. I'm saying that you should test that it doesn't match. If any string foo matches [^0-9], it contains invalid characters.
  • Stephen P
    Stephen P almost 14 years
    @Gilles: C printf("some plain string"); with no format specifiers works just fine, as does printf(varThatContainsFormat, var1, var2);, and I tried the shell printf "$somevar" and that's fine too; the problem shows up in the difference between printf "hello\n" and printf "%s" "hello\n" -- the 1st prints hello followed by a newline, the 2nd prints hello\n showing the backslash and not printing a newline. Your advice is definitely sound, thanks.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 14 years
    @Stephen: The point is when your string comes in a variable and you can't know that it doesn't contain escape characters, printf "$word" (or its C equivalent) doesn't do what you want unless you did intend "$word" to be a printf format (with 0 arguments) rather than a string. It's a common mistake from C beginners.
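
The difference discussed in the comments above can be reproduced with a value that contains an escape sequence. This is only an illustration; the sample value is made up.

```shell
# A sample value containing a backslash escape (made up for illustration).
word='tab\there'

# As data: %s prints the value exactly as stored, backslash included.
printf '%s\n' "$word"

# As a format: printf interprets \t itself, so a real tab is printed.
printf "$word\n"
```

The first form is the safe one whenever the string comes from user input, since a stray % or backslash in the value is then never interpreted.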