IF statement to return true if a word contains a specific letter

Solution 1

What about using a switch or case like so:

#!/bin/sh

v="information"
case $v in
    *f*)
        echo "found \"f\" in ${v}";;
    *)
        echo "no match found in ${v}"
esac
exit

Note that if the needle is stored in a variable, it's important to quote it so it's not taken as a pattern:

case $haystack in
  *"$needle"*) echo match
esac

Without the quotes, a $needle of * would match any haystack, and a $needle of ? would match any non-empty haystack.

In any case, $needle doesn't have to be a single character. It will work with any string.
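
The quoting rule above can be checked with a small sketch. The function names contains and contains_unquoted and their argument order are illustrative assumptions, not part of the original answer:

```sh
#!/bin/sh
# contains HAYSTACK NEEDLE: succeed if NEEDLE occurs literally in HAYSTACK.
contains() {
    case $1 in
        *"$2"*) return 0 ;;   # quoted: NEEDLE is matched as a literal string
        *)      return 1 ;;
    esac
}

# Same test with the needle left unquoted, so it is treated as a pattern.
contains_unquoted() {
    case $1 in
        *$2*) return 0 ;;
        *)    return 1 ;;
    esac
}

contains "information" "f"   && echo "quoted: found f"
contains "information" "form" && echo "quoted: found the multi-character needle form"
contains "information" "*"   || echo "quoted: no literal * in the string"
contains_unquoted "information" "*" && echo "unquoted: * matched anyway"
```

Passing the haystack and needle as arguments, rather than reading global variables, keeps the function reusable in an if condition.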

In many shells it even works for any sequence of non-null bytes, including ones that don't form valid characters, but not all shells will split a multibyte character to match part of it. For instance, a 0xc3 byte may not be found that way in the é string encoded in UTF-8 (0xc3 0xa9) in some implementations. Conversely, some shells may find i inside ξ when the locale's encoding is BIG5-HKSCS, where ξ is encoded as 0xa3 0x69 (and i is 0x69, as in ASCII).

Solution 2

Bash's [[ ... ]] test knows about pattern matches and regexes:

When the == and != operators are used, the string to the right of the operator is used as a pattern and pattern matching is performed. When the =~ operator is used, the string to the right of the operator is matched as a regular expression.

So:

s=information
if [[ $s = *i* ]] ; then echo has i ; fi

Quoted strings are taken literally:

if [[ $s = "*i*" ]] ; then echo is i between two asterisks ; fi

And it knows about regexes:

if [[ $s =~ ^.*i.*$ ]] ; then echo has i ; fi

Though, as usual with regexes, a match doesn't have to cover the whole string, so the anchors are unnecessary:

if [[ $s =~ i ]] ; then echo has i ; fi
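
When the letter to look for is held in a variable, the rule that quoted strings are taken literally appears to carry over to =~ as well. A minimal sketch, assuming bash 3.2 or later (the variable name needle is illustrative):

```bash
#!/usr/bin/env bash
s="information"
needle="."            # a regex metacharacter, chosen to show the difference

# Unquoted: $needle is interpreted as a regex, and "." matches any character.
if [[ $s =~ $needle ]]; then echo "unquoted: match"; fi

# Quoted: $needle is matched literally, and the string contains no dot.
if [[ $s =~ "$needle" ]]; then
    echo "quoted: match"
else
    echo "quoted: no match"
fi
```

Quoting the variable is the safer default when the needle comes from user input and is meant as a plain letter rather than a regex.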

Solution 3

In the [[ ... ]] conditions, the right hand side of a comparison works as a pattern.

if [[ $var == *i* ]] ; then echo "has i" ; fi

Solution 4

An old (and quite portable) way to do it is using a case statement:

var="information"
case $var in
     *i*) echo "An 'i' was found in $var";;
     * )  echo "There is no 'i' in $var";;
esac

As a one line function:

a="information"   b="i"

one(){ case $a in (*"${b}"*) true;; (*) false;; esac; }

And used with:

if one; then 
    echo "Found $b in $a"
else
    echo "The character '$b' was not found in the string '$a'"
fi

Other valid ways to perform the same test are:

two(){ [[  $a ==   *"$b"*      ]] ; }   # Using a pattern match.
t33(){ [[  $a =~    "$b"       ]] ; }   # Extended Regex (ERE) match.
f44(){ [[  $a =~ ^.*"$b".*$    ]] ; }   # Using an ERE with limits.
f55(){ [[  ${a//[!"${b}"]}     ]] ; }   # Removing all non-matching chars.
six(){ [ ! "$a" = "${a%"$b"*}"  ] ; }   # Using char removal. 
s77(){ [[  $a =~ ^.*$          ]] ; }   # Testing if string is valid.
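
The suffix removal used by six may be the least obvious of these, so here is a small sketch of what the expansion yields. The variable stripped is introduced only for illustration:

```sh
#!/bin/sh
a="information"
b="f"

# ${a%"$b"*} strips the shortest suffix that starts with $b.
# If $b does not occur in $a, nothing is stripped and the result equals $a.
stripped=${a%"$b"*}
echo "$stripped"        # prints "in"

# The test in six is equivalent to checking that something was stripped.
if [ "$a" != "$stripped" ]; then
    echo "'$b' found in '$a'"
fi
```

Because this uses only POSIX parameter expansion and [ ... ], it should also work in shells that lack [[ ... ]].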

All functions work with valid strings.
The timing of each function for strings of 10, 100, 1000, …, 1000000 (one million) characters is below:

        Number of characters in the string.
        10     100    1000    10000   100000  1000000
one  0.024m  0.036m  0.047m   0.207m   2.117m  25.363m
two  0.028m  0.030m  0.043m   0.179m   2.081m  25.337m
t33  0.044m  0.041m  0.053m   0.151m   1.757m  22.695m
f44  0.064m  0.075m  0.241m   1.864m  19.489m 198.488m
f55  0.055m  0.182m  5.275m 421.886m
six  0.043m  0.057m  0.297m  13.987m
s77  0.056m  0.061m  0.154m   1.201m  12.749m 134.774m

The string is brought to the required length by repeating a character.
The string to be tested is built with something similar to:

a="$1$(repeat "$2" 10**$k)$3"
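
The repeat helper itself is not shown in the answer. One possible sketch, assuming bash, a single-byte character to repeat, and a numeric count (the helper's interface here is hypothetical, and the script arguments are replaced by literals):

```bash
#!/usr/bin/env bash
# repeat CHAR COUNT: print CHAR COUNT times.
# Hypothetical stand-in for the answer's own repeat, which is not shown.
repeat() {
    # printf pads an empty string to COUNT spaces; tr turns them into CHAR.
    printf '%*s' "$2" '' | tr ' ' "$1"
}

k=3
a="start$(repeat a "$((10**k))")ending"
echo "${#a}"    # 5 + 1000 + 6 = 1011
```

Each function could then be timed against $a for increasing values of k.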

The script is called as:

$ ./script start a ending 

The function f55 becomes very slow if the size of the string processed gets longer than (around) 1000 characters. The same happens to function six for strings longer than (around) 10000 (10k) characters.

Function two is among the fastest for short strings, and t33 (regex) is the fastest for longer strings.

Functions t33 to s77 change running times if run as:

$ LANG=C ./script

All become faster.

It is interesting to note that functions f44 and s77 will fail (return false) if the string tested is not a valid UTF-8 string, like:

$'\x80abcde'

grep (the base command for regex) behaves exactly the same way in a UTF-8 locale:

$ echo $'\x80abcde' | grep '^.*$'       # no output

$ (LANG=C; echo $'\x80abcde' | grep '^.*$') 
�abcde
Author: antiks

Updated on September 18, 2022

Comments

  • antiks
    antiks over 1 year

    I need an if statement to return true if a word contains a specific letter. For example:

    var="information"
    if [ $var contain "i" ]; then
    ....
    else
    ...
    fi
    
  • Stéphane Chazelas
    Stéphane Chazelas almost 7 years
    Note the difference between a=$'\x80iii' bash -c '[[ $a =~ i ]]' && echo yes and a=$'\x80iii' bash -c '[[ $a =~ ^.*i.*$ ]]' && echo yes in a UTF-8 locale though.
  • ilkkachu
    ilkkachu almost 7 years
    But [[ $a = *i* ]] matches even the invalid byte sequence, interesting. (and it's not even a bytewise match, since multi-byte characters work)
  • Stéphane Chazelas
    Stéphane Chazelas almost 7 years
    Yes, * in most shells matches any sequence of bytes (and won't cut characters in the middle in shells that support multibyte characters), and ? would match a byte that is not part of a valid character. That goes against what POSIX currently specifies but makes for generally safer code (future versions of POSIX should be amended). Many fnmatch() implementations don't give that guarantee, though, which is why GNU find's -name '*' won't match file names that contain bytes that are not part of valid characters.
  • Stéphane Chazelas
    Stéphane Chazelas almost 7 years
    While the wildcard pattern matching is done internally by bash, the regexp matching is done via the system's regexp API. So the performance (in addition to supported RE syntax) will vary with the OS and OS version.