IF statement to return true if a word contains a specific letter
Solution 1
What about using a switch or case like so:
#!/bin/sh
v="information"
case $v in
*f*)
echo "found \"f\" in ${v}";;
*)
echo "no match found in ${v}"
esac
exit
Note that if the needle is stored in a variable, it's important to quote it so it's not taken as a pattern:
case $haystack in
*"$needle"*) echo match
esac
Without the quotes, if $needle were * or ?, for instance, it would match any haystack (or any non-empty haystack, respectively).
In any case, $needle doesn't have to be a single character. It will work with any string.
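To illustrate the quoting point, here is a minimal sketch in POSIX sh. The function names contains and contains_unquoted are made up for this example and are not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $2 (needle) occurs literally in $1 (haystack).
contains() {
    case $1 in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Same test with the needle left unquoted, so it is treated as a pattern.
contains_unquoted() {
    case $1 in
        *$2*) return 0 ;;
        *)    return 1 ;;
    esac
}

contains "information" "f" && echo "quoted: found f"
contains "information" "*" || echo "quoted: no literal * in the string"
contains_unquoted "information" "*" && echo "unquoted: * matched as a wildcard"
```

With the quoted form, a needle of * only matches a haystack that really contains an asterisk, such as "a*b".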
In many shells it would even work for any sequence of non-null bytes, even if they don't form valid characters, but not all shells would break characters apart. For instance, a 0xc3 byte may not be found that way in the string é encoded in UTF-8 (0xc3 0xa9) in some implementations. Conversely, some shells may find i inside ξ when the locale's encoding is BIG5-HKSCS, where ξ is encoded as 0xa3 0x69 (and i is 0x69, as in ASCII).
Solution 2
Bash's [[ ... ]] test knows about pattern matches and regexes:
When the == and != operators are used, the string to the right of the operator is used as a pattern and pattern matching is performed. When the =~ operator is used, the string to the right of the operator is matched as a regular expression.
So:
s=information
if [[ $s = *i* ]] ; then echo has i ; fi
Quoted strings are taken literally:
if [[ $s = "*i*" ]] ; then echo is i between two asterisks ; fi
And it knows about regexes:
if [[ $s =~ ^.*i.*$ ]] ; then echo has i ; fi
Though as usual, a regex also matches when it doesn't cover the whole string:
if [[ $s =~ i ]] ; then echo has i ; fi
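When the needle comes from a variable, quoting it on the right of =~ makes bash (3.2 and later, with default settings) match it literally rather than as a regex. A small sketch, with the variable name needle chosen only for illustration:

```shell
#!/bin/bash
s="information"
needle="."          # a regex metacharacter, used here as a literal needle

# Unquoted: "." is a regex that matches any character, so this succeeds.
if [[ $s =~ $needle ]]; then
    echo "unquoted: matched as a regex"
fi

# Quoted: "." is taken literally, and "information" contains no dot.
if [[ $s =~ "$needle" ]]; then
    echo "quoted: literal dot found"
else
    echo "quoted: no literal dot"
fi
```

This mirrors the behaviour of quoted strings in the pattern-matching case shown earlier.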
Solution 3
In [[ ... ]] conditions, the right-hand side of a comparison works as a pattern.
if [[ $var == *i* ]] ; then echo "has i" ; fi
Solution 4
An old (and quite portable) way to do it is using a case statement:
var="information"
case $var in
*i*) echo "An 'i' was found in $var";;
* ) echo "There is no 'i' in $var";;
esac
As a one line function:
a="information" b="i"
one(){ case $a in (*"${b}"*) true;; (*) false;; esac; }
And used with:
if one; then
echo "Found $b in $a"
else
echo "The character '$b' was not found in the string '$a'"
fi
Other valid ways to perform the same test are:
two(){ [[ $a == *"$b"* ]] ; } # Using a pattern match.
t33(){ [[ $a =~ "$b" ]] ; } # Extended Regex (ERE) match.
f44(){ [[ $a =~ ^.*"$b".*$ ]] ; } # Using an ERE with anchors.
f55(){ [[ ${a//[!"${b}"]} ]] ; } # Removing all non-matching chars.
six(){ [ ! "$a" = "${a%"$b"*}" ] ; } # Using char removal.
s77(){ [[ $a =~ ^.*$ ]] ; } # Testing if string is valid.
All functions work with valid strings.
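A quick way to check that the containment tests agree is to run them in a loop. This is only a sketch for bash: the loop and its messages are additions for illustration, and one is written here with the needle quoted. Note that f55 builds a bracket expression from $b, so it is only meaningful for a single-character needle.

```shell
#!/bin/bash
a="information" b="i"

one(){ case $a in (*"${b}"*) true;; (*) false;; esac; }
two(){ [[ $a == *"$b"* ]] ; }
t33(){ [[ $a =~ "$b" ]] ; }
f44(){ [[ $a =~ ^.*"$b".*$ ]] ; }
f55(){ [[ ${a//[!"${b}"]} ]] ; }
six(){ [ ! "$a" = "${a%"$b"*}" ] ; }

# Run every containment test against the current $a and $b and print its verdict.
for f in one two t33 f44 f55 six; do
    if "$f"; then
        echo "$f: found '$b' in '$a'"
    else
        echo "$f: '$b' not found in '$a'"
    fi
done
```

Because the functions read the global variables a and b, changing b (for example to z) and rerunning the loop exercises the negative case.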
The timing of each function for strings of 10, 100, 1000, …, 1000000 (1 million) characters is below:
       Number of characters in the string
       10       100      1000     10000      100000    1000000
one    0.024m   0.036m   0.047m   0.207m     2.117m    25.363m
two    0.028m   0.030m   0.043m   0.179m     2.081m    25.337m
t33    0.044m   0.041m   0.053m   0.151m     1.757m    22.695m
f44    0.064m   0.075m   0.241m   1.864m     19.489m   198.488m
f55    0.055m   0.182m   5.275m   421.886m
six    0.043m   0.057m   0.297m   13.987m
s77    0.056m   0.061m   0.154m   1.201m     12.749m   134.774m
Strings of the required length are built by repeating a character.
The string to be tested is built with something similar to:
a="$1$(repeat "$2" 10**$k)$3"
The script is called as:
$ ./script start a ending
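The repeat helper used above is not shown in the answer. One possible definition for bash is sketched below; the implementation is an assumption, not the author's code. It evaluates its second argument arithmetically, so an expression such as 10**3 is accepted.

```shell
#!/bin/bash
# Hypothetical helper: print string $1 repeated $2 times.
# $2 may be an arithmetic expression such as 10**3.
repeat() {
    local n=$(( $2 )) pad out
    printf -v pad '%*s' "$n" ''     # a run of n spaces
    out=${pad// /"$1"}              # replace every space with $1
    printf '%s' "$out"
}

k=3
a="start$(repeat a 10**$k)ending"
echo "${#a}"    # 5 + 1000 + 6 = 1011
```

Padding with printf and then substituting avoids an explicit loop, which matters when building the longer test strings.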
The function f55 becomes very slow once the string is longer than about 1000 characters. The same happens to function six for strings longer than about 10000 (10k) characters.
Function two is the fastest for short strings, and t33 (regex) is the best for longer strings.
Functions t33 to s77 change running times if run as:
$ LANG=C ./script
All become faster.
It is interesting to note that functions f44 and s77 will report a failure (output false) if the string tested is an invalid UTF-8 string, like:
$'\x80abcde'
Exactly as grep (the base command for regex) does (in a utf-8 locale):
$ echo $'\x80abcde' | grep '^.*$' # no output
$ (LANG=C; echo $'\x80abcde' | grep '^.*$')
�abcde
antiks
Updated on September 18, 2022
Comments
-
antiks over 1 year
I need an if statement to return true if a word contains a specific letter. For example:
var="information"
if [ $var contain "i" ]; then
....
else
...
fi
-
Stéphane Chazelas almost 7 years
Note the difference between
a=$'\x80iii' bash -c '[[ $a =~ i ]]' && echo yes
and
a=$'\x80iii' bash -c '[[ $a =~ ^.*i.*$ ]]' && echo yes
in a UTF-8 locale though.
-
ilkkachu almost 7 years
But [[ $a = *i* ]] matches even the invalid byte sequence, interesting. (And it's not even a bytewise match, since multi-byte characters work.)
-
Stéphane Chazelas almost 7 years
Yes, * in most shells matches any sequence of bytes (and won't cut characters in the middle in those that support multibyte characters), and ? would match a byte that is not part of a valid character. That's against what POSIX currently specifies but makes for generally safer code (future versions of POSIX should be amended). There are many fnmatch() implementations that don't give that guarantee though (which is why GNU find's -name '*' won't match file names that contain bytes that are not part of valid characters).
-
Stéphane Chazelas almost 7 years
While the wildcard pattern matching is done internally by bash, the regexp matching is done via the system's regexp API. So the performance (in addition to supported RE syntax) will vary with the OS and OS version.