Case-insensitive substring search in a shell script

Solution 1

First, here's a simple example script that doesn't ignore case:

#!/bin/bash
if [ "$(echo hello)" == hello ]; then
    echo it works
fi

Try changing the string hello on the right, and it should no longer echo it works. Try replacing echo hello with a command of your choosing. If you want to ignore case, and neither string contains a line break, then you could use grep:

#!/bin/bash
if echo Hello | grep -iqF hello; then
    echo it works
fi

The key here is that you are piping a command output to grep. The if statement tests the exit status of the rightmost command in a pipeline - in this case grep. Grep exits with success if and only if it finds a match.

The -i option of grep says to ignore case.
The -q option says to not emit output and exit after the first match.
The -F option says to treat the argument as a string rather than a regular expression.

Note that the first example uses [ expression ], which allows direct comparisons and various useful operators. The second form simply runs a command and tests its exit status.
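To apply the grep test to the output of an arbitrary command, one option is to wrap it in a small function. This is only a sketch: the name output_contains_ci is hypothetical, and printf stands in for whatever command you actually want to check.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the output of a command contains
# the given string, ignoring case.
# Usage: output_contains_ci NEEDLE COMMAND [ARGS...]
output_contains_ci() {
    local needle=$1
    shift
    # -e keeps grep from treating a needle that starts with "-" as an option.
    "$@" | grep -iqF -e "$needle"
}

# printf is a stand-in for any command whose output you want to search.
if output_contains_ci "WORLD" printf '%s\n' "Hello World"; then
    echo it works
fi
```

Because the function's exit status is that of grep, it can be used directly in an if statement, just like the pipeline above.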

Solution 2

You can do case-insensitive substring matching natively in bash using the regex operator =~ if you set the nocasematch shell option. For example:

s1="hElLo WoRlD"
s2="LO"

shopt -s nocasematch

[[ $s1 =~ $s2 ]] && echo "match" || echo "no match"
match

s1="gOoDbYe WoRlD"
[[ $s1 =~ $s2 ]] && echo "match" || echo "no match"
no match
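Two things are worth keeping in mind with this approach: nocasematch stays enabled until you unset it, and an unquoted right-hand side of =~ is interpreted as a regular expression. The following sketch addresses both, assuming bash 3.2 or later; the function name contains_ci is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: case-insensitive literal substring test.
# The function body is a subshell ( ... ), so nocasematch is not
# left enabled in the calling shell.
contains_ci() (
    shopt -s nocasematch
    # Quoting "$2" makes =~ match it literally rather than as a regex.
    [[ $1 =~ "$2" ]]
)

contains_ci "hElLo WoRlD" "LO" && echo "match" || echo "no match"
# The "." in the needle is literal here, so it does not match the "x".
contains_ci "version 1x5" "1.5" && echo "match" || echo "no match"
```

The subshell costs a fork on each call. If that matters, an alternative is to run shopt -s nocasematch and shopt -u nocasematch around the test in the current shell.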

Solution 3

For a case-sensitive string search of the value of the variable needle in the value of the variable haystack:

case "$haystack" in
  *"$needle"*) echo "present";;
  *) echo "absent";;
esac

For a case-insensitive string search, convert both to the same case.

uc_needle=$(printf %s "$needle" | tr '[:lower:]' '[:upper:]' ; echo .); uc_needle=${uc_needle%.}
uc_haystack=$(printf %s "$haystack" | tr '[:lower:]' '[:upper:]' ; echo .); uc_haystack=${uc_haystack%.}
case "$uc_haystack" in
  *"$uc_needle"*) echo "present";;
  *) echo "absent";;
esac
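The same conversion can be packaged as a reusable function that works in plain sh. This is a minimal sketch: contains_ci is a hypothetical name, and because POSIX sh has no local variables, the two helper variables are visible to the caller.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $2 occurs in $1, ignoring case.
# Uses only POSIX sh features. The trailing "." protects trailing
# newlines from being stripped by the command substitution.
contains_ci() {
    uc_haystack=$(printf %s "$1" | tr '[:lower:]' '[:upper:]'; echo .)
    uc_needle=$(printf %s "$2" | tr '[:lower:]' '[:upper:]'; echo .)
    case "${uc_haystack%.}" in
        *"${uc_needle%.}"*) return 0 ;;   # quoted needle is matched literally
        *) return 1 ;;
    esac
}

if contains_ci "Hello World" "WORLD"; then
    echo "present"
else
    echo "absent"
fi
```

To test command output, pass it as the first argument, for example contains_ci "$(some_command)" "$needle", where some_command is a placeholder for your own command.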

Note that the tr in GNU coreutils doesn't support multibyte locales (e.g. UTF-8). To make this work with multibyte locales, use awk instead. If you're going to use awk, you can make it do the string comparison and not just the conversion.

if awk 'BEGIN {exit !index(toupper(ARGV[2]), toupper(ARGV[1]))}' "$needle" "$haystack"; then
  echo "present"
else
  echo "absent"
fi

The tr from BusyBox doesn't support the [:CLASS:] syntax; you can use tr a-z A-Z instead. BusyBox doesn't support non-ASCII locales.

In bash (but not sh), version 4.0+, there is a built-in syntax for case conversion, and a simpler syntax for string matching.

if [[ "${haystack^^}" = *"${needle^^}"* ]]; then
  echo "present"
else
  echo "absent"
fi
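Since the original question was about command output, here is a sketch of the same test applied to a captured result. It assumes bash 4.0 or later; the variable names and the printf command are placeholders.

```shell
#!/bin/bash
# Requires bash 4.0 or later for the ${var,,} case conversion.
needle="ERROR"   # hypothetical search string
# printf stands in for any command whose output you want to search.
output=$(printf 'line one\nFatal error: disk full\n')

# Lower-case both sides, then do a literal substring match.
if [[ "${output,,}" == *"${needle,,}"* ]]; then
    result="present"
else
    result="absent"
fi
echo "$result"
```

The whole output is compared as a single string, so this also works when the output or the search string contains line breaks.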

Author: JM S. Tubiera

Updated on September 18, 2022

Comments

  • JM S. Tubiera
    JM S. Tubiera almost 2 years

    How can I write a shell script that will do a case-insensitive substring match of command output?

    • Ramesh
      Ramesh about 10 years
      grep -i maybe?
    • JM S. Tubiera
      JM S. Tubiera about 10 years
      How will I put that inside my script? I'm sorry if this is a novice questions. I'm just starting to study Linux because I need it for my internship. Thanks!
    • JM S. Tubiera
      JM S. Tubiera about 10 years
      *question. Sorry for the grammatical error.
    • goldilocks
      goldilocks about 10 years
      What you're asking about is shell scripting -- "linux" is not a programming language, it's an operating system kernel. The shell most commonly used with linux is bash, which is a superset of the unix standard sh. You might start by looking at one of these: |1| |2| -- just to get a grip on what the actual context is.
    • JM S. Tubiera
      JM S. Tubiera about 10 years
      My bad. I was confused between Linux and bash. Thanks for the links!
    • BobDoolittle
      BobDoolittle about 10 years
      Is there some reason this question is still on hold?
    • BobDoolittle
      BobDoolittle about 10 years
      This question now seems quite clear and matches the guidelines in the help center. Can it please be opened for the benefit of others?
    • JM S. Tubiera
      JM S. Tubiera about 10 years
      I don't see what the fuss is about or why this question is not clear. What should I add to make it clear?
  • BobDoolittle
    BobDoolittle about 10 years
    I don't understand why Gilles felt it was necessary to change the code I contributed. He didn't break anything but it worked just fine. You don't need the double quotes in this example - they are important if the output contains spaces however. And == works just as well as = because sh is actually bash on Linux. The original Bourne Shell is long gone at this point in time. I don't think even Solaris ships it any more. While unnecessary in this example I agree that double quotes are probably a best practice, but so is '==' in my opinion, to keep assignment and comparison clearly separate.
  • JM S. Tubiera
    JM S. Tubiera about 10 years
    Wait, so one can edit a post? I did not know that.
  • BobDoolittle
    BobDoolittle about 10 years
    With sufficient reputation, yes. I would hope somebody with high reputation would think twice before making needless edits however, particularly to code in this forum. unix.stackexchange.com/help/privileges
  • BobDoolittle
    BobDoolittle about 10 years
    lol! points for obscure shell knowledge.
  • Admin
    Admin about 10 years
    @BobDoolittle It may be in certain cases it makes a difference but not with your setup - it's good to know.
  • Stéphane Chazelas
    Stéphane Chazelas about 10 years
    Note that == is non-standard and non-portable. Use = instead. grep -q should be preferred over redirecting to /dev/null (unless you don't want the command to be killed by a SIGPIPE) as then grep stops processing as soon as it finds a match (imagine the case of a command outputting gigabytes and hello being found on the first line). The -q option is standard. 20 years ago, it would not have been portable, but now it's universal.
  • Stéphane Chazelas
    Stéphane Chazelas about 10 years
    Note that in practice, it's not only about the Bourne shell. == is not POSIX. sh is not bash on all Linux based systems. == is not supported by ash (upon which the sh of many BSDs and Debian derivatives at least is based), or posh, and needs quoted in zsh. There's no point doubling the =. [ is a command for testing. There's no need to disambiguate between assignment and comparison here. That's different in (( a == b )) vs (( a = b)). Using == in a script that starts with #! /bin/sh is wrong. If you assume ksh or bash syntax, update the #! accordingly.
  • BobDoolittle
    BobDoolittle about 10 years
    Very helpful to know, thanks. Post updated to reflect bash. Since bash accepts == I believe it makes for more readable code so will leave as-is.
  • Will
    Will over 7 years
    I realize this is a couple years old, but all that printf | tr makes my head spin around. Where possible, keep your invocation of commands to a minimum ... given a variable v, you can accomplish the same thing using v=$(tr '[:lower:]' '[:upper:]' <<<$v). For those who have never seen it before, the <<< is essentially a "here variable", much as <<EOF introduces a here document. Don't printf or echo unless you absolutely have to do so.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 7 years
    @Will That only works in shells that have the <<< operator: ksh, bash, zsh, but not plain sh. And it's pretty close to piping from printf in terms of how it runs: there's the same number of calls to fork and execve (assuming that printf is built-in, which is the case on most common shells); a difference is that <<< creates a temporary file rather than using a pipe. <<< is convenient to type but not a performance improvement.
  • itsadok
    itsadok over 6 years
    This option also affects the simple match operator. [[ XYZ == xyz ]] && echo "match" => match