Count the number of occurrences of a substring in a string

Solution 1

With perl:

printf '%s' "$SUB_STRING" |
  perl -l -0777 -ne '
    BEGIN{$sub = <STDIN>}
    @matches = m/\Q$sub\E/g;
    print scalar @matches' <(printf '%s' "$STRING")

With bash alone, you could always do something like:

s=${STRING//"$SUB_STRING"}
echo "$(((${#STRING} - ${#s}) / ${#SUB_STRING}))"

That is, $s contains $STRING with all occurrences of $SUB_STRING removed. The number of occurrences removed is the difference in length between $STRING and $s, divided by the length of $SUB_STRING. Only non-overlapping occurrences are counted, and $SUB_STRING must not be empty (the division would be by zero).
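
As a quick check, the arithmetic above can be wrapped in a small function. This is only a sketch: the name count_substr is made up for illustration, and the guard against an empty substring is an addition that the original two-liner does not have.

```shell
#!/bin/bash
# count_substr is a hypothetical helper name, not part of the original answer.
count_substr() {
  local string=$1 sub=$2 stripped
  # Assumption: an empty substring is reported as 0 instead of dividing by zero.
  [ -n "$sub" ] || { echo 0; return; }
  # Remove every (non-overlapping) occurrence of the substring, taken literally.
  stripped=${string//"$sub"}
  # Characters removed, divided by the substring length, gives the count.
  echo "$(( (${#string} - ${#stripped}) / ${#sub} ))"
}

count_substr "abcdabcva" "ab"   # prints 2
```

Because the substring is quoted inside the expansion, characters such as * or ? in it are matched literally rather than as glob patterns.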

POSIXly, you could do something like:

s=$STRING count=0
until
  t=${s#*"$SUB_STRING"}
  [ "$t" = "$s" ]
do
  count=$((count + 1))
  s=$t
done
echo "$count"
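
For illustration, the same loop can be run against the data from the question. The wrapper name count_occurrences is hypothetical, and the sample values are copied from the question. Everything else is the loop above.

```shell
#!/bin/sh
# count_occurrences is a hypothetical wrapper name used only for this sketch.
# POSIX sh has no local variables, so s, t and count are globals here.
count_occurrences() {
  s=$1 count=0
  until
    t=${s#*"$2"}        # drop the shortest prefix that ends with the substring
    [ "$t" = "$s" ]     # nothing dropped: no occurrence is left
  do
    count=$((count + 1))
    s=$t
  done
  echo "$count"
}

STRING='0: asus-wlan: Wireless LAN
         Soft blocked: no
         Hard blocked: no
1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
2: phy0: Wireless LAN
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no'

SUB_STRING='Bluetooth
         Soft blocked: no
         Hard blocked: no'

count_occurrences "$STRING" "$SUB_STRING"   # prints 2
```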

Solution 2

Using Perl's string functions, we could do it as follows:

 printf '%s\n' "$STRING" |
 perl -nse '
      $_ .= join "", <>;
      $k++ while ++($p = index($_, $s, $p));
      print 0 + $k, "\n";   # 0 + so that no match prints 0, not an empty line
 ' -- -s="$SUB_STRING"

Explanation:

° Load the whole string into $_.

° The index function returns the position of a substring in a string, or -1 if it is not found.

° Search for the substring repeatedly, using one past the position just found as the starting position for the next search.

° Increment the counter $k for every occurrence found.
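
One consequence of restarting the search one character after each match is that overlapping occurrences are counted too. The brute-force bash sketch below is not the index() walk itself, but it should give the same overlapping count. The function name count_overlapping is made up for illustration, and treating an empty substring as 0 is an assumption.

```shell
#!/bin/bash
# count_overlapping is a hypothetical name used only for this sketch.
count_overlapping() {
  local string=$1 sub=$2 count=0 i
  # Assumption: an empty substring is reported as 0 occurrences.
  [ -n "$sub" ] || { echo 0; return; }
  # Try every start position and compare the slice of the same length.
  for (( i = 0; i + ${#sub} <= ${#string}; i++ )); do
    if [[ ${string:i:${#sub}} == "$sub" ]]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

count_overlapping "aaaa" "aa"   # prints 3 (matches at offsets 0, 1 and 2)
```

The removal-based approaches in Solution 1 would report 2 for the same input, since they only see non-overlapping occurrences.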

Some other methods are listed below:

Slurp the string and use regex.

printf '%s\n' "$STRING" |
perl -slp -0777e '
        $_ = () = /\Q$s\E/g;
 ' -- -s="$SUB_STRING"

° Slurp the string into the $_ variable.

° Pass the substring to perl from the command line using the -s option.

° Match against $_ in list context to get all the matches, then take that list in scalar context to get the number of matches.

° The -p option automatically prints what is in $_.

A method using sed:

 esc_s=$(printf '%s\n' "$SUB_STRING" |\
 sed -e 's:[][\/.^$*]:\\&:g' -e 'H;1h;$!d;g;s/\n/\\n/g')

 printf '%s\n' "$STRING" |
 sed -ne '
         $!{N;s/^/\n/;D;}
         /'"$esc_s"'/{
               x;p;x
               s///;s/^/\n/;D
         }
 ' | wc -l

° As a preparatory step, escape every character in the substring that acts as a metacharacter on the left-hand side of an s/// command. Otherwise sed may misinterpret the substring or stop with an error.

° Slurp the whole string into the pattern space.

° Then repeatedly print an empty line (the hold space is a good candidate) and remove the substring from the pattern space.

° Rinse, lather, repeat for as long as the substring is present.

° The empty lines are piped to wc, which counts them: the number of lines equals the number of times the substring was found.

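This is a shell version using expr (the substring is treated as a basic regular expression):

 e=$STRING N=0
 # Loop for as long as the substring still matches somewhere in $e.
 # Testing the match itself, not whether the remainder is empty,
 # also counts an occurrence at the very start of the string.
 while expr " $e" : " .*$SUB_STRING" > /dev/null
 do
     # Keep only the text before the last occurrence, then count it.
     e=$(expr " $e" : " \(.*\)$SUB_STRING")
     N=$(expr "$N" + 1)
 done
 echo "$N"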

Solution 3

If the substring contains no line breaks:

echo -n STRING | grep -Fo SUBSTRING | wc -l
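
With shell variables, the same pipeline might look like the sketch below. The function name count_with_grep and the sample values are made up for illustration. Two things are additions to the original one-liner: printf replaces echo -n, whose behaviour varies between shells, and -- guards against a substring that begins with a dash. The -o option is not in POSIX, though GNU and BSD grep both provide it.

```shell
#!/bin/sh
# count_with_grep is a hypothetical name used only for this sketch.
count_with_grep() {
  # -F: match a fixed string, -o: print each match on its own line;
  # wc -l then counts those lines.
  printf '%s\n' "$1" | grep -Fo -- "$2" | wc -l
}

count_with_grep 'abcdabcva' 'ab'
```

As stated above, this only works when the substring contains no line breaks, because grep matches line by line.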

Solution 4

You can use Python like in this question

python -c 'print("abcdabcva".count("ab"))'

Or if you are working with shell variables:

python -c 'print("""'"$STRING"'""".count("""'"$SUB_STRING"'"""))'
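
Splicing the variables into the Python source can go wrong if either string contains three double quotes in a row or a backslash sequence such as \n, because Python then parses them as part of the program. A possible alternative passes both values as arguments so they are never parsed as code. This sketch assumes python3 is on the PATH, and the function name count_with_python is made up for illustration.

```shell
#!/bin/sh
# count_with_python is a hypothetical name used only for this sketch.
count_with_python() {
  # sys.argv[1] is the string and sys.argv[2] the substring; nothing is
  # interpolated into the Python program text itself.
  python3 -c 'import sys; print(sys.argv[1].count(sys.argv[2]))' "$1" "$2"
}

count_with_python 'abcdabcva' 'ab'   # prints 2
```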

In your case:

python -c 'print("""0: asus-wlan: Wireless LAN
                   Soft blocked: no
                   Hard blocked: no
          1: asus-bluetooth: Bluetooth
                   Soft blocked: no
                   Hard blocked: no
          2: phy0: Wireless LAN
                   Soft blocked: no
                   Hard blocked: no
          113: hci0: Bluetooth
                   Soft blocked: no
                   Hard blocked: no""".count("""Bluetooth
                   Soft blocked: no
                   Hard blocked: no"""))'

Solution 5

gawk '
END { print NR - 1 }
' RS='Bluetooth
         Soft blocked: no
         Hard blocked: no' input.txt

Explanation

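RS is the input record separator, a newline by default. Set it to the required string and awk splits the whole text into records, using that string as the separator. It then only remains to print the number of records minus 1 in the END section. A multi-character RS is not specified by POSIX; gawk treats it as a regular expression, so any regex metacharacters in the substring would need escaping.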

Using variables:

#!/bin/bash

STRING='0: asus-wlan: Wireless LAN
         Soft blocked: no
         Hard blocked: no
1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
2: phy0: Wireless LAN
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no'

SUB_STRING='Bluetooth
         Soft blocked: no
         Hard blocked: no'

gawk 'END { print NR - 1 }' RS="$SUB_STRING" <<< "$STRING"

Author: Eduardo Lucio

Free software enthusiast. I love working with hardware virtualization, mainly Xen (hypervisor), and with a free, general-purpose programming editor/IDE called Vim. I work in open source software development and as an IT manager. I have a daughter and live in Brazil (Brasília).

Updated on September 18, 2022

Comments

  • Eduardo Lucio
    Eduardo Lucio over 1 year

    How can I count the number of occurrences of a substring in a string using Bash?

    EXAMPLE:

    I'd like to know how many times this substring:

    Bluetooth
             Soft blocked: no
             Hard blocked: no
    

    ...occurs in this string...

    0: asus-wlan: Wireless LAN
             Soft blocked: no
             Hard blocked: no
    1: asus-bluetooth: Bluetooth
             Soft blocked: no
             Hard blocked: no
    2: phy0: Wireless LAN
             Soft blocked: no
             Hard blocked: no
    113: hci0: Bluetooth
             Soft blocked: no
             Hard blocked: no
    

    NOTE I: I have tried several approaches with sed, grep, awk... Nothing seems to work when we have strings with spaces and multiple lines.

    NOTE II: I'm a Linux user and I'm looking for a solution that does not involve installing applications/tools beyond those usually found in Linux distributions.


    IMPORTANT:

    I would like something like the hypothetical example below. In this case we use two Shell variables (Bash).

    EXAMPLE:

    STRING="0: asus-wlan: Wireless LAN
             Soft blocked: no
             Hard blocked: no
    1: asus-bluetooth: Bluetooth
             Soft blocked: no
             Hard blocked: no
    2: phy0: Wireless LAN
             Soft blocked: no
             Hard blocked: no
    113: hci0: Bluetooth
             Soft blocked: no
             Hard blocked: no"
    
    SUB_STRING="Bluetooth
             Soft blocked: no
             Hard blocked: no"
    
    awk -v RS='\0' 'NR==FNR{str=$0; next} {print gsub(str,"")}' "$STRING" "$SUB_STRING"
    

    NOTE: We are using awk just to illustrate!

  • Eduardo Lucio
    Eduardo Lucio about 6 years
    The pure bash approach works perfectly, besides being fully portable to various Linux distributions including Docker containers, so this was the best answer so far. =D
  • Eduardo Lucio
    Eduardo Lucio about 6 years
    It would be great if you could do it in a way that does not need to escape the line breaks. An approach with Python has great value! =D
  • James DreeaMz
    James DreeaMz about 6 years
    @EduardoLucio challenge accepted and answer edited :)
  • Eduardo Lucio
    Eduardo Lucio about 6 years
    The following example worked perfectly for me (Python 3): python -c 'print("""'"$STRING"'""".count("""'"$SUB_STRING"'"""))'. Note that in this example we pass the bash/shell variables "directly" to the python command. I think this template is compatible with Python 2 and 3. My suggestion is that you update your answer with this "improvement".
  • Eduardo Lucio
    Eduardo Lucio about 6 years
    I'd like to be able to use Shell variables (see in my question) with your approach if it is possible. Could you modify it to work that way? Thanks a lot! =D
  • Eduardo Lucio
    Eduardo Lucio about 6 years
    If your string is (example) " Soft blocked: no" or "Soft blocked: no" with "\s+Soft blocked: no" match result becomes indifferent to the spaces of the original string entered. We need the count to be made by the exact substring entered. Thanks!
  • JJoao
    JJoao about 6 years
    If you want the exact substring replace the \s+ flexible spaces by strict spaces ...\n Soft blocked: no...
  • Eduardo Lucio
    Eduardo Lucio about 6 years
    I did the test with the following command echo -n "$STRING" | grep -zPio 'Bluetooth\n Soft blocked: no\n Hard blocked: no' | grep -zc .. However, the match remains indifferent to the number of spaces. Thanks! =D (NOTE: The stackexchange editor is removing the spaces I've placed!)
  • MiniMax
    MiniMax about 6 years
    @EduardoLucio See update. I added version with variables usage.
  • presto8
    presto8 about 3 years
    Great solution. Just one small tweak, use grep -Fo: echo $STRING | grep -Fo SUBSTRING | wc -l
  • Sapphire_Brick
    Sapphire_Brick about 3 years
    @presto8 I previously removed the o flag, erroneously thinking it was unnecessary. Thanks for catching that.