In Linux Bourne shell: how to count the occurrences of a specific word in a file

13,727

Solution 1

awk '{ for(i=1; i<=NF; i++) if($i=="hello") c++ } END{ print c+0 }' file.txt

If you need the count for each line instead:

awk '{ c=0; for(i=1; i<=NF; i++) if($i=="hello") c++; print c }' file.txt
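
To check both variants against the sample test.txt from the question, the two awk commands can be folded into a single pass. This is a minimal sketch, not part of the original answer; the function name count_word_lines and the "total:" label are made up for illustration.

```shell
# count_word_lines WORD FILE (hypothetical helper)
# Prints how many whitespace-delimited fields on each line equal WORD,
# followed by one final "total: N" line.
count_word_lines() {
  awk -v w="$1" '
    {
      c = 0                        # reset the counter for every line
      for (i = 1; i <= NF; i++)    # fields are numbered from 1; $0 is the whole line
        if ($i == w) c++
      total += c
      print c
    }
    END { print "total: " (total + 0) }   # +0 prints 0 instead of an empty string
  ' "$2"
}
```

For the sample file in the question, count_word_lines hello test.txt should print 3, 1, 1, 1, 1, 0, 0 on separate lines and then total: 7.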

Solution 2

grep -o '\<hello\>' filename | wc -l

The \< and \> bits are word boundary patterns, so the expression won't find foohello or hellobar.
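
One caveat: \< and \> treat any non-word character as a boundary, not only whitespace, so hello, and hello-world still count as matches. The sketch below contrasts the two definitions of "word". It assumes a grep that supports -o and \< \> (GNU grep does; neither is specified by POSIX), and both function names are hypothetical.

```shell
# Both helpers read text on standard input; the names are made up for this sketch.

# Counts matches of WORD at regex word boundaries.
# WORD is interpreted as a regular expression here.
count_boundary() {
  grep -o "\\<$1\\>" | wc -l
}

# Counts fields that are exactly WORD, i.e. delimited by whitespace only.
count_whitespace() {
  awk -v w="$1" '{ for (i = 1; i <= NF; i++) if ($i == w) c++ } END { print c + 0 }'
}

printf 'hello, foohello hello-world hello\n' | count_boundary hello     # 3
printf 'hello, foohello hello-world hello\n' | count_whitespace hello   # 1
```

If "word" means a whitespace-delimited string, as the question specifies, the second helper is the stricter match.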

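With an awk that supports the same \< and \> operators (GNU awk does; they are not part of POSIX regular expressions), you can also use awk -F '\\<hello\\>' ... to achieve the same effect.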

Solution 3

Solution:

sed 's/\s\+/\n/g' test.txt | grep -w hello | wc -l

Explanation:

sed 's/\s\+/\n/g' test.txt

This replaces every span of whitespace with a newline, effectively reformatting the file test.txt so it has one word per line. The command sed 's/FIND/REPLACE/g' replaces the FIND pattern with REPLACE everywhere it appears. The pattern \s\+ means "one or more whitespace characters", and \n is a newline.

grep -w hello

This extracts only those lines that contain hello as a complete word.

wc -l

This counts the number of lines.
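
The same split-then-count idea can be written without GNU sed, whose \s, \+ and \n in the replacement are extensions. The sketch below swaps in tr for the splitting step and grep -x (whole-line match) for grep -w, so only strings delimited by whitespace are counted; hello, and hello-world are then left out. The function name count_word is hypothetical.

```shell
# count_word WORD FILE (hypothetical helper)
# Prints the number of whitespace-delimited strings in FILE equal to WORD.
count_word() {
  # 1. Turn every run of whitespace into a single newline: one word per line.
  # 2. Count lines that are exactly WORD
  #    (-x whole line, -F fixed string, -c print the count).
  # Note: grep exits with status 1 when the count is 0, but still prints 0.
  tr -s '[:space:]' '[\n*]' < "$2" | grep -c -x -F -e "$1"
}
```

On the sample file from the question, count_word hello test.txt prints 7.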


If you want to count the number of occurrences per line, you can use the same technique, but process one line at a time:

while IFS= read -r line; do
  printf '%s\n' "$line" | sed 's/\s\+/\n/g' | grep -w hello | wc -l
done < test.txt
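
The loop above starts several processes for every input line. As an alternative sketch, not taken from the original answer, the shell's own word splitting can do the counting. This assumes a POSIX sh with the default IFS, and the function name count_per_line is hypothetical.

```shell
# count_per_line WORD FILE (hypothetical helper)
# Prints, for each line of FILE, how many whitespace-delimited strings equal WORD.
count_per_line() {
  word=$1
  set -f                      # disable globbing so a field like * is not expanded
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=0
    for w in $line; do        # unquoted on purpose: split the line on whitespace
      if [ "$w" = "$word" ]; then
        n=$((n + 1))
      fi
    done
    printf '%s\n' "$n"
  done < "$2"
  set +f                      # re-enable globbing (assumes it was enabled before)
}
```

Because sh has no local variables, word, line, n and w remain set in the calling shell after the function returns.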
Author by

user1304473

Updated on June 04, 2022

Comments

  • user1304473
    user1304473 almost 2 years

    By word, I mean any whitespace-delimited string.

    Suppose the file test.txt has the following words delimited by spaces:

    hello hello hello hell osd
    hello
    hello 
    hello
    hellojames beroo helloooohellool axnber hello
    way
    how 
    

    I want to count the number of times the word hello appears in each line.

    I used the command awk -F "hello" '{print NF-1}' test.txt to show the number of occurrences of the word hello in each line:

    3
    1
    1
    1
    4
    0
    0
    

    So it finds a total of 3+1+1+1+4 = 10 occurrences.

    The problem is on the fifth line: hello only occurs 1 time as a separate word; words such as hellojames and helloooohellool should not be counted because hello is not delimited by whitespace.

    So I want it to find 7 occurrences of hello as a separate word.

    Can you help me write a command that returns the correct total of 7 times?