How to count occurrences of a word in all the files of a directory?
Solution 1
grep -roh aaa . | wc -w
Grep recursively through all files and directories under the current directory for aaa, printing only the matches (-o) rather than the entire line, and omitting file names (-h). Then use wc -w to count how many words are in the output.
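The same idea can be wrapped in a small function. This is only a sketch: the name count_matches is made up for illustration, and it assumes a grep that supports -r, -o and -h (GNU and BSD grep do). It counts lines with wc -l instead of words, because grep -o prints every match on its own line, so the total stays right even when the pattern itself contains a space.

```shell
#!/bin/sh
# count_matches DIR PATTERN   (hypothetical helper name)
# Prints the total number of matches of PATTERN in all files under DIR.
count_matches() {
    # -r: recurse, -o: print each match on its own line, -h: omit file names.
    # wc -l counts one line per match; tr strips the padding some wc versions add.
    grep -roh -e "$2" "$1" | wc -l | tr -d ' '
}
```

Usage would look like count_matches . aaa, or count_matches . 'one two' for a pattern containing a space.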
Solution 2
Another solution, based on find and grep.
find . -type f -exec grep -o aaa {} \; | wc -l
Should correctly handle filenames with spaces in them.
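A possible variation: ending -exec with + instead of \; lets find pass many file names to each grep invocation rather than starting one grep per file, which tends to be faster on large trees. A sketch with a made-up function name; -h is added (assuming GNU or BSD grep) so that grep prints bare matches even when it receives several files at once.

```shell
#!/bin/sh
# count_matches_find DIR PATTERN   (hypothetical helper name)
# Prints the total number of matches of PATTERN in regular files under DIR.
count_matches_find() {
    # "{} +" batches file names; find passes each name as a single argument,
    # so names containing spaces are handled safely.
    find "$1" -type f -exec grep -oh -e "$2" {} + | wc -l | tr -d ' '
}
```

For example, count_matches_find . aaa prints one total for the whole tree.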
Solution 3
Use grep in its simplest form. Try grep --help for more info.
To get the count for a word in a particular file (note that -c counts matching lines, not individual occurrences):
grep -c <word> <file_name>
Example:
grep -c 'aaa' abc_report.csv
Output:
445
To get the per-file counts for a word in the whole directory, recursively:
grep -c -R <word>
Example:
grep -c -R 'aaa'
Output:
abc_report.csv:445
lmn_report.csv:129
pqr_report.csv:445
my_folder/xyz_report.csv:408
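If a single total is wanted instead of one count per file, the file:count lines can be summed with awk. This is a sketch with a made-up function name. Since -c counts matching lines, a line that contains the word twice still contributes only 1.

```shell
#!/bin/sh
# count_matching_lines DIR WORD   (hypothetical helper name)
# Prints the total number of lines under DIR that contain WORD.
count_matching_lines() {
    # grep -rc prints "file:count" for every file; the count is the last
    # colon-separated field, so file names containing ":" are still fine.
    grep -rc -e "$2" "$1" | awk -F: '{ total += $NF } END { print total + 0 }'
}
```

For example, count_matching_lines . aaa prints the sum of the per-file numbers shown above.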
Solution 4
Let's use AWK!
$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency
This lists the frequency of each word occurring in the provided file. If you want to see the occurrences of your word, you can just do this:
$ cat your_file.txt | wordfrequency | grep yourword
To find occurrences of your word across all files in a directory (non-recursively), you can do this:
$ cat * | wordfrequency | grep yourword
To find occurrences of your word across all files in a directory (and its subdirectories), you can do this:
$ find . -type f | xargs cat | wordfrequency | grep yourword
Source: AWK-ward Ruby
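Piping the frequency list through grep yourword also shows every other word that merely contains yourword. A sketch of a variant that prints a single number for exact, case-insensitive whole-word matches follows. The function name is made up, and it assumes find -print0 and xargs -0 (available in GNU and BSD tools) so that file names with spaces survive the pipe.

```shell
#!/bin/sh
# count_word DIR WORD   (hypothetical helper name)
# Prints how many times WORD occurs as a whole word in files under DIR.
count_word() {
    find "$1" -type f -print0 | xargs -0 cat |
        awk -v target="$2" '
            # Same splitting rule as wordfrequency: any run of non-letters
            # separates words.
            BEGIN { FS = "[^a-zA-Z]+"; target = tolower(target) }
            { for (i = 1; i <= NF; i++) if (tolower($i) == target) n++ }
            END { print n + 0 }'
}
```

For example, count_word . yourword prints the total for the current directory tree.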
Solution 5
find . -type f | xargs perl -p -e 's/ /\n/g' | grep aaa | wc -l
Ashish Sharma
Updated on July 09, 2022
Comments
-
Ashish Sharma almost 2 years
I’m trying to count a particular word occurrence in a whole directory. Is this possible?
Say for example there is a directory with 100 files all of whose files may have the word “aaa” in them. How would I count the number of “aaa” in all the files under that directory?
I tried something like:
zegrep "xception" `find . -name '*auth*application*'` | wc -l
But it’s not working.
-
SeanDowney over 12 years
Perfect! I was using find based on size; this works perfectly.
-
cgledezma almost 11 years
Also, if you don't want the actual matches, only the count, you can use
grep -rcP '^aaa$' .
That saves you the piping and prevents matching embedded 'aaa'.
-
Carlos Campderrós almost 11 years
@cgledezma good point about -c, but it fails if there are two or more occurrences of the search string in one line.
-
cgledezma almost 11 years
Hmm... Indeed, I hadn't noticed it only counts the number of lines matched and not the actual number of matches. Still, I think it may be useful to add word boundaries to avoid nested matches. Sorry, I placed them incorrectly in the previous comment:
grep -rohP '\baaa\b' . | wc -w
-
Carlos Campderrós almost 11 years
@cgledezma sure, word boundaries may be useful in some situations.
-
annunarcist over 10 years
@Fredrik: this executes perfectly, but is there a way to get the word count without counting that word multiple times in the same file? E.g., if the word "aaa" appears in "file1.txt" 10 times, the count should increase by only 1, not 10, and similarly for the other files in the directory.
-
Fredrik Pihl over 10 years
@annunarcist -- yes, it can be done. Post a new question and I'll take a look :-)
-
annunarcist over 10 years
@Fredrik: posted! Here is the link
-
IanBussieres about 9 years
On OS X, @cgledezma's solution translates to
grep -rohe '\baaa\b' . | wc -w
since -P is not available.
-
mrjamesmyers about 6 years
One thing to also note: if you search for a pattern that has a space between multiple words or letters, e.g.
grep -roh 'global \$' .
or
grep -roh 'one two' .
then piping to wc -w will count all of the words in the output. So you may want to count only the number of exact matches, not the total of all the words in the result. I achieved this by piping into grep again but searching for the first word only, e.g.
grep -roh 'global \$' . | grep -o 'global' | wc -w
However, there may be a more elegant way?