How would you count every occurrence of a term in all files in the current directory?

command-line files directory grep

7,708

Solution 1

Using grep + wc (this also counts multiple occurrences of the term on the same line):

grep -rFo foo | wc -l

-r in grep: searches recursively in the current directory hierarchy;
-F in grep: matches against a fixed string instead of against a pattern;
-o in grep: prints only matches;
-l in wc: prints the count of the lines;

% tree                 
.
├── dir
│   └── file2
└── file1

1 directory, 2 files
% cat file1 
line1 foo foo
line2 foo
line3 foo
% cat dir/file2 
line1 foo foo
line2 foo
line3 foo
% grep -rFo foo | wc -l
8

Solution 2

grep -Rc [term] * comes close, with one caveat. The -R flag tells grep to search the current directory and all of its subdirectories recursively. The * is a shell glob that expands to all non-hidden files and directories in the current directory. The -c flag makes grep print, for each file, the number of lines that match, so a term that occurs several times on a single line is counted only once.
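To make the difference between counting lines and counting occurrences concrete, here is a minimal Python sketch. It assumes the contents of file1 from Solution 1's example and only mimics the two ways of counting; it does not call grep.

```python
# Sample text mirroring file1 from Solution 1's example (an assumption
# made for illustration; no real file is read here).
text = "line1 foo foo\nline2 foo\nline3 foo\n"
term = "foo"

# Roughly what `grep -c` reports: the number of lines containing the term.
matching_lines = sum(1 for line in text.splitlines() if term in line)

# Roughly what `grep -o ... | wc -l` reports: every non-overlapping occurrence.
occurrences = text.count(term)

print(matching_lines, occurrences)  # 3 4
```

The first line of the sample contains the term twice, which is why the two numbers differ.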

From man grep:

   -r, --recursive
          Read all files under each directory, recursively, following symbolic links only if they are on the command line.
          This is equivalent to the -d recurse option.

   -R, --dereference-recursive
          Read all files under each directory, recursively.  Follow all symbolic links, unlike -r.

If you have no symbolic links in your directory, there is no difference.
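For comparison, Python's os.walk() has a similar switch, followlinks, which is off by default. The sketch below is only a loose analogy: followlinks affects symlinked directories only, so it is not an exact model of grep's -r versus -R. The directory names (search, outside, link) are made up for the demonstration.

```python
import os
import tempfile

def list_files(top, follow_links):
    """Return the sorted relative paths of all files found under top."""
    found = []
    for root, dirs, files in os.walk(top, followlinks=follow_links):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), top))
    return sorted(found)

# Throwaway tree: search/ holds one file plus a symlink to a directory
# that lives outside of it.
base = tempfile.mkdtemp()
search = os.path.join(base, "search")
outside = os.path.join(base, "outside")
os.makedirs(search)
os.makedirs(outside)
with open(os.path.join(search, "file1"), "w") as fh:
    fh.write("line1 foo foo\n")
with open(os.path.join(outside, "file2"), "w") as fh:
    fh.write("line1 foo foo\n")
os.symlink(outside, os.path.join(search, "link"))

print(list_files(search, False))  # symlinked directory is not entered
print(list_files(search, True))   # symlinked directory is entered
```

Without following links only file1 is found; with following links, file2 is also reached through link.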

Solution 3

As a variant of @kos's nice answer, if you are interested in itemizing the counts per file, you can use grep's -c switch. Note that -c counts matching lines rather than individual matches, so -o has no effect on the numbers here:

$ grep -rFoc foo
file1:3
dir/file2:3
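If you want per-file counts of every occurrence rather than of matching lines, one option is a few lines of Python. This is a minimal sketch, not a grep replacement: the function name count_per_file is made up here, and files that cannot be read as text are silently skipped.

```python
import os

def count_per_file(top, term):
    """Map each readable text file under top to its number of occurrences of term."""
    counts = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path) as fh:
                    n = fh.read().count(term)
            except (OSError, UnicodeDecodeError):
                # Skip files that cannot be opened or decoded as text.
                continue
            if n:
                # Only record files that contain the term at least once.
                counts[os.path.relpath(path, top)] = n
    return counts
```

Called as count_per_file(os.getcwd(), "foo") on the tree from Solution 1, this should report 4 for each of the two files, since each contains the term four times across three lines.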

Solution 4

In a small python script:

#!/usr/bin/env python3
import os
import sys

s = sys.argv[1]
n = 0
for root, dirs, files in os.walk(os.getcwd()):
    for f in files:
        f = root+"/"+f
        try:
            n = n + open(f).read().count(s)
        except (OSError, UnicodeDecodeError):
            # skip files that cannot be opened or decoded as text
            pass
print(n)

Save it as count_string.py.

Run it from the directory with the command:

  python3 /path/to/count_string.py <term>

Notes

If the term includes spaces, use quotes.
It counts every occurrence of the term recursively, including multiple occurrences on one line.

Explanation:

# get the current working directory
currdir = os.getcwd()
# get the term as argument
s = sys.argv[1]
# count occurrences, set start to 0 
n = 0
# use os.walk() to read recursively
for root, dirs, files in os.walk(currdir):
    for f in files:
        # join the path(s) above the file and the file itself
        f = root+"/"+f
        # try to read the file (will fail if the file is unreadable for some reason)
        try:
            # add the number of found occurrences of <term> in the file
            n = n + open(f).read().count(s)
        except (OSError, UnicodeDecodeError):
            pass
print(n)
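The line n = n + open(f).read().count(s) packs several steps into one expression. Unrolled, it looks roughly like the sketch below, which uses a hypothetical sample file created only for the demonstration:

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as fh:
    fh.write("line1 foo foo\nline2 foo\nline3 foo\n")

s = "foo"
n = 0

with open(path) as fh:      # open(f): open the file for reading as text
    contents = fh.read()    # .read(): load the whole file into one string
found = contents.count(s)   # .count(s): non-overlapping occurrences of s
n = n + found               # add this file's count to the running total

print(n)  # 4

# str.count() never counts overlapping matches:
print("aaaa".count("aa"))  # 2
```

Since str.count() only counts non-overlapping matches, a term that can overlap with itself may be counted fewer times than you expect.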

TellMeWhy

I Wonder

Updated on September 18, 2022

Comments

TellMeWhy almost 2 years

How would you count every occurrence of a term in all files in the current directory? - and subdirectories(?)

I've read that to do this you would use grep; what is the exact command?

Also, is it possible to do the above with some other command?
Wayne_Yux over 8 years

you can add the -c flag to grep. Then grep does the counting itself and you don't need the wc
TellMeWhy over 8 years

The python guy ;) +1
TellMeWhy over 8 years

btw what's the root and f for?
Jacob Vlijm over 8 years

root is the path to the file including "above" the current directory, f is the file. Alternatively, os.path.join() could be used, but is more verbose.
TellMeWhy over 8 years

And n = n + open(f).read().count(s)?
Edward Torvalds over 8 years

you might wanna put -- before *
Edward Torvalds over 8 years

I think PCREs shouldn't be used since they are experimental
dannysauer over 8 years

The * will only expand to non-dotfiles, so you miss all those. It makes more sense to just use "." since you're going to process arguments recursively anyway - and that will get dot files. The bigger problem here is that this will count the number of lines, not the number of occurrences of a word. If the term appears multiple times on one line, it will only be counted once by "grep -c"
dannysauer over 8 years

PCREs aren't "experimental", but they're also not always compiled in to grep (which is why I use pcregrep when I need them). In this case, they're unnecessary, though, since the question asks about a "term" which is likely a fixed string, not a pattern of any kind. So, -F would probably be faster.
kos over 8 years

@dannysauer I used PCREs because for some (wrong) reason I thought they were needed to match multiple occurrences on the same line, but indeed they're not. I just didn't try using -F instead of -P. Thanks for the great suggestion, updating using -F, which indeed fits better here.
Joe over 8 years

This appears to be the only answer which counts all occurrences of the term as the OP requested. AFAIK, all the solutions using grep will count all the lines on which the term occurs, so a line which includes the term three times will only count as one occurrence.