How would you count every occurrence of a term in all files in the current directory?


Solution 1

Using grep + wc (this also counts multiple occurrences of the term on the same line):

grep -rFo foo | wc -l
  • -r in grep: searches recursively in the current directory hierarchy;
  • -F in grep: matches against a fixed string instead of against a pattern;
  • -o in grep: prints only the matched parts, each on its own output line;
  • -l in wc: prints the count of the lines;
% tree                 
.
├── dir
│   └── file2
└── file1

1 directory, 2 files
% cat file1 
line1 foo foo
line2 foo
line3 foo
% cat dir/file2 
line1 foo foo
line2 foo
line3 foo
% grep -rFo foo | wc -l
8
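To see why piping -o into wc -l counts occurrences rather than lines, here is a rough Python model of the pipeline. It is a sketch, not how grep is implemented: like `grep -o`, it emits one output line per non-overlapping match, so counting the emitted lines counts the matches. The names `grep_fixed_only_matching` and `build_example_tree` are hypothetical, and the tree is the one from the session above.

```python
import os
import tempfile


def grep_fixed_only_matching(top, term):
    """Roughly model `grep -rFo term` run in `top`: one output line per match."""
    out = []
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, top)
            with open(path) as fh:
                for line in fh:
                    # str.count finds every non-overlapping fixed-string match
                    for _ in range(line.count(term)):
                        out.append(f"{rel}:{term}")
    return out


def build_example_tree(top):
    """Hypothetical helper: recreate file1 and dir/file2 from the session."""
    content = "line1 foo foo\nline2 foo\nline3 foo\n"
    os.makedirs(os.path.join(top, "dir"), exist_ok=True)
    for rel in ("file1", os.path.join("dir", "file2")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write(content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        build_example_tree(top)
        lines = grep_fixed_only_matching(top, "foo")
        print(len(lines))  # the `wc -l` step; prints 8
```

The line "line1 foo foo" contributes two output lines, which is why the total is 8 rather than 6.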

Solution 2

grep -Rc [term] * will do that. The -R flag means you want to recursively search the current directory and all of its subdirectories. The * is a shell glob meaning all (non-hidden) files. The -c flag makes grep output only a count for each file instead of the matching lines. That count is the number of matching lines, not of occurrences: if the word occurs multiple times on a single line, it is counted only once.
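A minimal Python sketch of the distinction between what -c reports (matching lines) and the number of occurrences, using the contents of file1 from the session in Solution 1. The function names are hypothetical.

```python
def count_matching_lines(text, term):
    """What `grep -c` reports for one file: lines containing the term at least once."""
    return sum(1 for line in text.splitlines() if term in line)


def count_occurrences(text, term):
    """Every non-overlapping occurrence, including repeats on the same line."""
    return text.count(term)


sample = "line1 foo foo\nline2 foo\nline3 foo\n"
print(count_matching_lines(sample, "foo"))  # 3
print(count_occurrences(sample, "foo"))     # 4
```

So grep -c reports 3 for this file even though foo appears 4 times.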

From man grep:

  -r, --recursive
          Read all files under each directory, recursively, following symbolic links only if they are on the command line.
          This is equivalent to the -d recurse option.

  -R, --dereference-recursive
          Read all files under each directory, recursively.  Follow all symbolic links, unlike -r.

If you have no symbolic links in your directory, there is no difference.
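To make the -r/-R distinction concrete, here is a rough Python analogy (not how grep is implemented) built on the real `followlinks` parameter of `os.walk`. The names `count_in_tree` and `make_demo_tree` are hypothetical, and the demo assumes a POSIX system where `os.symlink` works. The Python documentation warns that `os.walk(followlinks=True)` does not track visited directories, so a link pointing to a parent directory would recurse forever.

```python
import os
import tempfile


def count_in_tree(top, term, dereference=False):
    """Count occurrences of `term` in files under `top`.

    dereference=False loosely mimics `grep -r`: symbolic links met during the
    walk are skipped. dereference=True loosely mimics `grep -R`: linked files
    are read and linked directories are descended into.
    """
    total = 0
    # followlinks decides whether os.walk descends into symlinked directories
    for root, dirs, files in os.walk(top, followlinks=dereference):
        for name in files:
            path = os.path.join(root, name)
            if not dereference and os.path.islink(path):
                continue  # like -r: ignore a link found while recursing
            try:
                with open(path) as fh:
                    total += fh.read().count(term)
            except (OSError, UnicodeDecodeError):
                continue  # unreadable, broken link, or not text
    return total


def make_demo_tree(top):
    """Hypothetical layout: a.txt, sub/b.txt, plus a link to each."""
    with open(os.path.join(top, "a.txt"), "w") as fh:
        fh.write("foo\n")
    os.mkdir(os.path.join(top, "sub"))
    with open(os.path.join(top, "sub", "b.txt"), "w") as fh:
        fh.write("foo foo\n")
    os.symlink(os.path.join(top, "sub"), os.path.join(top, "linkdir"))
    os.symlink(os.path.join(top, "a.txt"), os.path.join(top, "linkfile"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        make_demo_tree(top)
        print(count_in_tree(top, "foo"))                    # 3
        print(count_in_tree(top, "foo", dereference=True))  # 6
```

Without links in the tree, both modes return the same total, matching the remark above.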

Solution 3

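As a variant of @kos's nice answer, if you are interested in itemizing the counts per file, you can add grep's -c switch. Note that -c counts matching lines rather than occurrences, even together with -o: in the example above each file contains foo four times on three lines, and grep reports 3: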

$ grep -rFoc foo
file1:3
dir/file2:3
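If per-file counts of actual occurrences are wanted, a small Python sketch can produce them in the same path:count format. It is an assumption-laden illustration, not a grep feature: `occurrences_per_file` and `write_example_tree` are hypothetical names, and files that cannot be read as text are skipped.

```python
import os
import tempfile


def occurrences_per_file(top, term):
    """Return {relative path: occurrences of term} for readable text files under top.

    Unlike `grep -c`, every occurrence is counted, including repeats on one line.
    """
    counts = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path) as fh:
                    counts[os.path.relpath(path, top)] = fh.read().count(term)
            except (OSError, UnicodeDecodeError):
                continue  # skip files that cannot be read as text
    return counts


def write_example_tree(top):
    """Hypothetical helper: recreate file1 and dir/file2 from Solution 1."""
    os.makedirs(os.path.join(top, "dir"), exist_ok=True)
    for rel in ("file1", os.path.join("dir", "file2")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("line1 foo foo\nline2 foo\nline3 foo\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        write_example_tree(top)
        for path, n in sorted(occurrences_per_file(top, "foo").items()):
            print(f"{path}:{n}")  # dir/file2:4, then file1:4
```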

Solution 4

With a small Python script:

#!/usr/bin/env python3
import os
import sys

s = sys.argv[1]
n = 0
for root, dirs, files in os.walk(os.getcwd()):
    for f in files:
        f = root + "/" + f
        try:
            n = n + open(f).read().count(s)
        except (OSError, UnicodeDecodeError):
            # skip files that cannot be opened or are not text
            pass
print(n)
  • Save it as count_string.py.

  • Run it from the directory with the command:

      python3 /path/to/count_string.py <term>
    

Notes

  • If the term includes spaces, use quotes.
  • It counts every occurrence of the term recursively, including multiple occurrences on the same line.
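The second note relies on how Python's built-in `str.count` behaves. A quick check (plain built-ins only) shows that it counts repeats on one line, does not count overlapping matches, and is case-sensitive:

```python
# str.count counts every non-overlapping occurrence, wherever it is
line = "line1 foo foo"
print(line.count("foo"))       # 2: both matches on the same line are counted
print("aaaa".count("aa"))      # 2: matches do not overlap (not 3)
print("Foo foo".count("foo"))  # 1: the match is case-sensitive
```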

Explanation:

# get the current working directory
currdir = os.getcwd()
# get the term as argument
s = sys.argv[1]
# running total of occurrences, starting at 0
n = 0
# use os.walk() to read recursively
for root, dirs, files in os.walk(currdir):
    for f in files:
        # join the path of the containing directory (root) and the file name
        f = root + "/" + f
        # try to read the file: this fails with OSError if it cannot be opened,
        # or with UnicodeDecodeError if it is not text in the current encoding
        try:
            # add the number of found occurrences of <term> in the file
            n = n + open(f).read().count(s)
        except (OSError, UnicodeDecodeError):
            pass
print(n)
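One design choice in the script is reading files in text mode, so files that are not valid text in the current encoding (most binary files) raise UnicodeDecodeError and are skipped. A possible variant, which is an assumption and not part of the original answer, compares raw bytes instead so such files are searched too. The names `count_term_bytes` and `demo` are hypothetical, and the term is assumed to be stored as UTF-8 in the files.

```python
#!/usr/bin/env python3
import os
import sys
import tempfile


def count_term_bytes(top, term):
    """Count occurrences of `term` in all files under `top`, comparing raw bytes."""
    needle = term.encode()  # assumes the files store the term as UTF-8
    total = 0
    for root, dirs, files in os.walk(top):
        for name in files:
            try:
                with open(os.path.join(root, name), "rb") as fh:
                    total += fh.read().count(needle)
            except OSError:
                pass  # unreadable file: permissions, broken link, ...
    return total


def demo():
    """Hypothetical demo: one text file and one file that is not valid UTF-8."""
    with tempfile.TemporaryDirectory() as top:
        with open(os.path.join(top, "text.txt"), "w") as fh:
            fh.write("foo foo\n")
        with open(os.path.join(top, "blob.bin"), "wb") as fh:
            fh.write(b"\xff\xfefoo\x00")  # not valid UTF-8
        return count_term_bytes(top, "foo")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(count_term_bytes(os.getcwd(), sys.argv[1]))
    else:
        print(demo())  # 3
```

The trade-off is that matches inside binary data are counted too, which may or may not be what you want.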

Author: TellMeWhy

Updated on September 18, 2022

Comments

  • TellMeWhy
    TellMeWhy almost 2 years

    How would you count every occurrence of a term in all files in the current directory (and its subdirectories)?

    I've read that to do this you would use grep; what is the exact command?

    Also, is it possible to do the above with some other command?

  • Wayne_Yux
    Wayne_Yux over 8 years
    you can add the -c flag to grep. Then grep counts itself and you don't need the wc
  • TellMeWhy
    TellMeWhy over 8 years
    The python guy ;) +1
  • TellMeWhy
    TellMeWhy over 8 years
    btw what's the root and f for?
  • Jacob Vlijm
    Jacob Vlijm over 8 years
    root is the path of the directory containing the file (an absolute path, since the walk starts from os.getcwd()), f is the file name. Alternatively, os.path.join() could be used, but is more verbose.
  • TellMeWhy
    TellMeWhy over 8 years
    And n = n + open(f).read().count(s)?
  • Edward Torvalds
    Edward Torvalds over 8 years
    you might wanna put -- before *
  • Edward Torvalds
    Edward Torvalds over 8 years
    I think PCREs shouldn't be used since they are experimental
  • dannysauer
    dannysauer over 8 years
    The * will only expand to non-dotfiles, so you miss all those. It makes more sense to just use "." since you're going to process arguments recursively anyway, and that will get dotfiles. The bigger problem here is that this will count the number of lines, not the number of occurrences of a word. If the term appears multiple times on one line, it will only be counted once by "grep -c"
  • dannysauer
    dannysauer over 8 years
    PCREs aren't "experimental", but they're also not always compiled into grep (which is why I use pcregrep when I need them). In this case, they're unnecessary, though, since the question asks about a "term" which is likely a fixed string, not a pattern of any kind. So, -F would probably be faster.
  • kos
    kos over 8 years
    @dannysauer I used PCREs because for some (wrong) reason I thought they were needed to match multiple occurrences on the same line, but indeed they're not. I just didn't try using -F instead of -P. Thanks for the great suggestion, updating using -F, which indeed fits better here.
  • Joe
    Joe over 8 years
    This appears to be the only answer which counts all occurrences of the term as the OP requested. AFAIK, all the solutions using grep will count all the lines on which the term occurs, so a line which includes the term three times will only count as one occurrence.