How would you count every occurrence of a term in all files in the current directory?
Solution 1
Using grep + wc (this will cater for multiple occurrences of the term on the same line):
grep -rFo foo | wc -l
- -r in grep: searches recursively in the current directory hierarchy;
- -F in grep: matches against a fixed string instead of against a pattern;
- -o in grep: prints only the matching parts, each on its own line, so a line containing the term twice yields two output lines;
- -l in wc: prints the count of the lines.
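To illustrate what -F changes, here is a small Python sketch (Python being the language of Solution 4) that counts a term once as a literal string and once as a regular expression. The helper names count_fixed and count_pattern are hypothetical, and Python's re dialect is not identical to grep's basic regular expressions, although "." means "any character" in both.

```python
import re


def count_fixed(text: str, term: str) -> int:
    # Like grep -F: the term is a literal string; str.count returns
    # the number of non-overlapping occurrences.
    return text.count(term)


def count_pattern(text: str, term: str) -> int:
    # Like grep without -F: the term is treated as a regular expression,
    # so "." matches any single character.
    return len(re.findall(term, text))


sample = "abc a.c"
print(count_fixed(sample, "a.c"))    # 1: only the literal "a.c"
print(count_pattern(sample, "a.c"))  # 2: both "abc" and "a.c" match
```

With a plain word like foo both counts agree; -F only matters when the term contains characters that have a special meaning in patterns.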
% tree
.
├── dir
│ └── file2
└── file1
1 directory, 2 files
% cat file1
line1 foo foo
line2 foo
line3 foo
% cat dir/file2
line1 foo foo
line2 foo
line3 foo
% grep -rFo foo | wc -l
8
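For comparison, a rough Python equivalent of grep -rFo foo | wc -l can be sketched as below. It rebuilds the example tree above in a temporary directory; the function name count_occurrences is hypothetical, and treating every file as UTF-8 text (skipping files that cannot be decoded) is an assumption that differs from how grep handles binary files.

```python
import os
import tempfile


def count_occurrences(top: str, term: str) -> int:
    """Total non-overlapping occurrences of term in every file under top."""
    total = 0
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    total += fh.read().count(term)
            except (OSError, UnicodeDecodeError):
                continue  # skip unreadable or non-UTF-8 files
    return total


# Rebuild the example tree (file1 and dir/file2) in a temporary directory.
content = "line1 foo foo\nline2 foo\nline3 foo\n"
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "dir"))
    for rel in ("file1", os.path.join("dir", "file2")):
        with open(os.path.join(top, rel), "w", encoding="utf-8") as fh:
            fh.write(content)
    result = count_occurrences(top, "foo")

print(result)  # 8, the same total as grep -rFo foo | wc -l
```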
Solution 2
grep -Rc [term] *
will do that. The -R flag means you want to recursively search the current directory and all of its subdirectories. The * is a file selector meaning: all files (the shell expands it to every non-hidden entry in the current directory). The -c flag makes grep output, for each file, only the number of matching lines rather than the number of occurrences. Hence, if the word occurs multiple times on a single line, it is counted only once.
From man grep:
-r, --recursive
Read all files under each directory, recursively, following symbolic links only if they are on the command line.
This is equivalent to the -d recurse option.
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all symbolic links, unlike -r.
If you have no symbolic links in your directory, there is no difference.
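The gap between what -c reports (matching lines) and the number of occurrences can be sketched in Python using the contents of file1 from Solution 1. Both helper names are hypothetical, and the sketch assumes the term itself contains no newline.

```python
def matching_lines(text: str, term: str) -> int:
    # What grep -c reports: lines that contain the term at least once.
    return sum(1 for line in text.splitlines() if term in line)


def occurrences(text: str, term: str) -> int:
    # What grep -o ... | wc -l reports: every non-overlapping occurrence.
    return text.count(term)


file1 = "line1 foo foo\nline2 foo\nline3 foo\n"
print(matching_lines(file1, "foo"))  # 3
print(occurrences(file1, "foo"))     # 4
```

The difference of one comes from "line1 foo foo", which holds two occurrences but is only one line.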
Solution 3
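As a variant of @kos's nice answer, if you are interested in itemizing the counts per file, you can use grep's -c switch. Note that -c counts matching lines, not occurrences: each example file from Solution 1 contains the term 4 times but is reported as 3: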
$ grep -rFoc foo
file1:3
dir/file2:3
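If you want an itemized list of actual occurrences rather than matching lines, a Python sketch along these lines could work. The function name per_file_counts is hypothetical; unlike grep -c it leaves out files with zero matches, and it assumes UTF-8 text files.

```python
import os
import tempfile


def per_file_counts(top: str, term: str) -> dict:
    """Map each file path (relative to top) to its occurrence count."""
    counts = {}
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    n = fh.read().count(term)
            except (OSError, UnicodeDecodeError):
                continue  # unreadable or non-UTF-8 file: skip it
            if n:  # only list files that contain the term
                counts[os.path.relpath(path, top)] = n
    return counts


# Same example tree as in Solution 1, built in a temporary directory.
content = "line1 foo foo\nline2 foo\nline3 foo\n"
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "dir"))
    for rel in ("file1", os.path.join("dir", "file2")):
        with open(os.path.join(top, rel), "w", encoding="utf-8") as fh:
            fh.write(content)
    result = per_file_counts(top, "foo")

for path, n in sorted(result.items()):
    print(f"{path}:{n}")  # dir/file2:4 and file1:4 (on a POSIX system)
```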
Solution 4
In a small Python script:
#!/usr/bin/env python3
import os
import sys

s = sys.argv[1]
n = 0
for root, dirs, files in os.walk(os.getcwd()):
    for f in files:
        f = root+"/"+f
        try:
            n = n + open(f).read().count(s)
        except (OSError, UnicodeDecodeError):
            # skip files that can't be opened or decoded as text
            pass
print(n)
Save it as count_string.py. Run it from the directory with the command:
python3 /path/to/count_string.py <term>
Notes
- If the term includes spaces, use quotes.
- It counts every occurrence of the term recursively, including multiple occurrences on the same line.
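One detail behind "every occurrence": str.count, which the script uses, counts non-overlapping occurrences, so "aaaa".count("aa") is 2, not 3. This only matters for terms that can overlap themselves. If overlapping matches are wanted, a sketch like the following would do it; count_overlapping is a hypothetical helper, not part of the script.

```python
def count_overlapping(text: str, term: str) -> int:
    # Restart the search one character after each hit, so matches may overlap.
    n, start = 0, 0
    while True:
        i = text.find(term, start)
        if i == -1:
            return n
        n += 1
        start = i + 1


print("aaaa".count("aa"))               # 2: non-overlapping, as in the script
print(count_overlapping("aaaa", "aa"))  # 3: matches at positions 0, 1 and 2
```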
Explanation:
# get the current working directory
currdir = os.getcwd()
# get the term as argument
s = sys.argv[1]
# counter for occurrences, starting at 0
n = 0
# use os.walk() to read recursively
for root, dirs, files in os.walk(currdir):
    for f in files:
        # join the path(s) above the file and the file itself
        f = root+"/"+f
        # try to read the file (fails if it can't be opened or isn't valid text)
        try:
            # add the number of found occurrences of <term> in the file
            n = n + open(f).read().count(s)
        except (OSError, UnicodeDecodeError):
            pass
print(n)
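To see what root and f actually hold during the walk, the sketch below walks a copy of the example tree from Solution 1 and records, for each file, the directory os.walk reports, the bare file name, and the joined path (shown relative to the top directory for readability). It uses os.path.join, which does the same job as root+"/"+f; the function name walk_entries is hypothetical.

```python
import os
import tempfile


def walk_entries(top: str) -> list:
    """Return (root, file name, joined path) per file, all relative to top."""
    entries = []
    for root, _dirs, files in os.walk(top):
        for f in files:
            full = os.path.join(root, f)  # same idea as root + "/" + f
            entries.append((os.path.relpath(root, top), f, os.path.relpath(full, top)))
    return sorted(entries)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "dir"))
    for rel in ("file1", os.path.join("dir", "file2")):
        open(os.path.join(top, rel), "w").close()  # empty files are enough here
    entries = walk_entries(top)

for entry in entries:
    print(entry)  # ('.', 'file1', 'file1') then ('dir', 'file2', 'dir/file2')
```

In the real script root is an absolute path, because os.walk() starts from os.getcwd().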
Comments
- TellMeWhy (almost 2 years ago): How would you count every occurrence of a term in all files in the current directory, and possibly its subdirectories? I've read that to do this you would use grep; what is the exact command? Also, is it possible to do the above with some other command?
- Wayne_Yux (over 8 years ago): you can add the -c flag to grep. Then grep counts itself and you don't need the wc.
- TellMeWhy (over 8 years ago): The python guy ;) +1
- TellMeWhy (over 8 years ago): btw, what are root and f for?
- Jacob Vlijm (over 8 years ago): root is the path of the directory holding the file (including the part "above" the current directory); f is the file name. Alternatively, os.path.join() could be used, but is more verbose.
- TellMeWhy (over 8 years ago): And n = n + open(f).read().count(s)?
- Edward Torvalds (over 8 years ago): you might wanna put -- before *
- Edward Torvalds (over 8 years ago): I think PCREs shouldn't be used since they are experimental.
- dannysauer (over 8 years ago): The * will only expand to non-dotfiles, so you miss all those. It makes more sense to just use "." since you're going to process arguments recursively anyway, and that will get dotfiles. The bigger problem here is that this will count the number of lines, not the number of occurrences of a word. If the term appears multiple times on one line, it will only be counted once by "grep -c".
- dannysauer (over 8 years ago): PCREs aren't "experimental", but they're also not always compiled into grep (which is why I use pcregrep when I need them). In this case, they're unnecessary, though, since the question asks about a "term", which is likely a fixed string, not a pattern of any kind. So -F would probably be faster.
- kos (over 8 years ago): @dannysauer I used PCREs because for some (wrong) reason I thought they were needed to match multiple occurrences on the same line, but indeed they're not. I just didn't try using -F instead of -P. Thanks for the great suggestion; updating to use -F, which indeed fits better here.
- Joe (over 8 years ago): This appears to be the only answer which counts all occurrences of the term as the OP requested. AFAIK, all the solutions using grep will count all the lines on which the term occurs, so a line which includes the term three times will only count as one occurrence.
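dannysauer's point about * skipping dotfiles can be illustrated from Python, whose glob module follows the same convention as the shell by default, while os.walk (like grep -r .) visits every entry. This is a sketch of the naming rule only, not of the shell itself; the file names visible and .hidden are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    for name in ("visible", ".hidden"):
        with open(os.path.join(top, name), "w") as fh:
            fh.write("foo\n")
    # Like the shell's *, glob's "*" does not match names starting with a dot.
    starred = sorted(os.path.basename(p) for p in glob.glob(os.path.join(top, "*")))
    # os.walk lists every file, dotfiles included.
    walked = sorted(name for _root, _dirs, files in os.walk(top) for name in files)

print(starred)  # ['visible']
print(walked)   # ['.hidden', 'visible']
```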