Efficient way to search for a string within files using find and grep
Solution 1
The fastest I can come up with is to use xargs to share the load:
find . -type f -print0 | xargs -0 grep -Fil "mypattern"
Running some benchmarks on a directory containing 3631 files:
$ time find . -type f -exec grep -l -i "mystring" {} 2>/dev/null \;
real 0m15.012s
user 0m4.876s
sys 0m1.876s
$ time find . -type f -exec grep -Fli "mystring" {} 2>/dev/null \;
real 0m13.982s
user 0m4.328s
sys 0m1.592s
$ time find . -type f -print0 | xargs -0 grep -Fil "mystring" >/dev/null
real 0m3.565s
user 0m3.508s
sys 0m0.052s
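To see where the saving comes from, here is a minimal, self-contained sketch (the file names and contents are invented for the demo): the -exec ... \; form forks one grep per file, while the xargs form hands grep whole batches of names.

```shell
# Invented demo tree -- the point is only that both forms print the
# same matches, while the xargs form forks far fewer grep processes.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
printf 'hello MyString world\n' > "$tmp/a.txt"
printf 'nothing here\n'         > "$tmp/sub/b.txt"

# One grep process per file:
find "$tmp" -type f -exec grep -Fli "mystring" {} \;

# One grep process per batch of files; -print0/-0 keep file names
# containing spaces or newlines intact:
find "$tmp" -type f -print0 | xargs -0 grep -Fil "mystring"

rm -rf "$tmp"
```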
Your other options would be to streamline, either by limiting the file list using find:
-executable
Matches files which are executable and direc‐
tories which are searchable (in a file name
resolution sense).
-writable
Matches files which are writable.
-mtime n
File's data was last modified n*24 hours ago.
See the comments for -atime to understand how
rounding affects the interpretation of file
modification times.
-group gname
File belongs to group gname (numeric group ID
allowed).
-perm /mode
Any of the permission bits mode are set for
the file. Symbolic modes are accepted in this
form. You must specify `u', `g' or `o' if you
use a symbolic mode.
-size n[cwbkMG] <-- you can set a minimum or maximum size
File uses n units of space.
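For instance, a few of these predicates could be combined like this. The size and age bounds are invented; tune them to what you actually know about your files, and note that the M/G size suffixes are GNU extensions that may not exist on an old HP-UX find.

```shell
# Skip empty files, anything over 10 MB, files untouched for a year,
# and extensions known to be irrelevant -- grep then scans far fewer files.
find . -type f -size +1c -size -10M -mtime -365 ! -name '*.log' \
    -print0 | xargs -0 grep -Fil "mystring"
```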
Or by tweaking grep:
You are already using grep's -l option, which causes the file name to be printed and, more importantly, stops at the first match:
-l, --files-with-matches
Suppress normal output; instead print the name of each input file from
which output would normally have been printed. The scanning will stop
on the first match. (-l is specified by POSIX.)
The only other thing I can think of to speed things up would be to make sure your pattern is not interpreted as a regex (as suggested by @suspectus) by using the -F option.
Solution 2
Use grep -F, which tells grep to interpret the pattern as a fixed string and not a regular expression (which I assume you do not require). It can be appreciably quicker than a regex search, depending on the size of the files being parsed.
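A tiny illustration of what -F changes (the file contents are invented): regex metacharacters such as '.' lose their special meaning and are matched literally.

```shell
tmp=$(mktemp)
printf 'abc\na.c\n' > "$tmp"

grep    'a.c' "$tmp"   # regex: '.' matches any character -> prints both lines
grep -F 'a.c' "$tmp"   # fixed string -> prints only "a.c"

rm -f "$tmp"
```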
On Ubuntu and RHEL Linux, the -H option will display the file path of a matched file.
find . -type f -exec grep -FHi "mystring" {} +
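As a sanity check, here is a hypothetical run showing what -H adds (the temporary paths are invented): with a single file argument grep normally omits the file name, and -H forces it on. The '+' terminator batches many file names into each grep invocation instead of forking once per file.

```shell
tmp=$(mktemp -d)
printf 'mystring here\n' > "$tmp/a.txt"

grep  -i "mystring" "$tmp/a.txt"   # single file: no name prefix
grep -Hi "mystring" "$tmp/a.txt"   # -H: prefixes the matching line with the path

# The full command, batching files with '+' instead of one grep per file:
find "$tmp" -type f -exec grep -FHi "mystring" {} +

rm -rf "$tmp"
```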
zeropouet
Updated on September 18, 2022

Comments
-
zeropouet over 1 year
I am searching for all files containing a specific string on a filer (an old HP-UX workstation).
I do not know where the files are located in the file system (there are many directories, with a huge number of scripts, plain-text and binary files).
Note that the grep -R option does not exist on this system, so I am using find and grep to retrieve which files contain my string:
find . -type f -exec grep -i "mystring" {} \;
I am not satisfied with this command: it is too slow, and it does not print the name and path of the file in which grep matched my string. Moreover, any errors are echoed to my console output.
So I thought that I could do better:
find . -type f -exec grep -l -i "mystring" {} 2>/dev/null \;
But it is very slow.
Do you have a more efficient alternative to this command?
Thank you.
-
nik almost 11 years
You want the -H option to print the file name along with the match.
-
nik almost 11 years
Think of reducing the file-set; work from sub-directories under your ., one at a time; see if you can reduce to specific file extensions or name patterns.
-
terdon almost 11 years
You should be able to make some assumptions about your files. For example, they have a minimum size of 1kb, a maximum of 1GB, they are not owned by root, they are writable by user X, they have been created at least 3 days ago but no more than 10 years ago, they are not pdfs or .log files. All these can be encoded in a find command using ! and -or etc.
-
terdon almost 11 years
@nik (ignore my previous comment, wrong man page) the -l option should already do what -H does; -l prints the file name and stops at the first match.
-
zeropouet almost 11 years
The -H option does not exist on my workstation (HP-UX Release 11i). It might be the right option on a Linux system.
-
-
zeropouet almost 11 years
Thanks for xargs, I didn't think of it. It's a lot faster. I think the -exec option is not really fast. I found another solution to speed up my search: I built an index of all files returned by find -type f, then used a loop to search for the string in the files listed in the index.
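The index idea from this comment might look like the following sketch. The index path and loop details are my own, and a newline-separated index breaks on file names that themselves contain newlines.

```shell
# Build the file list once so find only walks the tree a single time:
find . -type f > /tmp/file_index   # index path is made up

# Reuse the index for as many searches as you like:
while IFS= read -r f; do
    grep -Fli "mystring" "$f"
done < /tmp/file_index
```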
-
terdon almost 11 years
@zeropouet the -exec option is not slow as such, it is just that xargs will optimize the command and launch many greps in parallel. Have a look at its -P option too. Specifically, try with -P 0.
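The -P suggestion might look like this sketch. The -P 4 and -n 100 numbers are arbitrary, and -P is a GNU/BSD xargs feature that will not exist on an old HP-UX xargs.

```shell
# Run up to 4 greps at once, each given at most 100 file names;
# -P 0 instead lets xargs start as many processes as it can.
find . -type f -print0 | xargs -0 -P 4 -n 100 grep -Fil "mystring"
```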