Efficient way to search for a string within files using find and grep


Solution 1

The fastest I can come up with is to use xargs to share the load:

find . -type f -print0  | xargs -0 grep -Fil "mypattern" 

Running some benchmarks on a directory containing 3631 files:

$ time find . -type f -exec grep -l -i "mystring" {} 2>/dev/null \;

real    0m15.012s
user    0m4.876s
sys     0m1.876s

$ time find . -type f -exec grep -Fli "mystring" {} 2>/dev/null \;

real    0m13.982s
user    0m4.328s
sys     0m1.592s


$ time find . -type f -print0  | xargs -0 grep -Fil "mystring" >/dev/null 

real    0m3.565s
user    0m3.508s
sys     0m0.052s

Your other options would be to streamline the search, either by limiting the file list using find's tests (an illustrative combination follows these excerpts):

   -executable
          Matches files which are executable and directories
          which are searchable (in a file name resolution
          sense).
   -writable
          Matches files which are writable.

   -mtime n
          File's data was last modified n*24 hours ago. See
          the comments for -atime to understand how rounding
          affects the interpretation of file modification
          times.
   -group gname
          File belongs to group gname (numeric group ID
          allowed).
   -perm /mode
          Any of the permission bits mode are set for the
          file. Symbolic modes are accepted in this form.
          You must specify `u', `g' or `o' if you use a
          symbolic mode.
   -size n[cwbkMG]  <-- you can set a minimum or maximum size
          File uses n units of space.
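
For illustration, several of those tests can be combined in front of the grep; the size and age limits here are purely hypothetical (GNU find syntax, as in the excerpts above):

find . -type f -size +1k -size -10M -mtime -365 -print0 | xargs -0 grep -Fil "mypattern"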

Or by tweaking grep:

You are already using grep's -l option, which causes the file name to be printed and, more importantly, stops scanning at the first match:

   -l, --files-with-matches
       Suppress normal output; instead print the name of each input file from
       which output would normally have been printed. The scanning will stop
       on the first match. (-l is specified by POSIX.)

The only other thing I can think of to speed things up would be to make sure your pattern is not interpreted as a regex (as suggested by @suspectus) by using the -F option.

Solution 2

Use grep -F, which tells grep to interpret the pattern as a fixed string and not a regular expression (which I assume you do not require). It can be appreciably quicker than a regular-expression search, depending on the size of the files being parsed.

On Ubuntu and RHEL Linux, the -H option displays the path of each file containing a match. (With -exec ... {} +, grep is handed many files per invocation and prints file names by default anyway; -H guarantees the name is printed even when grep ends up with a single file.)

find . -type f -exec grep -FHi "mystring" {} +

Comments

  • zeropouet
    zeropouet over 1 year

    I am searching all files containing a specific string on a filer (on an old HP-UX workstation).

    I do not know where the files are located in the file system (there are many directories, with a huge number of scripts, plain-text and binary files).

    Note that the grep -R option does not exist on this system, so I am using find and grep to retrieve which files contain my string:

    find . -type f -exec grep -i "mystring" {} \;
    

    I am not satisfied with this command: it is too slow, and it does not print the name and path of the files in which grep matched my string. Moreover, any errors are echoed to my console output.

    So I thought that I could do better:

    find . -type f -exec grep -l -i "mystring" {} 2>/dev/null \;
    

    But it is very slow.

    Do you have a more efficient alternative to this command?

    Thank you.

    • nik
      nik almost 11 years
      You want the -H option to print the file name along with the match.
    • nik
      nik almost 11 years
      Think of reducing the file-set; work from sub-directories under your ., one at a time; see if you can reduce to specific file extensions or name patterns.
    • terdon
      terdon almost 11 years
      You should be able to make some assumptions about your files. For example: they have a minimum size of 1 KB and a maximum of 1 GB, they are not owned by root, they are writable by user X, they were created at least 3 days ago but no more than 10 years ago, and they are not PDF or .log files. All of these can be encoded in a find command using ! and -or etc., as sketched below.
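      A hedged sketch of that idea, with every limit purely illustrative (GNU find syntax assumed):

      find . -type f -size +1k -size -1G ! -user root -mtime +3 -mtime -3650 ! -name "*.pdf" ! -name "*.log" -print0 | xargs -0 grep -Fil "mystring"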
    • terdon
      terdon almost 11 years
      @nik (ignore my previous comment, wrong man page) the -l option should already do what -H does, -l prints the file name and stops at the first match.
    • zeropouet
      zeropouet almost 11 years
      The -H option does not exist on my workstation (HP-UX Release 11i). It could be the right option on a Linux system.
  • zeropouet
    zeropouet almost 11 years
    Thanks for xargs, I didn't think about it. It's a lot faster; I suspect the -exec option is what is slow. I found another way to speed up my search: I built an index of all the files returned by find . -type f, then looped over that index to search for the string (see the sketch below).
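    A minimal sketch of that indexing idea (the index path and loop are illustrative); building the list once avoids re-walking the tree for every search:

    find . -type f > /tmp/file_index          # walk the tree once
    while read -r f; do                       # reuse the index for each search
        grep -Fil "mystring" "$f"
    done < /tmp/file_index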
  • terdon
    terdon almost 11 years
    @zeropouet the -exec option is not slow as such; it is just that xargs optimizes the command by passing many files to each grep invocation. Have a look at its -P option too, which launches several greps in parallel. Specifically, try it with -P 0.
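    A hedged illustration of that suggestion, assuming GNU xargs; the -n batch size is illustrative (without it, xargs may pack everything into a single grep, leaving -P nothing to parallelise, and -P 0 runs as many processes at a time as possible):

    find . -type f -print0 | xargs -0 -n 64 -P 0 grep -Fil "mystring"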