How to combine find and grep for a complex search? (GNU/Linux, find, grep)

Solution 1

Try

find /srv/www/*/htdocs/system/application/ -name "*.php" -exec grep "debug (" {} \; -print

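This recursively searches the directories under each application directory for files with a .php extension and runs grep on each one. Because of the trailing -print, find also prints the name of every file in which grep found a match.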
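
To see how -exec ... \; and -print interact, here is a minimal sketch that runs the same command against a throwaway tree. The directory layout and file names (site1, site2, home.php and so on) are made up for illustration and only imitate the /srv/www layout from the question.

```shell
#!/bin/sh
# Scratch tree imitating /srv/www/*/htdocs/system/application/ (all names hypothetical).
root=$(mktemp -d)
mkdir -p "$root/site1/htdocs/system/application/controllers" \
         "$root/site2/htdocs/system/application/views"
printf '<?php debug ("hello"); ?>\n' > "$root/site1/htdocs/system/application/controllers/home.php"
printf '<?php echo "clean"; ?>\n'    > "$root/site2/htdocs/system/application/views/page.php"
printf 'debug ("not php")\n'         > "$root/site1/htdocs/system/application/notes.txt"

# -exec ... \; starts one grep per .php file; -print only fires for a file
# when that grep exited with status 0, i.e. when it found a match.
result=$(find "$root"/*/htdocs/system/application/ -name "*.php" -exec grep "debug (" {} \; -print)
printf '%s\n' "$result"

rm -rf "$root"
```

Only home.php should show up: page.php has no match, and notes.txt is filtered out by -name before grep ever runs.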

An optimization on this would be to execute:

find /srv/www/*/htdocs/system/application/ -name "*.php" -print0 | xargs -0 grep -H "debug ("

This uses xargs to pass the .php files output by find as arguments to as few grep invocations as possible, ideally a single one; e.g., grep "debug (" file1 file2 file3. The -print0 option of find and the -0 option of xargs ensure that spaces and other special characters in file and directory names are handled correctly. The -H option passed to grep ensures that the filename is printed in all situations. (By default, grep prints the filename only when there is more than one file to search.)
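
The effect of -H is easy to check in isolation. The sketch below uses two hypothetical scratch files (one.php and two.php) and compares grep's output with one file argument, with -H, and with two file arguments.

```shell
#!/bin/sh
# Hypothetical scratch files; only the -H behaviour matters here.
dir=$(mktemp -d)
printf 'debug ("a");\n' > "$dir/one.php"
printf 'debug ("b");\n' > "$dir/two.php"

cd "$dir" || exit 1
single=$(grep "debug (" one.php)          # one file argument: no file name prefix
single_h=$(grep -H "debug (" one.php)     # -H forces the prefix
multi=$(grep "debug (" one.php two.php)   # several arguments: prefix appears by default
printf '%s\n' "$single" "$single_h" "$multi"

cd / && rm -rf "$dir"
```

This matters with xargs because the last batch it builds may happen to contain a single file name.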

From man xargs:

-0

      Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally).  Disables the end of file string, which is treated like any other argument.  Useful when input items might contain white space, quote marks, or backslashes.  The GNU find -print0 option produces input suitable for this mode.
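
As a rough illustration of why the null terminator matters, the sketch below searches a scratch directory containing a file whose name has a space in it ("my page.php", a made-up name). It assumes the temporary directory path itself contains no whitespace.

```shell
#!/bin/sh
# Hypothetical scratch directory with a space in a file name.
dir=$(mktemp -d)
printf 'debug ("x");\n' > "$dir/my page.php"

# Null-terminated names survive the trip through xargs intact.
safe=$(find "$dir" -name "*.php" -print0 | xargs -0 grep -H "debug (")

# Newline-terminated names are split on the space, so grep is handed
# ".../my" and "page.php", neither of which exists; its errors are discarded.
unsafe=$(find "$dir" -name "*.php" -print | xargs grep -H "debug (" 2>/dev/null)

printf 'with -print0 and -0: %s\n' "$safe"
printf 'with plain xargs:    %s\n' "${unsafe:-<no match>}"

rm -rf "$dir"
```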

Solution 2

find is not even needed for this example; grep can do the search directly (at least GNU grep can):

grep -RH --include='*.php' "debug (" /srv/www/*/htdocs/system/application/

and we are down to a single process fork.

Options:

  • -R, --dereference-recursive Read all files under each directory, recursively. Follow all symbolic links, unlike -r.
  • -H, --with-filename Print the file name for each match. This is the default when there is more than one file to search.
  • --include=GLOB Search only files whose base name matches GLOB (using wildcard matching as described under --exclude).
  • --exclude=GLOB Skip any command-line file with a name suffix that matches the pattern GLOB, using wildcard matching; a name suffix is either the whole name, or any suffix starting after a / and before a non-/. When searching recursively, skip any subfile whose base name matches GLOB; the base name is the part after the last /. A pattern can use *, ?, and [...] as wildcards, and \ to quote a wildcard or backslash character literally.
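
A small sketch of the options above in action, assuming GNU grep (for -R and --include) and a made-up scratch tree with one .php file and one .txt file that both contain the pattern:

```shell
#!/bin/sh
# Hypothetical scratch tree; GNU grep is assumed for -R and --include.
root=$(mktemp -d)
mkdir -p "$root/site1/htdocs/system/application/models"
printf 'debug ("in php");\n' > "$root/site1/htdocs/system/application/models/user.php"
printf 'debug ("in txt");\n' > "$root/site1/htdocs/system/application/models/readme.txt"

# --include limits the recursive search to files whose base name matches *.php,
# so readme.txt is skipped even though it contains the pattern.
matches=$(grep -RH --include='*.php' "debug (" "$root"/*/htdocs/system/application/)
printf '%s\n' "$matches"

rm -rf "$root"
```

Since the question says no regular expression is needed, adding -F makes grep treat the pattern as a fixed string rather than a regular expression.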

Author: Petruza

Updated on September 17, 2022

Comments

  • Petruza
    Petruza over 1 year

    I'm trying to do a text search in some files that share a similar directory structure, but are not in the same directory tree, in GNU/Linux.

    I have a web server with many sites that share the same tree structure (CodeIgniter MVC PHP framework), so I want to search in a specific directory down the tree for each site, for example:

    /srv/www/*/htdocs/system/application/

    Where * is the site name. From each of those application directories, I want to search the whole tree, down to its leaves, for *.php files that contain some text pattern, let's say "debug(", no regular expression needed.

    I know how to use find and grep but I'm not good at combining them.

    How would I do this?
    Thanks in advance!

  • Jukka Matilainen
    Jukka Matilainen over 14 years
    +1. That will execute grep for each php file, though. If there are lots of files, you can optimize further by find /srv/www/*/htdocs/system/application/ -name "*.php" -print0 | xargs -0 grep "debug ("
  • Randy Orrison
    Randy Orrison over 14 years
    Another small improvement: xargs may just pass one filename to grep, in which case grep won't show the filename if there's a match. You may want to add -H to the grep command to force it to show the filename.
  • David J.
    David J. over 14 years
    @Randy That's a very valid point.
  • Daniel Andersson
    Daniel Andersson about 12 years
    This is true necromancy, but GNU find can take the + operator instead of \; to perform the same sort of single-process execution that xargs does. Thus, find /srv/www/*/htdocs/system/application/ -name "*.php" -exec grep -H "debug (" {} + does the same thing as the xargs example in this answer, but with one fewer process fork (and still no risk of file name trouble).
  • Gus
    Gus about 7 years
    Just out of curiosity, what do the -RH options mean?
  • Daniel Andersson
    Daniel Andersson about 7 years
    @Gus: Added man grep excerpt of option descriptions to the post.
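
The difference between \; and + that Daniel Andersson's comment describes can be sketched as follows. The scratch files are hypothetical, and sh -c 'echo run' stands in for grep purely so that the number of launched commands can be counted.

```shell
#!/bin/sh
# Hypothetical scratch files; a.php contains the pattern, the others are empty.
dir=$(mktemp -d)
printf 'debug ("x");\n' > "$dir/a.php"
touch "$dir/b.php" "$dir/c.php"

# Each launched command prints one line, so wc -l counts the launches.
per_file=$(find "$dir" -name "*.php" -exec sh -c 'echo run' sh {} \; | wc -l)
batched=$(find "$dir" -name "*.php" -exec sh -c 'echo run' sh {} + | wc -l)
printf 'launches with \\; : %s\n' "$per_file"
printf 'launches with +  : %s\n' "$batched"

# The actual search, with all file names handed to grep in one batch.
hits=$(find "$dir" -name "*.php" -exec grep -H "debug (" {} +)
printf '%s\n' "$hits"

rm -rf "$dir"
```

With only three files, + fits everything into a single command; with a very large number of files find may still split the list into several batches.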