Recursive search for a pattern, then for each match print out the specific SEQUENCE: line number, file name, and no file contents

Solution 1

Using grep

Why not just use grep's -r switch to recurse through the filesystem instead of using find? I'd add two more switches, -r and -H, alongside -n.

$ grep -rHn PATTERN <DIR> | cut -d":" -f1-2

Example #1

$ grep -rHn PATH ~/.bashrc | cut -d":" -f1-2
/home/saml/.bashrc:25

Details

  • -r - recursively search through files + directories
  • -H - print the file name with each match (less restrictive than -l, i.e. it combines with grep's other switches such as -n)
  • -n - display the line number of the match
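As a minimal sketch, the three switches can be tried on a throwaway tree. The directory layout and file names below are made up for illustration and are not from the original post.

```shell
# Build a small scratch tree (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'foo\nPATH=/bin\n' > "$demo/a.txt"
printf 'PATH=/usr/bin\n' > "$demo/sub/b.txt"

# -r recurses, -H forces the file name, -n adds the line number;
# cut keeps only the first two colon-separated fields (file:line).
result=$(cd "$demo" && grep -rHn PATH . | cut -d":" -f1-2 | LC_ALL=C sort)
printf '%s\n' "$result"

rm -rf "$demo"
```

The sort is only there to make the order of the two result lines predictable.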

Example #2

$ grep -rHn PATH ~/.bash* | cut -d":" -f1-2
/home/saml/.bash_profile:10
/home/saml/.bash_profile:12
/home/saml/.bash_profile_askapache:99
/home/saml/.bash_profile_askapache:101
/home/saml/.bash_profile_askapache:118
/home/saml/.bash_profile_askapache:166
/home/saml/.bash_profile_askapache:218
/home/saml/.bash_profile_askapache:250
/home/saml/.bash_profile_askapache:314
/home/saml/.bash_profile_askapache:2317
/home/saml/.bash_profile_askapache:2323
/home/saml/.bashrc:25

Using find

$ find . -type f -exec sh -c 'grep -Hn PATTERN "$@" | cut -d":" -f1-2' sh {} +

Example

$ find ~/.bash* -type f -exec sh -c 'grep -Hn PATH "$@" | cut -d":" -f1-2' sh {} +
/home/saml/.bash_profile:10
/home/saml/.bash_profile:12
/home/saml/.bash_profile_askapache:99
/home/saml/.bash_profile_askapache:101
/home/saml/.bash_profile_askapache:118
/home/saml/.bash_profile_askapache:166
/home/saml/.bash_profile_askapache:218
/home/saml/.bash_profile_askapache:250
/home/saml/.bash_profile_askapache:314
/home/saml/.bash_profile_askapache:2317
/home/saml/.bash_profile_askapache:2323
/home/saml/.bashrc:25

If you truly want to use find, the command above is one way to have it run grep on the files it finds.
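One detail of sh -c is worth checking when adapting this command: the first argument after the script string becomes $0 and is not part of "$@", so a placeholder (conventionally sh) should precede {}. A minimal sketch on a scratch directory, with made-up file names:

```shell
# Two hypothetical files, each with one matching line.
demo=$(mktemp -d)
printf 'PATTERN\n' > "$demo/a.txt"
printf 'PATTERN\n' > "$demo/b.txt"

# No placeholder: the first path found becomes $0 and never reaches grep.
without_sh=$(cd "$demo" && find . -type f -exec sh -c \
  'grep -Hn PATTERN "$@" | cut -d":" -f1-2' {} + | wc -l | tr -d ' ')

# With "sh" filling $0, every path ends up in "$@".
with_sh=$(cd "$demo" && find . -type f -exec sh -c \
  'grep -Hn PATTERN "$@" | cut -d":" -f1-2' sh {} + | wc -l | tr -d ' ')

echo "without placeholder: $without_sh line(s); with placeholder: $with_sh line(s)"
rm -rf "$demo"
```

In this sketch the variant without the placeholder reports only one of the two files.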

Solution 2

grep -n PATTERN `find . -type f`

This is bad because the output of a command substitution is interpreted as a whitespace-separated list of file name wildcard patterns. If any of the file names contains whitespace or one of \[*?, this snippet doesn't work. Also, if there are many matching files, this will eventually result in a command line that is too long.
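The whitespace problem can be reproduced with a short sketch. The file name "my file.txt" is a made-up example.

```shell
# One hypothetical file whose name contains a space.
demo=$(mktemp -d)
printf 'PATTERN here\n' > "$demo/my file.txt"

# Unquoted command substitution: the shell splits "./my file.txt" into
# "./my" and "file.txt", neither of which exists, so grep finds nothing.
broken=$(cd "$demo" && grep -n PATTERN $(find . -type f) 2>/dev/null | wc -l | tr -d ' ')

# find -exec hands each path to grep as one intact argument.
safe=$(cd "$demo" && find . -type f -exec grep -Hn PATTERN {} + | wc -l | tr -d ' ')

echo "substitution: $broken match(es); -exec: $safe match(es)"
rm -rf "$demo"
```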

find . -exec grep -n PATTERN  '{}' \;

This is fine and reliable, but grep is invoked once per file. This is why it's so slow.

Use -exec … {} + to execute the command in batches of as many files as possible. Note that it could happen that the last batch (or in theory others) consists of a single file, so grep won't print the file name; pass the -H option to always print the file name, or add the argument /dev/null (which never contains any matches, but ensures that grep sees at least two file names).

find . -type f -exec grep -Hn PATTERN {} +
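The single-file case described above can be checked directly. This is a small sketch with a made-up file name, comparing plain -n, -H, and the /dev/null trick:

```shell
demo=$(mktemp -d)
printf 'PATTERN\n' > "$demo/only.txt"

# One file operand: grep omits the file name by default.
plain=$(cd "$demo" && grep -n PATTERN only.txt)
# -H forces the file name even for a single operand.
with_H=$(cd "$demo" && grep -Hn PATTERN only.txt)
# /dev/null as a second operand also makes grep print the name.
with_null=$(cd "$demo" && grep -n PATTERN only.txt /dev/null)

printf '%s\n' "$plain" "$with_H" "$with_null"
rm -rf "$demo"
```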

GNU grep doesn't have an option to print matching line numbers but not the matching line text. You can strip the matching text, and swap the line numbers with the file name, with sed.

find . -type f -exec grep -Hn PATTERN {} + | sed 's/^\([^:]*\):\([^:]*\):.*/\2:\1/'
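As a sketch of what the sed expression does, here it is applied to a scratch file (a made-up name) whose matching line itself contains colons:

```shell
demo=$(mktemp -d)
printf 'x\nPATTERN: a:b\n' > "$demo/f.txt"

# grep emits "./f.txt:2:PATTERN: a:b"; sed keeps the first two fields,
# swaps them, and drops everything after the second colon.
swapped=$(cd "$demo" && find . -type f -exec grep -Hn PATTERN {} + |
  sed 's/^\([^:]*\):\([^:]*\):.*/\2:\1/')

echo "$swapped"
rm -rf "$demo"
```

Colons in the matched text are harmless here; a colon in the file name would still confuse the expression.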

If you want to right-align the line numbers, awk is a lot simpler than any alternative I can think of.

find . -type f -exec grep -Hn PATTERN {} + | awk -F : '{printf "%8d:%s\n", $2, $1}'
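A small sketch of the right-aligned variant on a scratch file (a made-up name) with matches on lines 2 and 12. The format string here ends in a newline so that each match lands on its own row.

```shell
demo=$(mktemp -d)
# Twelve lines; lines 2 and 12 contain the pattern.
printf '%s\n' x PATTERN x x x x x x x x x PATTERN > "$demo/f.txt"

# %8d pads the line number to eight columns; \n ends each record.
aligned=$(cd "$demo" && find . -type f -exec grep -Hn PATTERN {} + |
  awk -F : '{printf "%8d:%s\n", $2, $1}')

printf '%s\n' "$aligned"
rm -rf "$demo"
```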

You can gain more control by doing the matching in awk instead of grep. Awk tends to be a bit slower because it's a more general-purpose tool with an interpreted language. One benefit is that you can choose what to do with file names containing a colon or newline, which lead to ambiguous output from grep. The following snippet uses awk to do the searching and copes with file names containing : (and even newlines, but for these it produces ambiguous output). Note that awk uses extended regular expressions, like grep -E (with minor variations, but not really more than you get between implementations of grep or of awk).

find . -type f -exec awk '/PATTERN/ {printf "%d:", FNR; print FILENAME}' {} +
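To illustrate the colon problem, the following sketch runs both approaches against a file whose made-up name contains a colon. The grep and cut pipeline loses the line number, while the awk version keeps it.

```shell
demo=$(mktemp -d)
printf 'x\nPATTERN\n' > "$demo/odd:name.txt"

# grep prints "./odd:name.txt:2:PATTERN"; fields 1-2 are just the file name.
cut_out=$(cd "$demo" && grep -rHn PATTERN . | cut -d":" -f1-2)

# awk prints the line number first, so the rest of the line is the name.
awk_out=$(cd "$demo" && find . -type f -exec awk \
  '/PATTERN/ {printf "%d:", FNR; print FILENAME}' {} +)

printf 'cut: %s\nawk: %s\n' "$cut_out" "$awk_out"
rm -rf "$demo"
```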
Author: John Sonderson

Updated on September 18, 2022

Comments

  • John Sonderson
    John Sonderson over 1 year

    What I am after is almost exactly the same as can be found here, but I want the format "line number, separator, filename, newline" in the results, thus displaying the line number at the beginning of the line, not after the filename, and without displaying the line containing the match.

    The reason why this format is preferable is that

    • (a) the filename might be long and cryptic and contain the separator which the tool uses to separate the filename from the line number, making it incredibly difficult to use awk to achieve this, since the pattern inside the file might also contain the same separator. Also, line numbers at the beginning of the line will be aligned better than if they appear after the filename. And the other reason for this desired format is that
    • (b) the lines matching the pattern may be too long and mess up the one line per row property on the output displayed on standard out (and viewing the output on standard out is better than having to save to a file and use a tool like vi to view one line per row in the output file).

      How can I recursively search directories for a pattern and just print out file names and line numbers

    Now that I've set out the requirement, consider this:

    1. Ack is not installed on the Linux host I'm using, so I cannot use it.

    2. If I do the following, the shell executes find . and substitutes $(find .) with a list of paths, starting at the current working directory and proceeding downwards recursively:

      grep -n PATTERN $(find .)
      

      then the -n prints the line number, but not where I want it. Also, for some reason I do not understand, if a directory name includes the PATTERN, then grep matches it in addition to the regular files that contain the pattern. This is not what I want, so I use:

      grep -n PATTERN $(find . -type f)
      

      I also wanted to change this command so that the output of find is passed on to grep dynamically. Rather than having to build the entire list of paths first and then pass it to grep in bulk, have find pass each path to grep as it is found, so I tried:

      find . -exec grep -n PATTERN  '{}' \;
      

      which seems like the right syntax according to the man page, but this command runs about 100 times slower, so this is not the way to go.

    In view of what I described, how can I execute something similar to this command and obtain the desired format? I have already listed the problems associated with the related post.

  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 10 years
    You get less control with grep, in particular it goes through symbolic links (try grep -r something /etc on Debian/Ubuntu and weep).
  • slm
    slm over 10 years
    @Gilles - thanks, I'll add a find example too.
  • ChuckCottrill
    ChuckCottrill over 10 years
    OP was also concerned about delimiter being nested in filenames - so cut might not work. Otherwise, good.
  • John Sonderson
    John Sonderson over 10 years
    Your explanation concerning why the second option is so slow is very clear. Thanks. As to the -H flag, my man page says "Print the file name for each match. This is the default when there is more than one file to search." So it seems like it might not be needed. I don't know why, but the man page doesn't specify what happens in the case where there is only one file to match, perhaps someone can comment.
  • John Sonderson
    John Sonderson over 10 years
    As to invoking awk in this way, perhaps I have not been clear, but if the field separator (specified after -F) which is the colon (:), appears in the filename, which is printed before the line number, then your command using awk will not work.
  • John Sonderson
    John Sonderson over 10 years
    Hello. The -H flag is applied by default, so is not needed (at least not on my Linux system, according to the man page).
  • John Sonderson
    John Sonderson over 10 years
    The reason I don't like to use grep -r is that, as I explained in the post, if there is a directory whose name contains the pattern, then grep will match that directory for some reason I don't understand (as though the directory were being treated as a regular file and the directory name appeared within such file). This means that if a directory contains 1000 leaf nodes which do not match the pattern, but the directory name matches the pattern, I will get 1000 extra garbage lines in the output.
  • John Sonderson
    John Sonderson over 10 years
    As I explained in an earlier comment concerning awk, using cut in this manner is not correct because if the filename contains the separator (the colon), then the output of cut will not be what we want.
  • slm
    slm over 10 years
    @JohnSonderson - as to grep it knows nothing of files or directories, it's just a searching tool, so if the pattern matches part of the dir. or a file it will return it as a match. You'll have to use a regular expression within grep to make it be more specific. Something like grep -HnE ".*/(.*)" ...
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 10 years
    @JohnSonderson "-h, --no-filename Suppress the prefixing of file names on output. This is the default when there is only one file (or only standard input) to search." If you want correct output for file names containing colons, you can't use grep, not without a complex shell wrapper. You can use awk, see my edit.
  • slm
    slm over 10 years
    @JohnSonderson - every test I can conceive of I get output like this from grep: "$ grep -rHn blah 1 1/1/1/1.txt:1:blah"
  • slm
    slm over 10 years
    @JohnSonderson - as to the use of -H it forces grep to always print the filename. When giving grep a single filename it doesn't print the name of the file in the output.