Grep a directory and return list with line numbers

Solution 1

Many grep variants implement a recursive option. For example, GNU grep documents:

-R, -r, --recursive
          Read all files under each directory, recursively; this is equivalent to the -d recurse option.

You can then remove find:

grep -n -r "$pattern" "$path" | awk '{ print $1 }'

However, this keeps more than the file name and line number: awk splits on whitespace by default, so it prints everything up to the first space. For example, these lines of grep output

src/main/package/A.java:3:import java.util.Map;
src/main/package/A.java:5:import javax.security.auth.Subject;
src/main/package/A.java:6:import javax.security.auth.callback.CallbackHandler;

will be printed as

src/main/package/A.java:3:import
src/main/package/A.java:5:import
src/main/package/A.java:6:import

Notice the trailing :import on each line. You might want to use sed to filter the output instead.
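To see the field-splitting issue in isolation, here is a minimal sketch that feeds one of the sample lines above to awk twice. The variable names are made up for illustration, and the -F: variant is only one possible filter (it assumes the file name itself contains no colon).

```shell
# One line of "grep -n -r" output, taken from the example above.
line='src/main/package/A.java:3:import java.util.Map;'

# Default awk splitting is on whitespace, so $1 runs up to the first space.
by_space=$(printf '%s\n' "$line" | awk '{ print $1 }')

# Splitting on ':' instead isolates the file name and the line number.
by_colon=$(printf '%s\n' "$line" | awk -F: '{ print $1 ":" $2 }')

printf 'whitespace split: %s\n' "$by_space"
printf 'colon split:      %s\n' "$by_colon"
```

The first result still carries the :import suffix; the second stops after the line number.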

Since a : could be present in the file name, you can use the -Z option of GNU grep, which outputs a NUL character (\0) after the file name instead of the usual colon.

grep -rZn "$pattern" "$path" | sed -e "s/[[:cntrl:]]\([0-9][0-9]*\).*/:\1/"

With the same example as before, this produces

src/main/package/A.java:3
src/main/package/A.java:5
src/main/package/A.java:6
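As a rough check of the -Z approach, the sketch below builds a scratch directory containing a file whose name includes a colon and runs the same pipeline on it. It assumes GNU grep and GNU sed; the directory, file name, and pattern are invented for the example.

```shell
# Scratch directory with one awkward file name (it contains ':').
tmpdir=$(mktemp -d)
printf 'foo\nimport x\nbar\nimport y\n' > "$tmpdir/odd:name.txt"

pattern='import'

# -Z makes grep emit a NUL after the file name; sed then rewrites
# "<NUL>LINE:matched text" into ":LINE", leaving the file name untouched.
result=$(grep -rZn -e "$pattern" "$tmpdir" \
  | sed -e 's/[[:cntrl:]]\([0-9][0-9]*\).*/:\1/')

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Each output line should end in odd:name.txt:2 or odd:name.txt:4, with the colon in the file name left intact.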

Solution 2

For the first part, note that xargs only works if your file names contain no whitespace and none of the characters \'" (backslash, single quote, double quote). See "How to search for a word in entire content of a directory in linux" for an explanation and an alternative.

Also, always put double quotes around variable substitutions: "$path". Without the double quotes, the shell splits the value of $path on whitespace and expands any wildcards in it, so using it unquoted breaks if the file name contains whitespace or wildcard characters. The same goes for $pattern (just for laughs, try leaving the quotes out and searching for h* in a directory containing files called hi and hello).
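The h* experiment can be sketched without running grep at all, just by counting how many arguments the shell would hand over. This assumes a scratch directory holding only the two files hi and hello; the variable names are illustrative.

```shell
# Scratch directory containing only "hello" and "hi".
tmpdir=$(mktemp -d)
oldpwd=$PWD
cd "$tmpdir" || exit 1
echo 'hello world' > hello
echo 'hi there' > hi

pattern='h*'

# Unquoted: the shell globs h* against the directory before grep would see it.
set -- $pattern
unquoted_args=$#
unquoted_first=$1

# Quoted: the pattern reaches the command as the single literal string h*.
set -- "$pattern"
quoted_args=$#
quoted_first=$1

printf 'unquoted: %s argument(s), first is %s\n' "$unquoted_args" "$unquoted_first"
printf 'quoted:   %s argument(s), first is %s\n' "$quoted_args" "$quoted_first"

cd "$oldpwd" || exit 1
rm -rf "$tmpdir"
```

With the unquoted form, grep $pattern would effectively become grep hello hi, that is, a search for "hello" inside the file hi.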

If your version of grep has the -r option to traverse directories recursively, you don't need find here. The -r option is present on Linux, FreeBSD, Mac OS X and Cygwin among others. Otherwise:

find "$path" -type f -exec grep -Hn "$pattern" {} + | awk -F: '{print $1 ":" $2}'

I fixed your awk call above, as well, so that it prints only the file name and the line numbers. I also pass the -H option to grep, to ensure that it always prints the file name, even if there happens to be a single file. This code assumes that your file names don't contain : or newlines; if they might, things get complicated, and you'd better either rely on GNU grep's -Z option or process the files individually:

find "$path" -type f -exec sh -c 'for x; do grep -n "$0" <"$x" | awk -v fn="$x" -F: "{print fn \":\" \$1}"; done' "$pattern" {} +
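Here is a small sketch of the per-file variant run against a file whose name contains a colon. The awk program sits in escaped double quotes because the whole sh -c script is already single-quoted. The scratch directory, file name, and pattern are assumptions made up for the demonstration.

```shell
# Scratch directory with a file name that contains ':'.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta:gamma\nalpha again\n' > "$tmpdir/with:colon.txt"

pattern='alpha'

# grep reads each file on stdin, so its output is just "LINE:text";
# awk takes the line number and prefixes the file name passed in via -v.
result=$(find "$tmpdir" -type f -exec sh -c '
  for x; do
    grep -n -e "$0" <"$x" | awk -v fn="$x" -F: "{print fn \":\" \$1}"
  done' "$pattern" {} +)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because the file name never passes through grep's output, a colon in the name cannot be confused with the field separator. One caveat: awk -v interprets backslash escapes, so a file name containing backslashes would need extra care.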

Solution 3

I'd get rid of the grep and use awk:

find "$path" -type f -print0 | xargs -0 awk "/$pattern/{print FILENAME,FNR}"

But using grep and cut:

find "$path" -type f -print0 | xargs -0 grep -nH "$pattern" | cut -d: -f1,2

Include the -type f clause so you don't get errors from trying to search (in either grep or awk) non-regular file types (symlinks, directories, sockets). Also, if you read from a pipe or a socket that another program is supposed to be reading, you might disrupt that program.

The find ... -print0 | xargs -0 combination copes with whitespace in the file names. It is not available on every UNIX system, but it is on most.
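A quick sketch of the grep and cut variant on file names with spaces, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do). The scratch files and pattern are made up, and sort is added only to make the output order predictable.

```shell
# Scratch directory with one file name containing a space.
tmpdir=$(mktemp -d)
printf 'one\nneedle\n' > "$tmpdir/has space.txt"
printf 'needle\n' > "$tmpdir/plain.txt"

pattern='needle'

# NUL-separated names survive the trip through xargs intact;
# cut keeps only the first two ':'-separated fields (file name and line number).
result=$(find "$tmpdir" -type f -print0 \
  | xargs -0 grep -nH -e "$pattern" \
  | cut -d: -f1,2 \
  | sort)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

This still assumes no colon or newline in the file names, since cut splits on the first two colons it sees.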

Author: Zack Hovatter

I'm a super great web developer who likes to work on super great projects with super great people.

Updated on September 18, 2022

Comments

  • Zack Hovatter over 1 year

    I'm currently trying to learn more about bash scripting and all of that fun stuff, and I pieced together this little command:

    find $path | xargs grep -n $pattern | awk '{print $1}'
    

    While this DOES work, I was wondering if I was reinventing the wheel. Is there a better way to search through a directory, grep the files for a pattern, and return a list with line numbers?

  • Zack Hovatter over 12 years
    Awesome - thanks much for that. Especially including the sed bit.
  • Zack Hovatter over 12 years
    Is there some reason for not using so many pipes or is it for readability? I'm not too familiar with -exec, but I'll definitely do some reading then.
  • jaypal singh over 12 years
    or pipe it to awk like this grep -rZn $pattern $path | awk -F: '{print $2,$1}' and get pretty results! :)
  • Gilles 'SO- stop being evil' over 12 years
    Process substitution won't work here, unless you know your file names don't contain whitespace or \[*?. See How to search for a word in entire content of a directory in linux
  • Arcege over 12 years
    Wouldn't -Z and -F: fail to match up here, @Jaypal? You would want to replace the \0 character explicitly: grep -rZn "$pattern" "$path" | awk -F: '{sub(/\0/,":",$1);print $1}'
  • jaypal singh over 12 years
    Ahh you are correct. It should be grep -n -r $pattern $path | awk -F: '{print $2,$1}'
  • jaypal singh over 12 years
    Why would it, Matteo? I tried this on my computer and got this:
    [jaypal:~/Temp] grep -n -r "*p*" ./ | awk -F: '{print $2,$1}'
    50 ./backup/GTP_Parser.sh
    55 ./backup/GTP_Parser.sh
  • Matteo over 12 years
    @Jaypal GTP_Parser.sh does not contain a ':' character. awk splits on :, so how could it recognize whether a : is part of the name? Or did I miss something? Try with a file named "test:file:with:.txt"
  • jaypal singh over 12 years
    True, I assumed that the file name won't have :. Crap! :)
  • Matteo over 12 years
    I was just being over-precise: I would assume the same, I almost never used the -Z flag anyway :-)
  • Gilles 'SO- stop being evil' over 12 years
    @Matteo Yes, or newlines.