Grep a directory and return list with line numbers
Solution 1
Many grep variants implement a recursive option, e.g., GNU grep:
-R, -r, --recursive
Read all files under each directory, recursively; this is equivalent to the -d recurse option.
You can then remove find:
grep -n -r "$pattern" "$path" | awk '{ print $1 }'
but this keeps more than the line number, because awk prints the first whitespace-separated column. This example
src/main/package/A.java:3:import java.util.Map;
src/main/package/A.java:5:import javax.security.auth.Subject;
src/main/package/A.java:6:import javax.security.auth.callback.CallbackHandler;
will be printed as
src/main/package/A.java:3:import
src/main/package/A.java:5:import
src/main/package/A.java:6:import
Notice the :import in each line. You might want to use sed to filter the output.
Since a : could be present in the file name, you can use the -Z option of GNU grep to output a NUL character (\0) after the file name instead.
grep -rZn "$pattern" "$path" | sed -e "s/[[:cntrl:]]\([0-9][0-9]*\).*/:\1/"
with the same example as before will produce
src/main/package/A.java:3
src/main/package/A.java:5
src/main/package/A.java:6
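As an alternative to the sed filter, the NUL separator can also be consumed directly in the shell. The following is only a sketch, assuming bash (for read -d '') and GNU grep (for -r and -Z); the function name grep_locations is made up for illustration.

```shell
# Sketch only: requires bash and GNU grep; grep_locations is a hypothetical name.
grep_locations() {
    local pattern=$1 path=$2 file rest
    # With -Z, each match is emitted as: <file name> NUL <line number>:<text> newline
    grep -rZn -e "$pattern" "$path" |
        while IFS= read -r -d '' file && IFS= read -r rest; do
            # Keep only the digits before the first ':' of the remainder.
            printf '%s:%s\n' "$file" "${rest%%:*}"
        done
}
```

Called as grep_locations "$pattern" "$path", this should print one file:line pair per match, and a : inside a file name does not confuse it because the name is delimited by NUL rather than by a colon.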
Solution 2
For the first part, note that xargs only works if there are no whitespace characters or \'" in your file names. See "How to search for a word in entire content of a directory in linux" for an explanation and an alternative.
Also, always put double quotes around variable substitutions: "$path". Without the double quotes, the shell performs word splitting and wildcard expansion on the value of $path, so using it unquoted breaks if the file name contains whitespace or wildcards. The same goes for $pattern (just for laughs, try leaving the quotes out and searching for h* in a directory containing files called hi and hello).
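The h* experiment can be tried safely in a scratch directory. This is a small sketch; the helper count_args and the variable names are invented for the demonstration.

```shell
# Hypothetical demo: count_args and the variable names are made up for illustration.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)
touch "$demo_dir/hi" "$demo_dir/hello"
pattern='h*'

# Unquoted: the shell expands h* against the directory contents (hello, hi).
unquoted_count=$(cd "$demo_dir" && count_args $pattern)
# Quoted: the literal string h* is passed as a single argument.
quoted_count=$(cd "$demo_dir" && count_args "$pattern")

echo "unquoted: $unquoted_count argument(s), quoted: $quoted_count argument(s)"
rm -rf "$demo_dir"
```

In the unquoted case grep would receive hello as the pattern and hi as a file to search, which is not what was asked for.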
If your version of grep has the -r option to traverse directories recursively, you don't need find here. The -r option is present on Linux, FreeBSD, Mac OS X and Cygwin among others. Otherwise:
find "$path" -type f -exec grep -Hn "$pattern" {} + | awk -F: '{print $1 ":" $2}'
I fixed your awk call above as well, so that it prints only the file name and the line number. I also pass the -H option to grep, to ensure that it always prints the file name, even if there happens to be a single file. This code assumes that your file names don't contain : or newlines; if they might, things get complicated, and you'd better either rely on GNU grep's -Z option or process the files individually:
find "$path" -type f -exec sh -c 'for x; do grep -n "$0" <"$x" | awk -v fn="$x" -F: "{print fn \":\" \$1}"; done' "$pattern" {} +
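The per-file approach can be wrapped in a function to make the quoting easier to follow. This is a sketch, not a definitive version: the name list_matches and the FILE_NAME environment variable are assumptions made for the example, and the file name is passed through the environment so that awk does not interpret backslashes in it.

```shell
# Sketch only: list_matches and FILE_NAME are hypothetical names.
# Usage: list_matches PATTERN DIRECTORY
list_matches() {
    find "$2" -type f -exec sh -c '
        pattern=$1; shift
        for file do
            # grep reads the file on stdin, so its output is just "<line>:<text>".
            grep -n -e "$pattern" <"$file" |
                FILE_NAME=$file awk -F: "{ print ENVIRON[\"FILE_NAME\"] \":\" \$1 }"
        done' sh "$1" {} +
}
```

Because grep never prints the file name itself, a : in the name cannot be mistaken for the separator before the line number.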
Solution 3
I'd get rid of the grep and use awk:
find "$path" -type f -print0 | xargs -0 awk "/$pattern/{print FILENAME,FNR}"
But using grep and cut:
find "$path" -type f -print0 | xargs -0 grep -nH "$pattern" | cut -d: -f1,2
Include the -type f clause so you don't get errors trying to search (in either grep or awk) on non-regular file types (symlinks, directories, sockets). If you read from a pipe or a socket that another program is supposed to be reading, you may consume data meant for that program.
The find ... -print0 | xargs -0 combination gets around whitespace in the file names. It is not available on every UNIX system, but it is on most.
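The difference is easy to see with a file name that contains a space. The sketch below assumes a find and xargs that support -print0 and -0; the directory, file name, and variable names are invented for the demonstration.

```shell
# Hypothetical demo: the file name and variable names are made up for illustration.
demo_dir=$(mktemp -d)
printf 'alpha\nfoo\n' > "$demo_dir/my notes.txt"

# Plain xargs splits on whitespace: the name becomes two bogus paths,
# so grep reports errors (discarded here) and prints no matches.
plain_out=$(find "$demo_dir" -type f | xargs grep -nH -e foo 2>/dev/null | cut -d: -f1,2)

# NUL-delimited: the name reaches grep intact.
safe_out=$(find "$demo_dir" -type f -print0 | xargs -0 grep -nH -e foo | cut -d: -f1,2)

printf 'plain: [%s]\nsafe:  [%s]\n' "$plain_out" "$safe_out"
rm -rf "$demo_dir"
```

Note that the cut step still assumes no : in the path, as discussed for the other solutions.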
Zack Hovatter
Updated on September 18, 2022
Comments
-
Zack Hovatter over 1 year
I'm currently trying to learn more about bash scripting and all of that fun stuff, and I pieced together this little command:
find $path | xargs grep -n $pattern | awk '{print $1}'
While this DOES work, I was wondering if I was reinventing the wheel. Is there a better way to search through a directory, grep the files for a pattern, and return a list with line numbers?
-
Zack Hovatter over 12 years
Awesome - thanks much for that. Especially including the sed bit.
-
Zack Hovatter over 12 years
Is there some reason for not using so many pipes or is it for readability? I'm not too familiar with -exec, but I'll definitely do some reading then.
-
jaypal singh over 12 years
or pipe it to awk like this
grep -rZn $pattern $path | awk -F: '{print $2,$1}'
and get pretty results! :)
-
Gilles 'SO- stop being evil' over 12 years
Process substitution won't work here, unless you know your file names don't contain whitespace or \[*?. See "How to search for a word in entire content of a directory in linux"
-
Arcege over 12 years
Wouldn't the -Z and -F: not match here, @Jaypal? You would want to replace the \0 character explicitly:
grep -rZn "$pattern" "$path" | awk -F: '{sub(/\0/,":",$1);print $1}'
-
jaypal singh over 12 yearsAhh you are correct. It should be
grep -n -r $pattern $path | awk -F: '{print $2,$1}'
-
jaypal singh over 12 yearsWhy would it Matteo, I tried this on my computer and got this -
[jaypal:~/Temp] grep -n -r "*p*" ./ | awk -F: '{print $2,$1}'
50 ./backup/GTP_Parser.sh
55 ./backup/GTP_Parser.sh
-
Matteo over 12 years
@Jaypal GTP_Parser.sh does not contain a ':' character. awk splits on ':', so how could it recognize whether a ':' is part of the name? Or did I miss something? Try with a file named "test:file:with:.txt"
-
jaypal singh over 12 years
True, I assumed that the file name won't have a : in it. Crap! :)
-
Matteo over 12 yearsI was just being over-precise: I would assume the same, I almost never used the -Z flag anyway :-)
-
Gilles 'SO- stop being evil' over 12 years@Matteo Yes, or newlines.