How can I exclude directories matching certain patterns from the output of the Linux 'find' command?
This works for me:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -not -path '*/generated/*' \
-not -path '*/deploy/*' -print0 | xargs -0 ls -L1d
Changes from your version are minimal: I added exclusions of certain path patterns separately, because that's easier, and I single-quote things to hide them from shell interpolation.
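For a very large tree, a variation worth considering is to stop find from descending into the unwanted directories at all, rather than filtering their contents afterwards. The sketch below assumes GNU find and uses a throwaway tree under a temporary directory; the `demo` and `pruned` variable names are made up for illustration.

```shell
# Hypothetical scratch tree that mirrors part of the question's sample layout.
demo=$(mktemp -d)
mkdir -p "$demo/fred/src" "$demo/fred/generated" "$demo/barney/deploy" "$demo/barney/inc"
touch "$demo/fred/src/dino.cpp" "$demo/fred/generated/dino.h" \
      "$demo/barney/deploy/bam bam.h" "$demo/barney/inc/bam bam.h"

# -prune keeps find out of any directory named generated or deploy;
# everything else is tested against the regex and printed NUL-terminated.
# sort -z and tr are only here to make the result easy to read.
pruned=$(cd "$demo" && find . -type d \( -name generated -o -name deploy \) -prune -o \
    -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | sort -z | tr '\0' '\n')
printf '%s\n' "$pruned"

rm -rf "$demo"
```

Unlike -not -path, this skips the excluded subtrees entirely, which may matter when generated or deploy hold many files.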
The event not found is because ! is being interpreted as a request for history expansion by bash. The fix is to use single quotes instead of double quotes.
Pop quiz: What characters are special inside of a single-quoted string in sh?
Answer: Only ' is special (it ends the string). That's the ultimate safety.
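A quick way to convince yourself, as a small sketch; the variable names `literal` and `quoted` are just for illustration.

```shell
# Inside single quotes, !, $, \ and * are all ordinary characters.
literal='(?!.*$HOME)\n*'
printf '%s\n' "$literal"

# A single quote cannot appear inside a single-quoted string, so close the
# string, add a backslash-escaped quote, and reopen it.
quoted='it'\''s'
printf '%s\n' "$quoted"
```

The first printf shows the text exactly as typed, with no history expansion, variable expansion, or globbing applied.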
grep with -Z (sometimes known as --null) only makes grep print a null character instead of the character that normally follows a file name in its output (for example with -l). What you wanted was -z (sometimes known as --null-data), which causes grep to treat a null character, instead of a newline, as the end of each line in both its input and its output. This makes it work as expected with the output of find ... -print0, which adds a null character after each file name instead of a newline.
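Here is a minimal sketch of -z on NUL-delimited input, assuming GNU grep. The three paths are made-up sample data fed in with printf rather than find, and `kept` is a name chosen for illustration.

```shell
# Three NUL-terminated records, one of which contains "generated".
# grep -z reads and writes NUL-terminated records; tr makes the result readable.
kept=$(printf '%s\0' './src/bam bam.cpp' './generated/bam bam.h' './inc/dino.h' |
    grep -z -v generated | tr '\0' '\n')
printf '%s\n' "$kept"
```

The file name containing a space survives as a single record because the only separator grep sees is the null character.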
If you had done it this way:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | \
grep -vzZ generated | grep -vzZ deploy | xargs -0 ls -1Ld
Then the input and output of grep would have been null-delimited and it would have worked correctly... until one of your source files began being named deployment.cpp and started getting "mysteriously" excluded by your script.
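If you do want to stay with the grep approach, one way to reduce that risk is to match a whole path component instead of a bare substring. This is a sketch assuming GNU grep; `sample_paths`, `loose`, and `strict` are hypothetical names and the paths are made-up data.

```shell
# Emit three NUL-terminated sample paths.
sample_paths() {
    printf '%s\0' './fred/src/deployment.cpp' './fred/deploy/dino.h' './fred/src/dino.cpp'
}

# Bare substring: also throws away deployment.cpp.
loose=$(sample_paths | grep -z -v deploy | tr '\0' '\n')

# Anchored on the slashes: only paths inside a deploy directory are dropped.
strict=$(sample_paths | grep -z -v '/deploy/' | tr '\0' '\n')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

This is the same idea as the '*/deploy/*' pattern given to -not -path in the find command at the top.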
Incidentally, here's a nicer way to generate your testcase file set.
while read -r file ; do
mkdir -p "${file%/*}"
touch "$file"
done <<'DATA'
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
DATA
Since I did this anyway to verify I figured I'd share it and save you from repetition. Don't do anything twice! That's what computers are for.
phonetagger
Updated on June 17, 2022
I want to use regexes with Linux's find command to dive recursively into a gargantuan directory tree, showing me all of the .c, .cpp, and .h files, but omitting matches containing certain substrings. Ultimately I want to send the output to an xargs command to do certain processing on all of the matching files. I can pipe the find output through grep to remove matches containing those substrings, but that solution doesn't work so well with filenames that contain spaces. So I tried using find's -print0 option, which terminates each filename with a nul char instead of a newline (whitespace), and using xargs -0 to expect nul-delimited input instead of space-delimited input, but I couldn't figure out how to pass the nul-delimited find output through the piped grep filters successfully; grep -Z didn't seem to help in that respect.
So I figured I'd just write a better regex for find and do away with the intermediary grep filters... perhaps sed would be an alternative?
In any case, for the following small sampling of directories...
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...I want the output to include all of the .h, .c, and .cpp files but NOT those ones that appear in the 'generated' and 'deploy' directories.
BTW, you can create an entire test directory (named fredbarney) for testing solutions to this question by cutting & pasting this whole line into your bash shell:
mkdir fredbarney; cd fredbarney; mkdir fred; cd fred; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > inc/dino.h; echo x > docs/info.docx; echo x > generated/dino.h; echo x > deploy/dino.h; echo x > src/dino.cpp; cd ..; mkdir barney; cd barney; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > 'inc/bam bam.h'; echo x > 'docs/info info.docx'; echo x > 'generated/bam bam.h'; echo x > 'deploy/bam bam.h'; echo x > 'src/bam bam.cpp'; cd ..;
This command finds all of the .h, .c, and .cpp files...
find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$"
...but if I pipe its output through xargs, the 'bam bam' files each get treated as two separate (nonexistent) filenames (note that here I'm simply using ls as a stand-in for what I actually want to do with the output):
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" | xargs -n 1 ls
ls: ./barney/generated/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/src/bam: No such file or directory
ls: bam.cpp: No such file or directory
ls: ./barney/deploy/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/inc/bam: No such file or directory
ls: bam.h: No such file or directory
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
So I can enhance that with the -print0 and -0 args to find and xargs:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | xargs -0 -n 1 ls
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...which is great, except that I don't want the 'generated' and 'deploy' directories in the output. So I try this:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | grep -v generated | grep -v deploy | xargs -0 -n 1 ls
barney fred
...which clearly does not work. So I tried using the -Z option with grep (not knowing exactly what the -Z option really does) and that didn't work either. So I figured I'd write a better regex for find and this is the best I could come up with:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
...but bash didn't like that (!.*: event not found, whatever that means), and even if that weren't an issue, my regex doesn't seem to work on the regex tester web page I normally use.
Any ideas how I can make this work? This is the output I want:
$ find . [----options here----] | [----maybe grep or sed----] | xargs -0 -n 1 ls
./barney/src/bam bam.cpp
./barney/inc/bam bam.h
./fred/src/dino.cpp
./fred/inc/dino.h
...and I'd like to avoid scripts & temporary files, which I suppose might be my only option.
Thanks in advance! -Mark