How can I exclude directories matching certain patterns from the output of the Linux 'find' command?


This works for me:

find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -not -path '*/generated/*' \
       -not -path '*/deploy/*' -print0 | xargs -0 ls -L1d

Changes from your version are minimal: I added exclusions of certain path patterns separately, because that's easier, and I single-quote things to hide them from shell interpolation.
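As an alternative to the -not -path tests, find's -prune action can skip the excluded directories entirely rather than walking them and filtering the results afterwards. The following is only a sketch, assuming GNU find (for -regextype); the temporary directory and the demo_dir and pruned variables are made up for illustration:

```shell
# Build a small throwaway tree (hypothetical names) to try the command on.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/fred/src" "$demo_dir/fred/generated" "$demo_dir/fred/deploy"
touch "$demo_dir/fred/src/dino.cpp" \
      "$demo_dir/fred/generated/dino.h" \
      "$demo_dir/fred/deploy/dino.h"

# Do not descend into directories named generated or deploy;
# apply the regex only to whatever is left and print the matches.
pruned=$(cd "$demo_dir" && find . -regextype posix-egrep \
    -type d \( -name generated -o -name deploy \) -prune -o \
    -regex '.+\.(c|cpp|h)$' -print)
printf '%s\n' "$pruned"

# Clean up the throwaway tree.
rm -rf "$demo_dir"
```

When feeding xargs -0, replace -print with -print0. On a gargantuan tree this variant may also be faster, since find never reads the pruned directories.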

The "event not found" error occurs because bash interprets ! inside double quotes as a request for history expansion. The fix is to use single quotes instead of double quotes.

Pop quiz: What characters are special inside of a single-quoted string in sh?

Answer: Only ' is special (it ends the string). That's the ultimate safety.
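A quick way to convince yourself, as a small sketch (the variable names literal and with_quote are just for illustration):

```shell
# Inside single quotes, $, \, ! and " are all taken literally.
literal='$HOME \n !foo "bar"'
printf '%s\n' "$literal"

# A single quote cannot appear inside a single-quoted string, so close the
# string, add a backslash-escaped quote, and open a new string.
with_quote='it'\''s'
printf '%s\n' "$with_quote"
```

The first printf prints the text exactly as typed, with no variable or history expansion, and the second prints it's.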

In GNU grep, -Z (--null) only changes what follows a file name that grep itself prints (for example with -l): it emits a null character there instead of the usual separator. It does not change how the input is split into lines. What you wanted was -z (--null-data), which makes grep treat its input and output as sequences of lines terminated by a null character instead of a newline. This makes it work as expected with the output of find ... -print0, which adds a null character after each file name instead of a newline.
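Here is a small sketch of -z in action, assuming GNU grep. The file names are fed in with printf rather than find so it runs anywhere, and the kept variable is only for illustration:

```shell
# Three null-terminated names, two of them containing a space.
# grep -z reads and writes null-terminated records, so the spaces survive.
kept=$(printf '%s\0' './src/bam bam.cpp' './generated/bam bam.h' './inc/dino.h' |
    grep -vz generated |
    tr '\0' '\n')   # turn nulls into newlines only so the result can be displayed
printf '%s\n' "$kept"
```

In a real pipeline you would drop the tr step and hand the null-delimited output straight to xargs -0.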

If you had done it this way:

find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | \
    grep -vzZ generated | grep -vzZ deploy | xargs -0 ls -1Ld

Then the input and output of grep would have been null-delimited and it would have worked correctly... until one of your source files happened to be named deployment.cpp and started getting "mysteriously" excluded by your script.
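To see that pitfall concretely, here is a sketch (again assuming GNU grep; the names function and the loose and strict variables are hypothetical) comparing a bare substring with a pattern anchored on the surrounding slashes:

```shell
# Emit three null-terminated names; only one is inside a deploy directory.
names() {
    printf '%s\0' './src/deployment.cpp' './deploy/dino.h' './src/dino.cpp'
}

# Bare substring: deployment.cpp is dropped along with the deploy directory.
loose=$(names | grep -vz deploy | tr '\0' '\n')

# Anchored with slashes: only files under a deploy directory are dropped.
strict=$(names | grep -vz '/deploy/' | tr '\0' '\n')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

This is the same reason the find command at the top uses '*/deploy/*' rather than '*deploy*'.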

Incidentally, here's a nicer way to generate your testcase file set.

while read -r file ; do
    mkdir -p "${file%/*}"
    touch "$file"
done <<'DATA'
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
DATA

Since I did this anyway to verify my answer, I figured I'd share it and save you the repetition. Don't do anything twice! That's what computers are for.

Authored by phonetagger

Updated on June 17, 2022

Comments

  • phonetagger (almost 2 years)

    I want to use regexes with Linux's find command to dive recursively into a gargantuan directory tree, showing me all of the .c, .cpp, and .h files, but omitting matches containing certain substrings. Ultimately I want to send the output to an xargs command to do certain processing on all of the matching files. I can pipe the find output through grep to remove matches containing those substrings, but that solution doesn't work so well with filenames that contain spaces. So I tried using find's -print0 option, which terminates each filename with a nul char instead of a newline (whitespace), and using xargs -0 to expect nul-delimited input instead of whitespace-delimited input, but I couldn't figure out how to pass the nul-delimited find output through the piped grep filters successfully; grep -Z didn't seem to help in that respect.

    So I figured I'd just write a better regex for find and do away with the intermediary grep filters... perhaps sed would be an alternative?

    In any case, for the following small sampling of directories...

    ./barney/generated/bam bam.h
    ./barney/src/bam bam.cpp
    ./barney/deploy/bam bam.h
    ./barney/inc/bam bam.h
    ./fred/generated/dino.h
    ./fred/src/dino.cpp
    ./fred/deploy/dino.h
    ./fred/inc/dino.h
    

    ...I want the output to include all of the .h, .c, and .cpp files but NOT those ones that appear in the 'generated' and 'deploy' directories.

    BTW, you can create an entire test directory (named fredbarney) for testing solutions to this question by cutting & pasting this whole line into your bash shell:

    mkdir fredbarney; cd fredbarney; mkdir fred; cd fred; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > inc/dino.h; echo x > docs/info.docx; echo x > generated/dino.h; echo x > deploy/dino.h; echo x > src/dino.cpp; cd ..; mkdir barney; cd barney; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > 'inc/bam bam.h'; echo x > 'docs/info info.docx'; echo x > 'generated/bam bam.h'; echo x > 'deploy/bam bam.h'; echo x > 'src/bam bam.cpp'; cd ..;
    

    This command finds all of the .h, .c, and .cpp files...

    find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$"
    

    ...but if I pipe its output through xargs, the 'bam bam' files each get treated as two separate (nonexistent) filenames (note that here I'm simply using ls as a stand-in for what I actually want to do with the output):

    $ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" | xargs -n 1 ls
    ls: ./barney/generated/bam: No such file or directory
    ls: bam.h: No such file or directory
    ls: ./barney/src/bam: No such file or directory
    ls: bam.cpp: No such file or directory
    ls: ./barney/deploy/bam: No such file or directory
    ls: bam.h: No such file or directory
    ls: ./barney/inc/bam: No such file or directory
    ls: bam.h: No such file or directory
    ./fred/generated/dino.h
    ./fred/src/dino.cpp
    ./fred/deploy/dino.h
    ./fred/inc/dino.h
    

    So I can enhance that with the -print0 and -0 args to find and xargs:

    $ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | xargs -0 -n 1 ls
    ./barney/generated/bam bam.h
    ./barney/src/bam bam.cpp
    ./barney/deploy/bam bam.h
    ./barney/inc/bam bam.h
    ./fred/generated/dino.h
    ./fred/src/dino.cpp
    ./fred/deploy/dino.h
    ./fred/inc/dino.h
    

    ...which is great, except that I don't want the 'generated' and 'deploy' directories in the output. So I try this:

    $ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | grep -v generated | grep -v deploy | xargs -0 -n 1 ls
    barney  fred
    

    ...which clearly does not work. So I tried using the -Z option with grep (not knowing exactly what the -Z option really does) and that didn't work either. So I figured I'd write a better regex for find and this is the best I could come up with:

    find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
    

    ...but bash didn't like that (!.*: event not found, whatever that means), and even if that weren't an issue, my regex doesn't seem to work on the regex tester web page I normally use.

    Any ideas how I can make this work? This is the output I want:

    $ find . [----options here----] | [----maybe grep or sed----] | xargs -0 -n 1 ls
    ./barney/src/bam bam.cpp
    ./barney/inc/bam bam.h
    ./fred/src/dino.cpp
    ./fred/inc/dino.h
    

    ...and I'd like to avoid scripts & temporary files, which I suppose might be my only option.

    Thanks in advance! -Mark