Compacting `find` name patterns



Solution 1

As you mentioned in the subject (incorrectly: what you used is a shell pattern, not a regular expression), you should use regular expressions:

find . -iregex '.*\.[ch]+'

The above is a lazy approach: it will also find .ch, .hh and the like, if any exist. For exact matches you still have to enumerate the extensions you want, but that is still easier with regular expressions:

find . -regex '.*\.\(c\|C\|cc\|CC\|h\|H\)'
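To see the difference between the two expressions, here is a minimal sketch that builds a scratch directory of empty files and runs both. It assumes GNU find (-regex and -iregex are GNU extensions) and a POSIX sh; the file names are made up for the demonstration.

```shell
# Sketch, assuming GNU find and a POSIX sh; the file names are illustrative.
demo_dir=$(mktemp -d)
for f in a.c b.C c.cc d.CC e.h f.H g.ch h.hh i.cC j.sh; do
  : > "$demo_dir/$f"    # create an empty file
done

# Lazy: a dot followed by any run of c/h letters, ignoring case.
lazy=$(cd "$demo_dir" && find . -iregex '.*\.[ch]+' | LC_ALL=C sort)

# Exact: only the six requested extensions, case-sensitive.
exact=$(cd "$demo_dir" && find . -regex '.*\.\(c\|C\|cc\|CC\|h\|H\)' | LC_ALL=C sort)

printf 'lazy:\n%s\n\nexact:\n%s\n' "$lazy" "$exact"
rm -rf "$demo_dir"
```

The lazy form also lists g.ch, h.hh and i.cC, leaving out only j.sh; the exact form lists just a.c through f.H.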

Solution 2

Portably (per the POSIX, Unix (SUS) and Linux (LSB) standards) and efficiently, you'd write it as:

find . \( -name '*.cc' -o -name '*.CC' -o -name '*.[cChH]' \) \
  -type f -exec grep -n -- "$1" /dev/null {} +

The most important point here is to terminate -exec with + rather than ;. With ;, find runs one grep command per file; with +, it passes as many file names to each grep as fit on a command line.
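A quick way to see the difference is to replace grep with a command that only reports each time it is started. This sketch assumes a POSIX find and sh; the three file names are arbitrary.

```shell
# Sketch: count how many times find starts the -exec command.
demo_dir=$(mktemp -d)
for f in one.c two.c three.h; do : > "$demo_dir/$f"; done

# In sh -c 'echo run' sh {}, the second "sh" fills $0 and the files follow.
# ';' terminator: one sh process per file, so "run" is printed three times.
semi=$(( $(find "$demo_dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l) ))

# '+' terminator: file names are batched, so three files fit in one call.
plus=$(( $(find "$demo_dir" -type f -exec sh -c 'echo run' sh {} + | wc -l) ))

echo "';' started $semi process(es), '+' started $plus"
rm -rf "$demo_dir"
```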

The -H option is GNU specific, but adding /dev/null (which makes sure grep gets at least two files to look in) guarantees that grep displays the file name.
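This matters with + because find may hand grep a single file, for example in the last batch, and grep only prints file names when it has more than one file operand. A minimal sketch, assuming a POSIX grep; the file and its contents are made up:

```shell
# Sketch: grep prefixes matches with the file name only when it has
# more than one file operand.
demo_dir=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$demo_dir/main.c"

# One operand: output is "LINE:TEXT", with no file name.
single=$(grep -n main "$demo_dir/main.c")

# /dev/null as a second operand forces "FILE:LINE:TEXT"; since it is
# empty, /dev/null itself never contributes a match.
with_null=$(grep -n main /dev/null "$demo_dir/main.c")

printf '%s\n%s\n' "$single" "$with_null"
rm -rf "$demo_dir"
```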

You'll need "--" unless you can make sure that $1 will never start with -.
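Here is a sketch of what goes wrong without "--" when the search string starts with a dash. It assumes a POSIX grep; the pattern -n and the file contents are chosen only for illustration.

```shell
# Sketch: a pattern that begins with "-" is parsed as an option unless
# "--" marks the end of the options.
demo_dir=$(mktemp -d)
printf 'x = a -n b;\n' > "$demo_dir/ops.c"
pattern='-n'

# Without --: "-n" is taken as an option and "/dev/null" becomes the
# pattern, so nothing in ops.c matches and the output is empty.
without_dashes=$(grep -n "$pattern" /dev/null "$demo_dir/ops.c")

# With --: "-n" is the pattern and the line is found.
with_dashes=$(grep -n -- "$pattern" /dev/null "$demo_dir/ops.c")

printf 'without: [%s]\nwith: [%s]\n' "$without_dashes" "$with_dashes"
rm -rf "$demo_dir"
```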

I added -type f here to avoid looking into non-regular files (such as directories), but since it also excludes symlinks, you may wish to leave it out.
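The symlink behaviour can be checked with a scratch link. This sketch assumes a POSIX find and also shows the POSIX -L option, which makes find follow symlinks (including links to directories, so use it with care); the file names are illustrative.

```shell
# Sketch: -type f checks the symlink itself unless find follows links (-L).
demo_dir=$(mktemp -d)
: > "$demo_dir/real.c"
ln -s real.c "$demo_dir/link.c"    # symlink to a regular file

# -type f: link.c is a symlink, not a regular file, so it is skipped.
typed=$(cd "$demo_dir" && find . -name '*.[cChH]' -type f | LC_ALL=C sort)

# No -type f: both names are listed.
untyped=$(cd "$demo_dir" && find . -name '*.[cChH]' | LC_ALL=C sort)

# -L: find follows the link, so -type f sees the regular file behind it.
followed=$(cd "$demo_dir" && find -L . -name '*.[cChH]' -type f | LC_ALL=C sort)

printf 'typed:\n%s\nuntyped:\n%s\nfollowed:\n%s\n' "$typed" "$untyped" "$followed"
rm -rf "$demo_dir"
```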

Solution 3

It can be shortened to this single line:

find . -type f -regextype posix-egrep -iregex '.*\.(cc|h|c)$' -exec grep -nH -- "$1" {} +


Arpith

Senior Software Engineer with MobileIron, Inc. Previously, Software Developer with Cisco Systems, Inc. and Frog Design, Inc.

Updated on September 18, 2022

Comments

Arpith almost 2 years
I am using
```
find . -name '*.[cCHh][cC]' -exec grep -nHr "$1" {} ';'
find . -name '*.[cCHh]' -exec grep -nHr "$1" {} ';'
```
to search for a string in all files ending in .c, .C, .h, .H, .cc and .CC in all subdirectories. But since this takes two commands, it feels inefficient.

How do I write a single regex that matches .c, .C, .h, .H, .cc and .CC files?

EDIT: I am running this on bash on a Linux machine.
rush over 11 years

By the way, you can end find's -exec with '+' instead of ';'. That speeds the command up, because find then runs one grep for many files rather than one grep per file as with ';'.
manatwork over 11 years

Your regular expression is wrong. It says “any character 0 or more times, followed by one of the enumerated strings”. On my machine that finds a lot of .sh script files…
daisy over 11 years

@manatwork right, updated the answer
Stéphane Chazelas over 11 years

Nitpicking here, but the above would match .cC or .Cc files, which were not requested. Also note that the $ is not needed, as GNU find's regexps are implicitly anchored.
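Both points can be checked on a scratch tree. This is a sketch assuming GNU find (-regextype and -iregex are GNU extensions); the file names are illustrative, and the last command is one possible case-sensitive alternative rather than part of the answer.

```shell
# Sketch, assuming GNU find; the file names are illustrative.
demo_dir=$(mktemp -d)
for f in a.c b.C c.cc d.CC e.h f.H g.cC h.Cc; do : > "$demo_dir/$f"; done

# The one-liner's expression, with and without the trailing $.
with_anchor=$(cd "$demo_dir" && find . -type f -regextype posix-egrep -iregex '.*\.(cc|h|c)$' | LC_ALL=C sort)
without_anchor=$(cd "$demo_dir" && find . -type f -regextype posix-egrep -iregex '.*\.(cc|h|c)' | LC_ALL=C sort)

# Case-sensitive alternation that lists only the requested extensions.
strict=$(cd "$demo_dir" && find . -type f -regextype posix-egrep -regex '.*\.(c|C|cc|CC|h|H)' | LC_ALL=C sort)

printf 'with $:\n%s\nwithout $:\n%s\nstrict:\n%s\n' "$with_anchor" "$without_anchor" "$strict"
rm -rf "$demo_dir"
```

The two -iregex runs list the same eight files, g.cC and h.Cc included, while the case-sensitive run lists only a.c through f.H.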
Richard Fortune over 11 years

-H versus /dev/null: very nice! FWIW, the -H option does seem to be widely available (I see it on FreeBSD 9, BusyBox grep, and Mac OS X grep).
Stéphane Chazelas over 11 years

@dubiousjim AFAIK FreeBSD and MacOS/X greps are the GNU grep.
Richard Fortune over 11 years

Oh yeah, you're right! That surprises me about FreeBSD. Most of their tools aren't GNU; for instance, their sed and awk aren't.
Arpith over 11 years

How is this different from using find . -name '.*\.\(c\|C\|cc\|CC\|h\|H\)'?
manatwork over 11 years

@Arpith, with -name you specify a shell pattern, with -regex you specify a regular expression. That '.*\.(c\|C\|cc\|CC\|h\|H)' string interpreted as shell pattern will rarely match anything, but certainly not what you intended in your question: pastebin.com/yhddCnbv
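A sketch of that difference, reusing the -regex string from Solution 1 as a -name pattern. In a shell pattern a backslash just quotes the next character, so \( and \| are a literal ( and |, and .* is a literal dot followed by anything. It assumes GNU find; the odd file name .x.(c|C|cc|CC|h|H) is hypothetical, created only to show what the shell pattern actually matches.

```shell
# Sketch, assuming GNU find; ".x.(c|C|cc|CC|h|H)" is a deliberately odd name.
demo_dir=$(mktemp -d)
: > "$demo_dir/a.c"
: > "$demo_dir/b.h"
: > "$demo_dir/.x.(c|C|cc|CC|h|H)"

re='.*\.\(c\|C\|cc\|CC\|h\|H\)'

# As a regex (-regex), the string matches the real source files.
as_regex=$(cd "$demo_dir" && find . -regex "$re" | LC_ALL=C sort)

# As a shell pattern (-name), it only matches a name that starts with "."
# and literally ends in ".(c|C|cc|CC|h|H)".
as_name=$(cd "$demo_dir" && find . -name "$re" | LC_ALL=C sort)

printf 'regex:\n%s\nname:\n%s\n' "$as_regex" "$as_name"
rm -rf "$demo_dir"
```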