Compacting `find` name patterns

Solution 1

As you mentioned in the subject (incorrectly, since what you used is a shell pattern), you should use regular expressions:

find . -iregex '.*\.[ch]+'

The above is a lazy approach: it will also find .ch, .hh and the like, if any exist. For exact matches you still have to enumerate what you want, but that is still easier with regular expressions:

find . -regex '.*\.\(c\|C\|cc\|CC\|h\|H\)'
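
To see the difference between the two patterns, here is a small sketch that runs both against a scratch directory. The file names are made up for illustration, and it assumes GNU find with its default (emacs-style) regex syntax:

```shell
#!/bin/sh
# Scratch directory with hypothetical file names (not from the original post).
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.c b.C c.cc d.CC e.h f.H g.ch h.hh i.sh

# Lazy pattern: a dot followed by any run of c/h letters, ignoring case.
lazy=$(find . -iregex '.*\.[ch]+' | sort)

# Exact pattern: only the six requested extensions.
exact=$(find . -regex '.*\.\(c\|C\|cc\|CC\|h\|H\)' | sort)

printf 'lazy:\n%s\n\nexact:\n%s\n' "$lazy" "$exact"
```

The lazy pattern additionally reports g.ch and h.hh; neither pattern picks up i.sh.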

Solution 2

Portably (following the POSIX, Unix (SUS) and Linux (LSB) standards) and efficiently, you'd write it as:

find . \( -name '*.cc' -o -name '*.CC' -o -name '*.[cChH]' \) \
  -type f -exec grep -n -- "$1" /dev/null {} +

The most important point here is to use + instead of ;. Otherwise, you'll run one grep command per file.

The -H option is GNU specific, but adding /dev/null (which makes sure grep gets at least two files to look in) guarantees that grep displays the file name.

You'll need "--" unless you can make sure that $1 will never start with -.

I added -type f here to avoid looking into non-regular files (like directories), but since that also excludes symlinks, you may wish to leave it out.
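
The two points above (batching with + and the /dev/null operand) can be checked with a small experiment. This is only a sketch: the file names and contents are invented, and mktemp -d is assumed to be available:

```shell
#!/bin/sh
# Scratch directory with three hypothetical source files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'int main(void) { return 0; }\n' > a.c
printf 'int helper(void);\n' > b.h
printf 'int main(void) { return 1; }\n' > c.cc

# With ';' find starts one command per file; with '+' it batches the files.
# Each run of the inner sh prints one line, so the line count equals the
# number of commands that were started.
per_file=$(find . -type f -exec sh -c 'echo run' sh {} \; | wc -l)
batched=$(find . -type f -exec sh -c 'echo run' sh {} + | wc -l)
echo "commands with ';': $per_file, with '+': $batched"

# Given a single file operand, grep omits the file name ...
bare=$(find . -name '*.h' -type f -exec grep -n -- helper {} +)
# ... while the extra /dev/null operand makes it print file:line:text.
named=$(find . -name '*.h' -type f -exec grep -n -- helper /dev/null {} +)
echo "$bare"
echo "$named"
```

With only three files the batching saves little, but on a large source tree the difference between one grep per file and one grep per batch adds up.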

Solution 3

With GNU find, the two commands can be shortened to this single line:

find -type f -regextype posix-egrep -iregex '.*\.(cc|h|c)$' -exec grep -nHr "$1" {} \;
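
One thing to be aware of: because -iregex ignores case, the pattern also accepts mixed-case extensions such as .cC. The sketch below compares it with a case-sensitive pattern that spells out each extension. It assumes GNU find (for -regextype), and the file names are hypothetical:

```shell
#!/bin/sh
# Scratch directory with hypothetical file names.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.c b.CC c.cC d.H

# Case-insensitive match, as in the one-liner above (without the trailing $).
loose=$(find . -type f -regextype posix-egrep -iregex '.*\.(cc|h|c)' | sort)

# Case-sensitive match that enumerates each wanted extension.
strict=$(find . -type f -regextype posix-egrep -regex '.*\.(c|C|cc|CC|h|H)' | sort)

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Here the loose pattern lists all four files, while the strict one leaves out c.cC.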


Author: Arpith

Senior Software Engineer with MobileIron, Inc. Previously, Software Developer with Cisco Systems, Inc. and Frog Design, Inc.

Updated on September 18, 2022

Comments

  • Arpith, almost 2 years ago

    I am using

    find . -name '*.[cCHh][cC]' -exec grep -nHr "$1" {} ';'
    find . -name '*.[cCHh]' -exec grep -nHr "$1" {} ';'
    

    to search for a string in all files ending with .c, .C, .h, .H, .cc and .CC listed in all subdirectories. But since this includes two commands this feels inefficient.

    How do I write a regex to include .c,.C,.h,.H,.cc and .CC files using one single regex?

    EDIT: I am running this on bash on a Linux machine.

    • rush, over 11 years ago
      By the way, you can use '+' at the end of the find command instead of ';'. It speeds things up because find then runs one grep for many files, not one grep per file as with ';'.
  • manatwork, over 11 years ago
    Your regular expression is wrong. It says “any character 0 or more times, followed by one of the enumerated strings”. On my machine that finds a lot of .sh script files…
  • daisy, over 11 years ago
    @manatwork right, updated the answer
  • Stéphane Chazelas, over 11 years ago
    Nitpicking here, but the above would match .cC or .Cc files, which were not requested. Also note that the $ is not needed, as GNU find's regexps are implicitly anchored.
  • Richard Fortune, over 11 years ago
    -H versus /dev/null: very nice! FWIW, the -H option does seem to be widely available (I see it on FreeBSD 9, BusyBox grep, and Mac OS X grep).
  • Stéphane Chazelas, over 11 years ago
    @dubiousjim AFAIK FreeBSD and MacOS/X greps are the GNU grep.
  • Richard Fortune, over 11 years ago
    Oh yeah, you're right! That surprises me about FreeBSD. Most of their tools aren't GNU. For instance, their sed and awk aren't.
  • Arpith, over 11 years ago
    How is this different from using find . -name '.*\.\(c\|C\|cc\|CC\|h\|H\)' ?
  • manatwork, over 11 years ago
    @Arpith, with -name you specify a shell pattern, with -regex you specify a regular expression. That '.*\.\(c\|C\|cc\|CC\|h\|H\)' string interpreted as a shell pattern will rarely match anything, and certainly not what you intended in your question: pastebin.com/yhddCnbv
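
The point of that last comment can be reproduced locally. In the sketch below the same string is passed once to -name and once to -regex. The file names are invented, including one oddly named file that the shell pattern happens to match, and GNU find is assumed for -regex:

```shell
#!/bin/sh
# Scratch directory: two ordinary source files plus one file whose name
# literally contains the parentheses and bars from the pattern.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.c b.h './.odd.(c|C|cc|CC|h|H)'

pattern='.*\.\(c\|C\|cc\|CC\|h\|H\)'

# As a shell pattern: a literal leading dot, '*' for anything, then the
# backslash-escaped characters taken literally.
by_name=$(find . -name "$pattern")

# As a regular expression: any path ending in one of the six extensions.
by_regex=$(find . -regex "$pattern" | sort)

printf 'by -name:\n%s\n\nby -regex:\n%s\n' "$by_name" "$by_regex"
```

With -name only the oddly named file is reported, whereas -regex finds a.c and b.h.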