How to match once per file in grep?

49,719

Solution 1

I think you can just do something like

grep -ri -m1 --include '*.coffee' 're' . | head -n 2

to e.g. pick the first match from each file, and pick at most two matches total.

Note that this requires your grep to treat -m as a per-file match limit; GNU grep does do this, but BSD grep apparently treats it as a global match limit.
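
If your grep applies -m globally, one possible workaround is to start a separate grep per file, so the limit cannot carry over from one file to the next. The sketch below is only an illustration: the demo directory and file contents are made up, and it assumes your grep accepts -H (both GNU and BSD grep do, as far as I know) to force the filename prefix.

```shell
# Hypothetical demo files; replace with your own tree.
demo=$(mktemp -d)
printf 'express = require "express"\napp = express()\n' > "$demo/app.coffee"
printf 'session_secret: "nyan cat"\n' > "$demo/config.coffee"

# One grep process per file (-exec ... \;), so -m1 stops after the first
# match in that file whether grep counts per file or globally.
# -H prints the filename even though each grep sees a single file.
# sort is only here to make the demo output order stable.
out=$(cd "$demo" && find . -name '*.coffee' -exec grep -H -i -m1 're' {} \; | sort)
printf '%s\n' "$out"

rm -rf "$demo"
```

Starting one process per file is slower on large trees, so this trades speed for not depending on how -m is interpreted.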

Solution 2

With grep alone, you just need the option -l (long form: --files-with-matches), which prints the name of each matching file once and then moves on to the next file.

The answers about find, awk or shell scripts go beyond what the question needs.
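
As a rough illustration of what -l gives you (file names only, not the matching lines), here is a small sketch. The demo files are invented for the example.

```shell
# Hypothetical demo files: one that matches 're', one that does not.
demo=$(mktemp -d)
printf 'a = require "a"\nb = require "b"\n' > "$demo/app.coffee"
printf 'x = 1\n' > "$demo/other.coffee"

# -l prints each matching file's name once; the matched text is not shown.
files=$(cd "$demo" && grep -rli --include '*.coffee' 're' . | sort)
printf '%s\n' "$files"

rm -rf "$demo"
```

If you also need the first matching line itself, -l is not enough and one of the other solutions applies.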

Solution 3

I would do this in awk instead.

find . -name \*.coffee -exec awk '/re/ {print FILENAME ":" $0;exit}' {} \;

If you didn't need to recurse, you could just do it with awk:

awk '/re/ {print FILENAME ":" $0;nextfile}' *.coffee

Or, if you're using a current enough bash, you can use globstar:

shopt -s globstar
awk '/re/ {print FILENAME ":" $0;nextfile}' **/*.coffee
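
The question also asks for a cap on the total number of matches. One way this might be done in awk is to count printed matches and exit once the cap is reached. This sketch is an assumption-laden variant, not the code above: it avoids nextfile (which some older awk builds lack) by resetting a flag on the first line of each file, and the variable names max, seen and n, as well as the demo files, are made up.

```shell
# Hypothetical demo files with matches in all three.
demo=$(mktemp -d)
printf 're one\nre two\n' > "$demo/a.coffee"
printf 're three\n' > "$demo/b.coffee"
printf 're four\n' > "$demo/c.coffee"

# FNR == 1 marks the start of each input file, so "seen" is reset there.
# Only the first match per file is printed; awk exits after "max" matches.
capped=$(cd "$demo" && awk -v max=2 '
  FNR == 1 { seen = 0 }
  !seen && /re/ {
    print FILENAME ":" $0
    seen = 1
    if (++n >= max) exit
  }' *.coffee)
printf '%s\n' "$capped"

rm -rf "$demo"
```

Without nextfile, awk still reads the rest of each file after its first match, so this is more portable but does more work on long files.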

Solution 4

Using find and xargs: find every .coffee file and pass the results to grep -m1. Note that -print0 has to come after -name; placed before it, find prints every file it visits.

find . -name '*.coffee' -print0 | xargs -0 grep -m1 -ri 're'

Test without -m1:

linux# find . -name '*.txt'|xargs grep -ri 'oyss'
./test1.txt:oyss
./test1.txt:oyss1
./test1.txt:oyss2
./test2.txt:oyss1
./test2.txt:oyss2
./test2.txt:oyss3

Add -m1:

linux# find . -name '*.txt'|xargs grep -m1 -ri 'oyss'
./test1.txt:oyss
./test2.txt:oyss1
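
The test above pipes plain find output into xargs, which splits on whitespace. A filename with a space in it is the case where the null-delimited form (-print0 with -0) matters. A minimal sketch, assuming a find and xargs that support -print0 and -0 (GNU and BSD both do); the file name and contents are invented for the demo.

```shell
# Hypothetical demo: a single file whose name contains a space.
demo=$(mktemp -d)
printf 're in spaced file\n' > "$demo/foo.coffee bar.coffee"

# -print0 and -0 pass the name as one NUL-terminated argument,
# so xargs does not split it at the space. -H forces the filename prefix,
# since grep receives only one file here.
spaced=$(cd "$demo" && find . -name '*.coffee' -print0 | xargs -0 grep -H -m1 're')
printf '%s\n' "$spaced"

rm -rf "$demo"
```

Keep in mind that xargs normally hands many files to a single grep, so the per-file behaviour still depends on how your grep treats -m.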

Solution 5

find . -name \*.coffee -exec grep -m1 -i 're' {} \;

find's -exec option runs the command once for each matched file (unless you use + instead of \;, which makes it act like xargs).
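
To see the difference between \; and + for yourself, you can have find run a tiny shell snippet that reports how many file arguments it received. This is just a sketch with made-up demo files; the sh -c wrapper is only there for counting.

```shell
# Hypothetical demo: three empty .coffee files.
demo=$(mktemp -d)
touch "$demo/a.coffee" "$demo/b.coffee" "$demo/c.coffee"

# With \; the command runs once per file, so each run sees 1 argument.
per_file=$(cd "$demo" && find . -name '*.coffee' -exec sh -c 'echo "$# file(s)"' sh {} \;)

# With + the files are batched into as few runs as possible (like xargs).
batched=$(cd "$demo" && find . -name '*.coffee' -exec sh -c 'echo "$# file(s)"' sh {} +)

printf 'per file:\n%s\nbatched:\n%s\n' "$per_file" "$batched"

rm -rf "$demo"
```

Because \; gives grep a single file per run, grep omits the filename prefix unless you add -H.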

Author: pathikrit

Updated on July 05, 2022

Comments

  • pathikrit
    pathikrit almost 2 years

    Is there any grep option that lets me control the total number of matches but stops at the first match in each file?

    Example:

    If I do this grep -ri --include '*.coffee' 're' . I get this:

    ./app.coffee:express = require 'express'
    ./app.coffee:passport = require 'passport'
    ./app.coffee:BrowserIDStrategy = require('passport-browserid').Strategy
    ./app.coffee:app = express()
    ./config.coffee:    session_secret: 'nyan cat'
    

    And if I do grep -ri -m2 --include '*.coffee' 're' ., I get this:

    ./app.coffee:config = require './config'
    ./app.coffee:passport = require 'passport'
    

    But, what I really want is this output:

    ./app.coffee:express = require 'express'
    ./config.coffee:    session_secret: 'nyan cat'
    

    Doing -m1 does not work as I get this for grep -ri -m1 --include '*.coffee' 're' .

    ./app.coffee:express = require 'express'
    

    Tried not using grep e.g. this find . -name '*.coffee' -exec awk '/re/ {print;exit}' {} \; produced:

    config = require './config'
        session_secret: 'nyan cat'
    

    UPDATE: As noted below, GNU grep's -m option counts matches per file, whereas BSD grep's -m treats it as a global match count.

  • pathikrit
    pathikrit over 11 years
    -m1 stops at first match globally for me. In any case, if there are millions of matches and I only want 100 of them then this is inefficient as the grep would still go for the first million matches before piping result into head
  • nneonneo
    nneonneo over 11 years
    head stops reading input after the first hundred lines, and grep streams them match-by-match. After head stops reading input, grep will stop finding matches.
  • pathikrit
    pathikrit over 11 years
    I just tried and it just prints the first result for me with or without the | head -n 2 part. If I change option to -m2 I see 2 results.
  • nneonneo
    nneonneo over 11 years
    -m is clearly documented in my man grep as a per-file option...what grep are you using?
  • pathikrit
    pathikrit over 11 years
    grep --v says grep (BSD grep) 2.5.1-FreeBSD on my Mountain Lion
  • nneonneo
    nneonneo over 11 years
    Funny, I'm on Lion and my /usr/bin/grep is grep (GNU grep) 2.5.1 (and it does per-file -m).
  • pathikrit
    pathikrit over 11 years
    Did not print as expected. Here's what I got: bash-3.2$ find . -name \*.coffee -exec awk '/re/ {print;exit}' {} \; config = require './config' session_secret: 'nyan cat'
  • ruakh
    ruakh over 11 years
    One problem with that is, at least on my system, you can't really pipe the output of find -exec to head, because the SIGPIPE goes to the process that find launches, rather than to find itself, so it just keeps re-launching the program long after it's found two matches.
  • ghoti
    ghoti over 11 years
    Updated the answer to include filenames, as well as globstar as an alternate way to recurse. As for piping to head, why would you need to do that here? I don't see a requirement for that in the question. The awk script takes care of stopping after the first match in each file.
  • ghoti
    ghoti over 11 years
    @wrick - just a note about globstar; I gather you're using an older bash, since your prompt is bash-3.2$. Globstar was added to bash in version 4.0. You can either skip globstar, or install a more recent bash using MacPorts. Also, I don't see the problem with your output. While comments suck for code/output formatting, it appears you're seeing lines with re in them. If you like, you can edit your question to include a better formatted result for this attempt.
  • Schwern
    Schwern over 11 years
    I can confirm, /usr/bin/grep on OS X 10.8.2 is (BSD grep) 2.5.1-FreeBSD and its -m is global, not per file. GNU grep is per file. nneonneo, you must have overwritten /usr/bin with GNU tools. @wrick I'd suggest getting GNU tools, the BSD ones that OS X comes with are kinda janky. It will make your life much easier in the long run. Use MacPorts or homebrew.
  • nneonneo
    nneonneo over 11 years
    @Schwern: I guess I must've, but I don't recall ever doing it :-\
  • pathikrit
    pathikrit over 11 years
    Done - still did not work - did not print results from other file
  • Graham
    Graham over 11 years
    This won't work if there are special characters in filenames. See the parsing ls problem.
  • Graham
    Graham over 11 years
    @Schwern - I would NEVER recommend overwriting system-provided tools with GNU ones. The system ones get updated by Apple. Much better to put GNU tools in a different location, then adjust your $PATH accordingly.
  • Schwern
    Schwern over 11 years
    @Graham Use find -print0 and xargs -0, as in my answer, to get around that.
  • nneonneo
    nneonneo over 11 years
    @Graham: easily amended, use find -print0 and xargs -0.
  • ghoti
    ghoti over 11 years
    @wrick, what was the output you were expecting?
  • pathikrit
    pathikrit over 11 years
    The GNU one makes so much more sense than the BSD interpretation of -m IMO - thanks for catching this
  • Schwern
    Schwern over 11 years
    This solution shares nneonneo's problem, it only works on GNU grep. BSD grep's -m is global, not per file.
  • Schwern
    Schwern over 11 years
    @Graham I suspect you meant "I don't recommend overwriting". MacPorts and Homebrew take care of all that, they live on their own paths and handle the environment adjustments. Violent agreement.
  • oyss
    oyss over 11 years
    @Graham Example please; I'm not familiar with this issue. A simple test with filenames like test1?.txt still works.
  • nneonneo
    nneonneo over 11 years
    OK, this is starting to get a bit weird. I consulted a friend, who also uses 10.7, and has GNU grep in /usr/bin/grep. Furthermore, the man page for grep on Apple's site says it's GNU grep. Did Apple suddenly change the default in 10.8?
  • pathikrit
    pathikrit over 11 years
    I expect 1 unique file per line (question has what I expect). Thanks!
  • oyss
    oyss over 11 years
    @Schwern xargs is per file. grep is executed on each file that find matches, so there should not be a global -m issue.
  • Graham
    Graham over 11 years
    @oyss - create a file with touch foo.coffee\ bar.coffee. It's a single file, with a space in the filename. Using xargs the way you've suggested, xargs will interpret it as two files. Check the link on my first comment for more details.
  • Schwern
    Schwern over 11 years
    @nneonneo Yep, I'm seeing complaints on the internet about grep being reverted to BSD in 10.8. Reason number 230823 to use MacPorts or Homebrew.
  • pathikrit
    pathikrit over 11 years
    Maybe you have an older Mac? Found this: github.com/schmittjoh/JMSDiExtraBundle/issues/41
  • nneonneo
    nneonneo over 11 years
    @wrick: Yes I do. I have 10.7. So it seems that grep was indeed "downgraded" to BSD grep in 10.8.
  • Schwern
    Schwern over 11 years
    @oyss Tested it with the BSD and GNU greps on my system. I think I see the confusion. xargs does not call the command once for each file, but just once with a list of files. Only if xargs thinks the file list is going to overflow the exec buffer will it do multiple calls. You can test this by writing a program which prints each time it starts and then prints all its arguments.
  • ghoti
    ghoti over 11 years
    Well, I haven't seen your input, so I can't tell whether the output matches. "re" is in "REquires", but it's also in "secREt". Did you try with the updated find line that includes FILENAME?
  • ruakh
    ruakh over 11 years
    @ghoti: Re: "As for piping to head, why would you need to do that here?": The question asks to "control total number of matches", and gives the example of -m2 to limit the total number of matches to two.
  • Graham
    Graham over 11 years
    FreeBSD has been using GNU grep 2.5.1 for years. OSX pre 10.8 and 10.8 both use GNU grep 2.5.1 as well. In FreeBSD 9.0 and OSX 10.8 the behaviour I see with -m 1 is one line per file. @Schwern - please re-check your "confirmed" results, as I can't replicate them.
  • ghoti
    ghoti over 11 years
    Ah, right you are. So the correct answer to the OP's initial question is simply "no".
  • pathikrit
    pathikrit over 11 years
    I confirm schwern's results - for me -m does this: github.com/schmittjoh/JMSDiExtraBundle/issues/41
  • Schwern
    Schwern over 11 years
    @Graham Double check you're running /usr/bin/grep. I can't find anything definitive, but my 10.8.2 machine has BSD grep as /usr/bin/grep and there's a lot of people on the internet confirming.
  • Graham
    Graham over 11 years
    @Schwern - here are my results in FreeBSD 9.0-RELEASE: pastebin.com/RiECy9CE This is the same version of grep reported in OSX 10.8. I don't have access to an OSX 10.8 box just at the moment, but I believe -m1 would be treated the same way on it. Do you see anything wrong with my test?
  • Schwern
    Schwern over 11 years
    @Graham Your test is fine... except its using GNU grep. We know GNU grep works. The only point of contention is what OS X ships with. Prior to 10.8 it was GNU grep. 10.8 introduced BSD grep as confirmed on my machine and all the posts I linked to previously. /usr/bin/grep --version grep (BSD grep) 2.5.1-FreeBSD uname -s -r Darwin 12.2.0. Are you sure you're looking at /usr/bin/grep on your OS X 10.8 machine and you haven't overwritten it?
  • ghoti
    ghoti over 11 years
    I, for one, have never heard of -m acting globally rather than per-file. If this happens in OSX 10.8, it's an Apple-ism, not something to do with the port of GNU grep that is part of FreeBSD. (Note that if there really is such a thing as "BSD grep", it's not from FreeBSD. FreeBSD still uses a port of GNU grep 2.5.1, as it (and OSX) has for years.)
  • Graham
    Graham over 11 years
    Okay, I can confirm that OSX 10.8.2 behaves differently from FreeBSD. @Schwern, sorry to doubt you, but as ghoti said, this isn't how grep behaves in any BSD operating system I've seen before; it seems to be unique to OSX.
  • Ross Brasseaux
    Ross Brasseaux over 9 years
    All this time and I finally realize I was asking the wrong question. Thanks!
  • Megan B
    Megan B about 7 years
    This is exactly what I was looking for, and definitely the best answer to this question! Thanks :)
  • ceiling cat
    ceiling cat almost 6 years
    This is the easiest method. For the lazy, the option -l is the abbreviation of --files-with-matches. So you don't need both.
  • Dalker
    Dalker almost 6 years
    this is definitely simpler than the accepted answer
  • Moltres
    Moltres over 5 years
    Man thank you so much for this! Definitely agree with @Dalker