Recursively search for a pattern/text only in files with a specified name within a directory?

Solution 1

In the parent directory, you could use find and then run grep on only those files:

find . -type f -iname "file.txt" -exec grep -Hi "pattern" '{}' +
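If you want to try this safely first, here is a minimal sketch that builds a throwaway tree and runs the same command against it. The directory names, file contents, and the pattern "needle" are all made up for illustration.

```shell
# Build a small throwaway tree; every name and line of content is hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/1" "$demo/2"
printf 'a Needle here\n' > "$demo/1/file.txt"
printf 'nothing to see\n' > "$demo/2/file.txt"
printf 'needle in another file\n' > "$demo/2/other.txt"

# Only files named file.txt (matched case-insensitively) are handed to grep,
# so other.txt is never opened even though it contains the pattern.
result=$(cd "$demo" && find . -type f -iname "file.txt" -exec grep -Hi "needle" '{}' +)
printf '%s\n' "$result"

rm -rf "$demo"
```

The only line printed should come from 1/file.txt, since 2/file.txt has no match and other.txt is filtered out by name.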

Solution 2

You could also use globstar.

Building grep commands with find, as in Zanna's answer, is a highly robust, versatile, and portable way to do this (see also sudodus's answer). And muru has posted an excellent approach of using grep's --include option. But if you want to use just the grep command and your shell, there is another way to do it -- you can make the shell itself perform the necessary recursion:

shopt -s globstar   # you can skip this if you already have globstar turned on
grep -H 'pattern' **/file.txt

The -H flag makes grep show the filename even if only one matching file is found. You can pass the -a, -i, and -n flags (from your example) to grep as well, if that's what you need. But don't pass -r or -R when using this method. It is the shell that recurses directories in expanding the glob pattern containing **, and not grep.

These instructions are specific to the Bash shell. Bash is the default user shell in Ubuntu (and most other GNU/Linux operating systems), so if you're on Ubuntu and don't know what your shell is, it's almost certainly Bash. Although popular shells usually support directory-traversing ** globs, they don't always work the same way. For more information, see Stéphane Chazelas's excellent answer to The result of ls * , ls ** and ls *** on Unix.SE.

How It Works

Turning on the globstar bash shell option makes ** match paths containing the directory separator (/). It is thus a directory-recursing glob. Specifically, as man bash explains:

When the globstar shell option is enabled, and * is used in a pathname expansion context, two adjacent *s used as a single pattern will match all files and zero or more directories and subdirectories. If followed by a /, two adjacent *s will match only directories and subdirectories.

You should be careful with this, since you can run commands that modify or delete far more files than you intend, especially if you write ** when you meant to write *. (It's safe in this command, which doesn't change any files.) shopt -u globstar turns the globstar shell option back off.
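A harmless way to see what a ** glob will expand to before you use it with a real command is to print the expansion. This is only a sketch, assuming Bash 4 or later; the tree it creates is hypothetical. Enabling globstar inside the command substitution keeps the option from leaking into your current shell.

```shell
# Hypothetical tree: file.txt at three different depths, plus a decoy.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/file.txt" "$demo/a/file.txt" "$demo/a/b/file.txt" "$demo/a/b/notes.txt"

# globstar is turned on only inside the subshell created by $( ... ).
# printf is used instead of grep, so nothing is read or modified.
expanded=$(shopt -s globstar; cd "$demo" && printf '%s\n' **/file.txt)
printf '%s\n' "$expanded"

rm -rf "$demo"
```

Note that **/ also matches zero directories, so the file.txt at the top level is included along with the nested ones.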

There are a few practical differences between globstar and find.

find is far more versatile than globstar. Anything you can do with globstar, you can do with the find command too. I like globstar, and sometimes it's more convenient, but globstar is not a general alternative to find.

The method above does not look inside directories whose names start with a dot (.). Sometimes you don't want to recurse into such folders, but sometimes you do.
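If you do want hidden directories searched, Bash's dotglob option makes globs (including **) match names that begin with a dot. The following is a small sketch of the difference, using a made-up tree and the made-up pattern "needle".

```shell
# Hypothetical tree: one ordinary directory and one hidden directory.
demo=$(mktemp -d)
mkdir -p "$demo/visible" "$demo/.hidden"
printf 'needle\n' > "$demo/visible/file.txt"
printf 'needle\n' > "$demo/.hidden/file.txt"

# With globstar alone, ** does not descend into .hidden.
without_dot=$(cd "$demo" && shopt -s globstar && grep -H 'needle' **/file.txt)

# With dotglob also set, .hidden is traversed as well.
with_dot=$(cd "$demo" && shopt -s globstar dotglob && grep -H 'needle' **/file.txt)

printf 'globstar only:\n%s\n' "$without_dot"
printf 'globstar + dotglob:\n%s\n' "$with_dot"

rm -rf "$demo"
```

Both options are set inside command substitutions here, so your interactive shell's settings stay as they were.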

As with an ordinary glob, the shell builds a list of all matching paths and passes them as arguments to your command (grep) in place of the glob itself. If you have so many files called file.txt that the resulting command would be too long for the system to execute, then the method above will fail. In practice you'd need (at least) thousands of such files, but it could happen.
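If you ever hit that limit but still want the shell to do the recursion, one possible workaround is to let a builtin consume the glob and have xargs split the list into batches. This is a sketch only: it assumes Bash (where printf is a builtin, so expanding the glob for it does not go through the system's argument-length limit) and an xargs that supports -0. The tree and pattern are hypothetical.

```shell
# Hypothetical tree with two files named file.txt.
demo=$(mktemp -d)
mkdir -p "$demo/1" "$demo/2"
printf 'needle\n' > "$demo/1/file.txt"
printf 'needle\n' > "$demo/2/file.txt"

# The system's limit on the combined size of arguments and environment, in bytes.
getconf ARG_MAX

# printf emits each path terminated by a NUL byte; xargs -0 reads them and
# runs grep as many times as needed to stay under the limit.
batched=$(cd "$demo" && shopt -s globstar &&
  printf '%s\0' **/file.txt | xargs -0 grep -H 'needle')
printf '%s\n' "$batched"

rm -rf "$demo"
```

With only two files, xargs runs grep once; the batching only becomes visible with very large numbers of paths.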

The methods that use find are not subject to this restriction, because:

  • Zanna's way builds and runs a grep command with potentially many path arguments. But if more files are found than can be listed in a single command, the +-terminated -exec action runs the command with some of the paths, then runs it again with some more paths, and so forth. In the case of grepping for a string in multiple files, this produces the correct behavior.

    Like the globstar method covered here, this prints all matching lines, with paths prepended to each.

  • sudodus's way runs grep separately for each file.txt found. If there are many files, it might be slower than some other methods, but it works.

    That method finds files and prints their paths, followed by matching lines if any. This is a different output format from the format produced by my method, Zanna's, and muru's.
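To see the difference between the two -exec terminators described above, you can substitute a tiny helper for grep that just reports how many paths it was given. The helper and the tree below are purely illustrative.

```shell
# Hypothetical tree with three files named file.txt.
demo=$(mktemp -d)
mkdir -p "$demo/1" "$demo/2" "$demo/3"
touch "$demo/1/file.txt" "$demo/2/file.txt" "$demo/3/file.txt"

# With "+", find batches the paths: the helper runs once and receives all three.
plus=$(cd "$demo" && find . -name file.txt -exec sh -c 'echo "$# path(s)"' sh {} +)

# With "\;", find runs the helper once per file, each time with a single path.
semi=$(cd "$demo" && find . -name file.txt -exec sh -c 'echo "$# path(s)"' sh {} \;)

printf 'with +:\n%s\n' "$plus"
printf 'with \\;:\n%s\n' "$semi"

rm -rf "$demo"
```

The first form prints one line reporting three paths; the second prints three lines, each reporting one path.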

Getting color with find

One of the immediate benefits of using globstar is that, by default on Ubuntu, grep produces colorized output. But you can easily get this with find, too.

User accounts in Ubuntu are created with an alias that makes grep really run grep --color=auto (run alias grep to see). It's a good thing that aliases are pretty much only expanded when you issue them interactively, but it means that if you want find to invoke grep with the --color flag, you'll have to write it explicitly. For example:

find . -name file.txt -exec grep --color=auto -H 'pattern' {} +
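One thing to keep in mind is that --color=auto only colorizes when grep's output goes to a terminal. If you pipe or capture the results, --color=always keeps the escape sequences. Here is a sketch of the difference on a made-up file:

```shell
# Hypothetical single-file tree.
demo=$(mktemp -d)
mkdir -p "$demo/1"
printf 'a needle here\n' > "$demo/1/file.txt"

# Output captured by $( ... ) is not a terminal, so "auto" emits plain text...
plain=$(cd "$demo" && find . -name file.txt -exec grep --color=auto -H 'needle' {} +)

# ...while "always" emits the ANSI escape sequences regardless.
colored=$(cd "$demo" && find . -name file.txt -exec grep --color=always -H 'needle' {} +)

printf '%s\n' "$plain" "$colored"

rm -rf "$demo"
```

This is mainly useful when paging results, for example by piping into less -R, which passes the color escapes through to the terminal.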

Solution 3

You don't need find for this; grep can handle this perfectly fine on its own:

grep -airn --include="file.txt" "pattern" .

From man grep:

--exclude=GLOB
      Skip  files  whose  base  name  matches  GLOB  (using   wildcard
      matching).   A  file-name  glob  can  use  *,  ?,  and [...]  as
      wildcards, and \ to quote  a  wildcard  or  backslash  character
      literally.

--exclude-from=FILE
      Skip  files  whose  base name matches any of the file-name globs
      read from FILE  (using  wildcard  matching  as  described  under
      --exclude).

--exclude-dir=DIR
      Exclude  directories  matching  the  pattern  DIR from recursive
      searches.

--include=GLOB
      Search  only  files whose base name matches GLOB (using wildcard
      matching as described under --exclude).
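These options can be combined. As a sketch (with a made-up tree, a made-up directory name "skip", and the made-up pattern "needle"), this searches only files named file.txt while leaving one directory out entirely:

```shell
# Hypothetical tree: two directories, one of which should be ignored.
demo=$(mktemp -d)
mkdir -p "$demo/keep" "$demo/skip"
printf 'needle\n' > "$demo/keep/file.txt"
printf 'needle\n' > "$demo/keep/other.txt"
printf 'needle\n' > "$demo/skip/file.txt"

# --include limits the search to files whose base name is file.txt;
# --exclude-dir prevents grep from descending into "skip" at all.
found=$(cd "$demo" && grep -rn --include="file.txt" --exclude-dir="skip" "needle" .)
printf '%s\n' "$found"

rm -rf "$demo"
```

Only keep/file.txt is reported: other.txt fails the --include test, and skip/file.txt is never reached.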

Solution 4

The method given in muru's answer, of running grep with the --include flag to specify a filename, is often the best choice. However, this can also be done with find.

The approach in this answer uses find to run grep separately for each file found, and prints the path to each file exactly once, above the matching lines found in each file. (Methods that print the path in front of every matching line are covered in other answers.)


You can change directory to the top of the directory tree where you have those files. Then run:

find . -name "file.txt" -type f -exec echo "##### {}:" \; -exec grep -i "pattern" {} \;

That prints the path (relative to the current directory, ., and including the filename itself) of each file named file.txt, followed by all matching lines in the file. This works because {} is a placeholder for the file found. Each file's path is set apart from its contents by being prefixed with #####, and is printed only once, before the matching lines from that file. (Files called file.txt that contain no matches still have their paths printed.) You might find this output less cluttered than what you get from methods that print a path at the beginning of every matching line.
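If you would rather not see headings for files that contain no matches, one possible variation is to put a quiet grep in front as a test. find only continues to the later actions when that first -exec succeeds. This is a sketch under the same assumptions as above, with a made-up tree and pattern, and with {} passed to printf as its own argument:

```shell
# Hypothetical tree: one file with a match, one without.
demo=$(mktemp -d)
mkdir -p "$demo/1" "$demo/2"
printf 'a needle\nhay\n' > "$demo/1/file.txt"
printf 'only hay\n' > "$demo/2/file.txt"

# grep -q prints nothing and exits 0 only if the file has a match, so the
# heading and the matching lines are printed only for files that have one.
out=$(cd "$demo" && find . -name "file.txt" -type f \
  -exec grep -qi "needle" {} \; \
  -exec printf '##### %s:\n' {} \; \
  -exec grep -i "needle" {} \;)
printf '%s\n' "$out"

rm -rf "$demo"
```

The trade-off is that each matching file is read twice, once for the test and once to print its matching lines.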

Using find like this will almost always be faster than running grep on every file (grep -arin "pattern" *), because find searches for the files with the correct name and skips all other files.

Ubuntu uses GNU find, which always expands {} even when it appears in a larger string, like ##### {}:. POSIX does not guarantee that expansion. To avoid relying on it, pass {} as its own argument, as in -exec printf '##### %s:\n' {} \;. If you instead prefer to use the -exec action only when absolutely necessary, GNU find's -printf action (itself a GNU extension) does the job:

find . -name "file.txt" -type f -printf '##### %p:\n' -exec grep -i "pattern" {} \;

To make the output easier to read, you can use ANSI escape sequences to get coloured file names. This makes each file's path heading stand out better from the matching lines that get printed under it:

find . -name file.txt -printf $'\e[32m%p:\e[0m\n' -exec grep -i "pattern" {} \;

That causes your shell to turn the escape code for green into the actual escape sequence that produces green in a terminal, and to do the same thing with the escape code for normal colour. These escapes are passed to find, which uses them when it prints a filename. ($' ' quotation is necessary here because find's -printf action doesn't recognize \e for interpreting ANSI escape codes.)

If you prefer, you could instead use -exec with the system's printf command (which does support \e). So another way to do the same thing is:

find . -name file.txt -exec printf '\e[32m%s:\e[0m\n' {} \; -exec grep -i "pattern" {} \;
Author by

Rajesh Keladimath

Updated on September 18, 2022

Comments

  • Rajesh Keladimath
    Rajesh Keladimath over 1 year

    I have a directory (e.g., abc/def/efg) with many sub-directories (e.g.,: abc/def/efg/(1..300)). All of these sub-directories have a common file (e.g., file.txt). I want to search a string only in this file.txt excluding other files. How can I do this?

    I used grep -arin "pattern" *, but it is very slow if we have many sub-directories and files.

  • Eliah Kagan
    Eliah Kagan over 7 years
    I suggest also passing -H to grep so that, in cases when only one path is passed to it, that path is still printed (rather than just the matching lines from the file).
  • kcdtv
    kcdtv over 7 years
    I was going to make a "for loop" with an array and I didn't think about find's native -exec option. Good one! But I think that using dot will start the search in the directory where you already are. Correct me if I am wrong. Wouldn't it be better to specify the directory to search in the find command? find abc/def/efg -name "file.txt" -type f -exec echo -e "##### {}:" \; -exec grep -i "pattern" {} \;
  • sudodus
    sudodus over 7 years
    Sure, that will eliminate the cd abc/def/efg 'change directory' command :-)
  • Eliah Kagan
    Eliah Kagan over 7 years
    Nice--this seems like the best way. Simple and efficient. I wish I had known about (or thought to check the manpage for) this method. Thanks!
  • muru
    muru over 7 years
    @EliahKagan I'm more surprised Zanna didn't post this - I had shown an example of this option for another answer some time ago. :)
  • Zanna
    Zanna over 7 years
    slow learner, alas, but I get there eventually, your teachings aren't completely wasted on me ;)
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' over 7 years
    (1) Why are you specifying the -e option to echo?  That will cause it to mangle any filenames that contain backslashes. (2) Using {} as part of an argument is not guaranteed to work.  It would be better to say -exec echo "#####" {} \; or -exec printf "##### %s:\n" {} \;. (3) Why not just use -print or -printf? (4) Consider also grep -H.
  • sudodus
    sudodus over 7 years
    @ G-man, 1)Because I used ANSI colour originally: find . -name "file.txt" -type f -exec echo -e "\0033[32m{}:\0033[0m" \; -exec grep -i "pattern" {} \; 2) You may be right, but so far this is working for me. 3) -print and -printf are also alternatives. 4) This is already there in the main answer. - Anyway, you are welcome with your own answer :-)
  • Rajesh Keladimath
    Rajesh Keladimath over 7 years
    This is very simple and easy to remember. Thank You.
  • sudodus
    sudodus over 7 years
    I agree, that this is the best answer. Should I remove my answer to decrease confusion, or let it stay to show that there are alternatives, and what can be done with find?
  • muru
    muru over 7 years
    @sudodus I don't see any reason for deleting your answer - it's not wrong or harmful or anything bad. It is informative, so keep it.
  • terdon
    terdon over 7 years
    You don't need the two -exec calls. Just use grep -H and that will print the file name (in color) as well as the matched text.
  • sudodus
    sudodus over 7 years
    I know, read the first four lines of my answer (the 'Edit' section)!
  • Stig Hemmer
    Stig Hemmer over 7 years
    You might want to state more clearly that you need to be using the bash shell for this to work. You do say it implicitly in "the globstar bash shell option" but it can be easily missed by people reading too quickly.
  • sudodus
    sudodus over 7 years
    I removed my answer because it caused a lot of critical comments. So you should remove the reference to it in your answer.
  • Eliah Kagan
    Eliah Kagan over 7 years
    @StigHemmer Thanks -- I've clarified that not all shells have this feature. Although many shells (not just bash) do support directory-traversing ** globs, your core critique is correct: the presentation of ** in this answer is specific to bash, with shopt being bash only and the term "globstar" being (I think) bash and tcsh only. I'd glossed over this originally because of those complexities, but you're right that it's somewhat confusing. Rather than discuss it at length in this answer, I've linked to another (quite thorough) post that does the heavy lifting.
  • Eliah Kagan
    Eliah Kagan over 7 years
    @sudodus I've done so, but I hope this is temporary. I, and others, have found your answer valuable. It's true -e shouldn't be applied to paths, but this is easily fixed. For the first command, just omit -e. For the second, use find . -name file.txt -printf $'\e[32m%p:\e[0m\n' -exec grep -i "pattern" {} \; or find . -name file.txt -exec printf '\e[32m%s:\e[0m\n' {} \; -exec grep -i "pattern" {} \;. Users will sometimes prefer your way (with -e usage fixed) to the others, which print one path per matching line; yours prints one path per file found followed by grep results.
  • Eliah Kagan
    Eliah Kagan over 7 years
    @sudodus So grep itself won't do what you're doing. Some other criticisms were wrong too. grep -H run by -exec won't colorize without --color (or GREP_COLOR). IEEE 1003.1-2008 doesn't guarantee {} expands in ##### {}:, but Ubuntu has GNU find, which does. If it's OK with you I'll edit your post to fix the -e bug (and clarify its use case) and you can see if you want to undelete. (I have the rep to view/edit deleted posts.)
  • sudodus
    sudodus over 7 years
    OK, go ahead :-)
  • Eliah Kagan
    Eliah Kagan over 7 years
    @sudodus I've finally edited your answer. Since the post will remain deleted until you choose to undelete it, I went ahead and made significant changes, with the hope of showing the method really is valuable and distinct from others. You should definitely feel free to apply your own edits if you want to say all or part of it differently, take anything out, put more in, etc. (You can also, if you prefer, view my edit as just an example of a possible edit, roll it back, and start anew. My efforts still won't have been wasted. And everything can still be retrieved from the post's edit history.)
  • Eliah Kagan
    Eliah Kagan over 7 years
    @sudodus No problem! I'm glad this is back -- it's good to have an answer that covers this method.
  • Zanna
    Zanna over 7 years
    Looks great now :D
  • muru
    muru over 7 years
    @EliahKagan woah, thanks! But why bounty this post? It was getting decent attention. :) Now I have to find another post to bounty on.
  • user867560
    user867560 over 6 years
    @ muru: very cool! +1
  • rsmets
    rsmets over 3 years
    Generalized as the shell function: findAndGrep() { find . -type f -iname "$1" -exec grep -Hi "$2" '{}' + ; }