How to recursively find a .doc file that contains a specific word?

Solution 1

Use find for recursive searches:

find -name '*.doc' -exec catdoc {} + | grep "specificword"

This will also output the file name:

find -name '*.doc' | while read -r file; do
    catdoc "$file" | grep -H --label="$file" "specificword"
done

(Normally I would use find ... -print0 | while read -rd "" file, but there's maybe a .0001% chance that it would be necessary, so I stopped caring.)
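
For file names that do contain newlines or other odd characters, the NUL-delimited variant mentioned above could look roughly like this. It is only a sketch, assuming bash plus GNU find and grep; the function name search_docs and its optional third argument (a converter command, defaulting to catdoc) are made up for illustration.

```shell
#!/usr/bin/env bash
# search_docs DIR WORD [CONVERTER]
# Prints "file:matching line" for every .doc file under DIR whose extracted
# text contains WORD. CONVERTER is a hypothetical override (default: catdoc).
search_docs() {
    local dir=$1 word=$2 converter=${3:-catdoc}
    # -print0 ends each file name with a NUL byte; read -d '' splits on it,
    # so spaces and newlines in names survive intact.
    find "$dir" -type f -name '*.doc' -print0 |
        while IFS= read -r -d '' file; do
            "$converter" "$file" | grep -H --label="$file" -- "$word"
        done
}
```

Called as search_docs . specificword, it should behave like the loop above, just without breaking on unusual file names.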

Solution 2

You might want to look at recoll which is a full-text search tool for Linux and Unix systems supporting many different document formats. However, it is index-based, i.e., it has to index the documents you want to search in before the actual search. (Thanks to pabouk for pointing this out).

It has both a GUI and a command-line interface.

See the documentation for more information.

Solution 3

grep can search the binary .doc files directly; with -l it lists the files that match:

find /path/to/dir -name '*.doc' -exec grep -l "specificword" {} \;
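
Raw grep may miss a word that Word stored in a 16-bit encoding rather than as plain 8-bit text. One possible alternative is to run each file through catdoc and print only the names of the files that match. This is a sketch, assuming catdoc is installed; list_matching_docs and its optional converter argument are hypothetical names, not part of any tool.

```shell
#!/bin/sh
# list_matching_docs DIR WORD [CONVERTER]
# Prints the path of every .doc file under DIR whose extracted text
# contains WORD. CONVERTER is a hypothetical override (default: catdoc).
list_matching_docs() {
    find "$1" -type f -name '*.doc' -exec sh -c '
        word=$1 converter=$2
        shift 2
        # The remaining arguments are the file names supplied by find.
        for file do
            "$converter" "$file" | grep -q -- "$word" && printf "%s\n" "$file"
        done
    ' sh "$2" "${3:-catdoc}" {} +
}
```

For example, list_matching_docs /path/to/dir specificword. Passing the file names as arguments to the inner shell, instead of pasting {} into the script text, keeps names with quotes or spaces from being misread.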

Author: Tom

Updated on September 18, 2022

Comments

  • Tom
    Tom over 1 year

    I'm using bash under Ubuntu.

    Currently this works well for the current directory:

    catdoc *.doc | grep "specificword" 
    

    But I have lots of subdirectories with .doc files.

    How can I search for, let's say, "specificword" recursively?

  • Tom
    Tom over 12 years
    Thanks grawity, the first suggestion works quite well. Is there a way to print the file name? It only prints the phrase in which the word was found.
  • user1686
    user1686 over 12 years
    @user: Try the second suggestion, which, by the way, is titled "This will also output the file name".
  • glenn jackman
    glenn jackman over 12 years
    Can probably simplify a bit: find -name '*.doc' -exec sh -c 'catdoc "$1" | grep -q "specificword" && echo "$1"' sh {} \;
  • pabouk - Ukraine stay strong
    pabouk - Ukraine stay strong over 10 years
    It may be worth noting that Recoll provides indexed search: it first has to index the documents, and only then can it search through the index.