How do I perform a recursive directory search for strings within files in a UNIX TRU64 environment?


Solution 1

This should do it:

find dir -type f -exec grep -F -f strings.txt {} \;

dir is the directory from which searching will commence

strings.txt is the file of strings to match, one per line

-F means treat search strings as literal rather than regular expressions

-f strings.txt means use the strings in strings.txt for matching

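You can add -l to the grep switches if you just want the names of the files that match. Note that grep is run once per file here, so without -l it prints the matching lines but not the file names; passing /dev/null as an extra file operand makes grep prefix each match with its file name.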
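Since the question asks for something that takes two parameters, the command above could be wrapped roughly as follows. This is only a sketch: the function name search_tree is made up for illustration, and it assumes a POSIX sh with find and a grep that supports -F, -f and -l.

```shell
# Hypothetical helper (the name is not from the answer).
# usage: search_tree DIR STRINGS_FILE
search_tree() {
    dir=$1
    strings=$2
    # -F: treat the strings literally; -f: read them from a file;
    # -l: print only the names of files that contain a match.
    # grep is started once per file, which is slow but needs nothing
    # beyond a basic find.
    find "$dir" -type f -exec grep -l -F -f "$strings" {} \;
}
```

It would be called as search_tree etb strings.txt, with etb being the start directory from the question.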

Footnote:

Some people prefer a solution involving xargs, e.g.

find dir -type f -print0 | xargs -0 grep -F -f strings.txt

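which is more efficient because grep is started once per batch of files rather than once per file, and more robust because the NUL separators cope with unusual file names. Be aware that -print0 and -0 are extensions that an older system such as Tru64 may not provide.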
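If -print0 and -0 are missing, a similar batching effect may be available through the "{} +" form of -exec. The sketch below assumes a find that supports that form (it is in POSIX, but an old system may lack it); the function name search_tree_batched is invented for illustration.

```shell
# Hypothetical helper (the name is not from the answer).
# usage: search_tree_batched DIR STRINGS_FILE
search_tree_batched() {
    # "{} +" hands many file names to each grep run, much like xargs does.
    # /dev/null is an extra operand so that grep always prefixes a match
    # with its file name, even when a batch holds a single file.
    find "$1" -type f -exec grep -F -f "$2" /dev/null {} +
}
```

Each output line has the form file:matching-line; add -l to the grep switches to get file names only.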

Solution 2

From reading the question, I assume we cannot use GNU coreutils and that egrep is not available. I also assume that (for some reason) the system is broken and escapes do not work as expected.

Under normal situations, grep -rf patternfile.txt /some/dir/ is the way to go.

The question mentions "a file containing a list of all the strings to be searched". Let's call this file patternfile.txt.

Assumptions: GNU coreutils is not available, grep -r does not work, and the handling of special characters is broken.

Do you have a working awk? No? It would make life so much easier, but let's be on the safe side.

Assume: a working sed, and one of od, hexdump, or xxd (from the vim package) is available.


1. Convert list into a regexp that grep likes

Example patternfile.txt contains

/foo/

/bar/doe/

/root/

(The example does not show any special characters, but assume they are there.) We must turn it into something like

(/foo/|/bar/doe/|/root/)
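For comparison only, on a system where sed, paste and grep -E do behave (which this answer deliberately assumes away), the same target form could be produced without the hex detour. This is a sketch under that assumption, and build_alternation is a made-up name.

```shell
# Hypothetical helper (the name is not from the answer).
# usage: build_alternation PATTERN_FILE
build_alternation() {
    # 1. Put a backslash in front of every extended-regexp metacharacter.
    # 2. Join all lines with '|'.
    # 3. Wrap the result in parentheses.
    sed 's/[][\.*^$+?(){}|]/\\&/g' "$1" |
        paste -s -d '|' - |
        sed 's/^/(/; s/$/)/'
}
```

The result would then be handed to grep -E, for example grep -E "$(build_alternation patternfile.txt)" somefile.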

Assuming the echo -en command is not broken, and that xxd, od, or hexdump is available:

Using hexdump

cat patternfile.txt |hexdump -ve '1/1 "%02x \n"' |tr -d '\n'

Using od

cat patternfile.txt |od -A n -v -t x1|tr -d '\n'

and pipe it into the following (common to both hexdump and od):

|sed 's:[ ]*0a[ ]*$::g'|sed 's: 0a:\\|:g' |sed 's:^[ ]*::g'|sed 's:^: :g' |sed 's: :\\x:g'

then pipe the result into

|sed 's:^:\\(:g' |sed 's:$:\\):g'

and you have a regexp pattern that is escaped.


2. Feed the escaped pattern into broken regexp

Assuming the bare minimum shell escape is available, we use grep "$(echo -en "ESCAPED_PATTERN" )" to do our job.


3. To sum it up

Building an escaped regexp pattern (using hexdump as an example):

grep "$(echo -en "$( cat patternfile.txt |hexdump -ve '1/1 "%02x \n"' |tr -d '\n' |sed 's:[ ]*0a[ ]*$::g'|sed 's: 0a:\\|:g' |sed 's:^[ ]*::g'|sed 's:^: :g' |sed 's: :\\x:g'|sed 's:^:\\(:g' |sed 's:$:\\):g')")"

This hex-escapes every character and wraps the alternatives in \( \| \) so that a regexp OR match is performed.
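To see what the pipeline produces before echo -en decodes it, the od variant can be wrapped in a small function. This is a sketch: the name hex_escaped_pattern is invented, and the tr -s step is an addition in case an od implementation pads bytes with more than one space.

```shell
# Hypothetical helper (the name and the "tr -s" step are illustrative additions).
# usage: hex_escaped_pattern PATTERN_FILE
hex_escaped_pattern() {
    od -A n -v -t x1 "$1" | tr -d '\n' | tr -s ' ' |
        sed 's:[ ]*0a[ ]*$::g' |  # drop the newline byte that ends the file
        sed 's: 0a:\\|:g' |       # every other newline becomes \|
        sed 's:^[ ]*::g' |        # strip leading blanks
        sed 's:^: :g' |           # put one blank in front of the first byte
        sed 's: :\\x:g' |         # each blank becomes the \x prefix
        sed 's:^:\\(:g' |         # open the group
        sed 's:$:\\):g'           # close the group
}
```

For a pattern file holding /foo/ and /root/ this prints \(\x2f\x66\x6f\x6f\x2f\|\x2f\x72\x6f\x6f\x74\x2f\), which is the text that the grep "$(echo -en "...")" construct above then decodes.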

4. Recursive directory lookup

Under normal circumstances, even when grep -r is broken, find /dir/ -exec grep PATTERN {} \; should work. Some may prefer xargs instead (unless you happen to have a buggy xargs). We would prefer the find /somedir/ -type f -print0 |xargs -0 grep -f 'patternfile.txt' approach, but since it is not available (for whatever valid reason), we need to exec grep for each file, which is normally the wrong way. But let's do it.

Assume: find -type f works. Assume: xargs is broken or not available.

First, if you have a buggy pipe, it might not handle a large number of files, so we avoid xargs on such systems (I know, I know, let's just pretend it is broken).

find /whatever/dir/to/start/looking/ -type f > list-of-all-file-to-search-for.txt

If your shell handles large lists nicely,

for file in `cat list-of-all-file-to-search-for.txt` ; do grep REGEXP_PATTERN "$file" ; done

is a nice way to get by (it still breaks on file names that contain whitespace). Unfortunately, some systems do not like that, and in that case you may need

cat list-of-all-file-to-search-for.txt | split -a 4 -l 2000 - file-smaller-chunk.part.

to turn the list into smaller chunks. Now, this is for a seriously broken system. Then

for file in file-smaller-chunk.part.* ; do for single_line in `cat "$file"` ; do grep REGEXP_PATTERN "$single_line" ; done ; done

should work.

A cat filelist.txt |while read file ; do grep REGEXP_PATTERN "$file" ; done may be used as a workaround on some systems.
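A slightly more defensive form of that loop could look like the sketch below. It assumes a shell whose read supports -r, and the function name grep_listed_files is made up; file names that contain newlines are still not handled.

```shell
# Hypothetical helper (the name is not from the answer).
# usage: grep_listed_files REGEXP_PATTERN FILE_LIST
grep_listed_files() {
    pattern=$1
    list=$2
    # IFS= and -r keep leading blanks and backslashes in file names intact.
    while IFS= read -r file; do
        # /dev/null as a second operand makes grep print the file name
        # in front of each matching line.
        grep "$pattern" "$file" /dev/null
    done < "$list"
}
```

It reads the list produced by the earlier find ... > list-of-all-file-to-search-for.txt step, one file name per line.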

What if my shell does not handle quotes?

You may have to escape the file list beforehand.

It can be done much more nicely in awk, perl, whatever, but since we restrict ourselves to sed, let's do it. We assume 0x27, the ' character, actually works.

cat list-of-all-file-to-search-for.txt |sed 's@['\'']@'\''\\'\'\''@g'|sed 's:^:'\'':g'|sed 's:$:'\'':g'

The only time I had to use this was when feeding the output into bash again.
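To make the effect of those three sed commands concrete, here they are wrapped in a function. This is a sketch with an invented name, quote_list; the sed commands themselves are the ones from the answer.

```shell
# Hypothetical helper (the name is not from the answer).
# usage: quote_list FILE_LIST
quote_list() {
    # 1st sed: turn every ' inside a name into '\''
    # 2nd sed: put a ' in front of each line
    # 3rd sed: put a ' at the end of each line
    sed 's@['\'']@'\''\\'\'\''@g' "$1" |
        sed 's:^:'\'':g' |
        sed 's:$:'\'':g'
}
```

A line such as it's here comes out as 'it'\''s here', which a Bourne-style shell reads back as the original name when the text is fed to it again.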

What if my shell does not handle that?

xargs fails, grep -r fails, and the shell's for loop fails.

Do we have other options? Yes.

Escape all input suitable for your shell, and make a script.

But you know what, I got bored, and writing automated scripts for csh just seems wrong. So I am going to stop here.

Take home note

Use the right tool for the job. Writing an interpreter in bc is perfectly possible, but it is just plain wrong. Install coreutils, perl, a better grep, whatever. It makes life better.

Author by

Admin

Updated on June 04, 2022

Comments

  • Admin
    Admin over 1 year

    Unfortunately, due to the limitations of our Unix Tru64 environment, I am unable to use the grep -r switch to search for strings within files across multiple directories and subdirectories.

    Ideally, I would like to pass two parameters. The first is the directory where I want my search to start. The second is a file containing a list of all the strings to be searched. This list will consist of various directory path names and will include special characters:

    i.e.:
    /aaa/bbb/ccc
    /eee/dddd/ggggggg/
    etc..

    The purpose of this exercise is to identify all shell scripts that may have specific hard coded path names identified in my list.

    There was one example I found during my investigations that perhaps comes close, but I am not sure how to customize this to accept a file of string arguments:

    eg: find etb -exec grep test {} \;

    where 'etb' is the directory and 'test', a hard coded string to be searched.