How to exclude a list of full directory paths in find command on Solaris

Solution 1

You can't match files by full path with Solaris find, but you can match files by inode. So use ls -i to generate a list of inodes to prune, then call find. This assumes that there aren't so many directories you want to prune that you'd go over the command line length limit.

inode_matches=$(ls -bdi /opt/dir1 /opt/dir2 /var/dir3/dir4 |
                sed -e 's/^ *\([0-9][0-9]*\) .*/-o -inum \1/' -e '1s/^-o //')
find / -xdev \( $inode_matches \) -prune -o \( -nouser -o -nogroup \) -print
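
The same idea can be wrapped in a small script that accepts the comma-separated list from the question. The following is only a sketch: the function names inode_expr and find_unowned are made up for illustration, the paths are assumed to contain no commas, newlines, or glob characters, and the $(...) syntax assumes a POSIX shell (on older Solaris releases /bin/sh is, as far as I know, the legacy Bourne shell, so something like /usr/xpg4/bin/sh or ksh may be needed).

```shell
#!/bin/sh
# Hypothetical helper: turn "dirA,dirB" into "-inum N -o -inum M".
# Assumes the paths contain no commas, newlines, or glob characters.
inode_expr() {
    [ -n "$1" ] || return 1
    old_ifs=$IFS
    IFS=,
    set -- $1            # split the list on commas
    IFS=$old_ifs
    # ls -bdi prints "<inode> <path>" per directory; keep only the inode
    # and put "-o" in front of every entry except the first.
    ls -bdi "$@" |
        sed -e 's/^ *\([0-9][0-9]*\) .*/-o -inum \1/' -e '1s/^-o //'
}

# Hypothetical wrapper: print unowned files under $1 without descending
# into the comma-separated directories given in $2.
find_unowned() {
    prune_expr=$(inode_expr "$2")
    if [ -z "$prune_expr" ]; then
        echo "no usable directories to exclude" >&2
        return 1
    fi
    # $prune_expr is left unquoted on purpose so that it splits into
    # separate arguments for find.
    find "$1" -xdev \( $prune_expr \) -prune -o \( -nouser -o -nogroup \) -print
}

# Example call (not executed here):
#   find_unowned / "/opt/dir1,/opt/dir2,/var/dir3/dir4"
```

Inode numbers are only unique within one filesystem, and -xdev keeps find on the filesystem of the starting directory. An excluded directory that lives on a different filesystem is never visited anyway, but its inode number could coincide with an unrelated directory on the searched filesystem, so it is safer to pass only directories that are on the filesystem being searched.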

An alternative approach would be to use a Perl or Python script and roll your own directory traversal. Older Perl releases ship with a find2perl script (it left the core distribution in Perl 5.22) that can get you started with the File::Find module. In Python, see the walk function in the os module (os.walk).

Solution 2

Since the implementation(s) of find do not support -path test, you can simulate it using -exec test "{}" = "/path/to/exclude" \; -prune (the {} should be expanded to full path name).

This, unfortunately, will take more time than "pure" find, since the test program is executed for every file that find visits. So make sure to optimize the tests as much as you can. For example, check which of these two runs faster:

 -exec test "{}" = "/dev" \; -o -exec test "{}" = "/proc" \; -o -exec test "{}" = "/tmp/test" \;

or

 -exec test "{}" = "/dev" -o "{}" = "/proc" -o "{}" = "/tmp/test" \;

I think the latter should be faster overall, because the test program is executed only once per file instead of up to three times.

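Note: You don't need the -a's for and-logic; AND is implied between adjacent tests. The trailing -print, however, is only implied when the expression contains no -exec, -ok, or -print, so keep it explicit when you use -exec test.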
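
To make the single-test variant easier to reuse, the argument list for test can be built in a loop. This is a rough sketch rather than a tuned implementation: the function name find_skipping is hypothetical, the starting directory and the excluded directories are assumed to be absolute paths written exactly as find will print them, and the final -print stands in for whatever tests you actually need (for example \( -nouser -o -nogroup \) -print).

```shell
#!/bin/sh
# Hypothetical helper: walk ROOT but do not descend into the listed
# directories, using a single external "test" call per visited file.
# Usage: find_skipping ROOT DIR...
find_skipping() {
    root=$1
    shift
    if [ $# -eq 0 ]; then
        echo "usage: find_skipping ROOT DIR..." >&2
        return 2
    fi
    # Rebuild the positional parameters as: {} = DIR1 -o {} = DIR2 ...
    # The loop's word list is expanded before the first "set --" runs.
    first=yes
    for dir do
        if [ "$first" = yes ]; then
            set -- "{}" = "$dir"
            first=no
        else
            set -- "$@" -o "{}" = "$dir"
        fi
    done
    # test succeeds when the current path equals one of the directories;
    # -prune then stops find from descending into it.
    find "$root" -exec test "$@" \; -prune -o -print
}

# Example call (not executed here):
#   find_skipping / /dev /proc /tmp/test
```

Because the comparison is a plain string match, a trailing slash or a relative starting directory would make the paths differ and nothing would be pruned.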

Author by

Yanick Girouard

Updated on September 18, 2022

Comments

  • Yanick Girouard
    Yanick Girouard almost 2 years

    (Duplicated from Stack Overflow: https://stackoverflow.com/questions/7854975/how-to-exclude-a-list-of-full-directory-paths-in-find-command-on-solaris)

    I have a very specific need to find unowned files and directories in Solaris using a script, and need to be able to exclude full directory paths from the find because they contain potentially thousands of unowned files (and it's normal because they are files hosted on other servers). I don't even want find to search in those directories as it will hang the server (cpu spiking to 99% for a long time), therefore piping the find results in egrep to filter out those directories is not an option.

    I know I can do this to exclude one or more directories by name:

    find / -mount -local \( -type d -a \( -name dir1 -o -name dir2 -o -name dir3 \) \) -prune -o \( -nouser -o -nogroup \) -print

    However, this will match dir1 and dir2 anywhere in the directory structure of any directories, which is not what I want at all.

    I want to be able to prevent find from even searching in the following directories (as an example):

    /opt/dir1
    /opt/dir2
    /var/dir3/dir4
    

    And I still want it to find unowned files and directories in the following directories:

    /opt/somedir/dir1
    /var/dir2
    /home/user1/dir1
    

    I have tried using regex in the -name arguments, but since find only matches 'name' against the basename of what it finds, I can't specify a path. Unfortunately, Solaris's find does not support GNU find options such as -wholename or -path, so I'm kind of screwed.

    My goal would be to have a script with the following syntax:

    script.sh "/path/to/dir1,/path/to/dir2,/path/to/dir3"

    How could I do that using find and standard sh scripting (/bin/sh) on Solaris (5.8 and up)?

    • sr_
      sr_ over 12 years
      Just by the way, you could easily install GNU findutils, it's in pkgsrc which supports Solaris - since it would save you some work.
    • Yanick Girouard
      Yanick Girouard over 12 years
      I know, but the servers we support are not ours, and we can't install anything without the approval of our clients, and some would refuse changing core binaries such as find. We have hundreds of servers that have Solaris, and none is configured exactly the same. That's why we need to use only POSIX binaries.
    • rozcietrzewiacz
      rozcietrzewiacz over 12 years
      Since the implementation(s) of find do not support -path test, you can simulate it using -exec test "{}" = "/path/to/exclude" \; -prune. The {} should be expanded to full path name.
    • Yanick Girouard
      Yanick Girouard over 12 years
      Humm... I tested it using this command on RHEL 5.6 (I don't have access to Solaris from home): find / -mount \( -type d -a \( -exec test "{}" = /dev \; -o -exec test "{}" = "/proc" \; -o -exec test "{}" = "/tmp/test" \; \) \) -prune -o \( -nouser -o -nogroup \) -print It worked, but it took a lot more time to run. I was monitoring CPU usage while the find was running and it was not going over 27%, which is not so bad, but I only have 3 tests in my condition. I wonder how bad it would get when there's more...
    • Yanick Girouard
      Yanick Girouard over 12 years
      @rozcietrzewiacz: You should post your comment as an answer so I can vote on it if it turns out this is the only option. That way you'll get some rep! You deserve it!
    • maxschlepzig
      maxschlepzig over 12 years
      Isn't it possible to migrate a post from SO? Just duplicating it seems a bit annoying ...
  • Yanick Girouard
    Yanick Girouard over 12 years
    Thanks for the details! I will test it out and try to optimize the test condition as best as I can to compensate. I will accept the answer since there's really no better alternative short of installing GNU findutils on every Solaris server we support (which is out of the question). I will look into a Perl alternative too, but I'm not sure which will be the fastest...
  • Yanick Girouard
    Yanick Girouard over 12 years
    Could you do me a favor and post the same answer on the Stack Overflow version of this post as well? The link is on top. Would be much appreciated!
  • Yanick Girouard
    Yanick Girouard over 12 years
    Tested both variants of the exec condition, here are the results: using multiple -o in the same test = much faster, but 25% more CPU usage in average (eye-balled it using top). Using multiple tests = less CPU usage, but MUCH longer to run. Seems either way, there's a performance hit compared to the built-in find operations.
  • rozcietrzewiacz
    rozcietrzewiacz over 12 years
    Such increased CPU usage seems not so bad if it runs faster - the machine spends its time analyzing instead of waiting on I/O. This might even be beneficial to responsiveness. To make sure your script does not interfere too much with the rest of the system, run it using nice.
  • Yanick Girouard
    Yanick Girouard over 12 years
    Just so you're in the loop too, my other post on Stack Overflow got a better answer: using inodes with ls -bdi, and then -num in the find. It's POSIX compliant and will work on old systems. This is much faster than using -exec test... It's nice to know both options, however! Could come in handy in the future.
  • rozcietrzewiacz
    rozcietrzewiacz over 12 years
    +1 Ahh... Better. As always... ;)
  • Yanick Girouard
    Yanick Girouard over 12 years
    ..sorry, I meant -inum, not -num
  • Yanick Girouard
    Yanick Girouard over 12 years
    Don't have enough rep to upvote yet, or I would have... sorry.