Passing multiple directories to the -prune option in find

Solution 1

As far as I know, there is no option to tell find to read patterns from a file. An easy workaround is to save the patterns I want to exclude in a file and pass that file as input for a reverse grep. As an example, I have created the following files and directories:

$ tree -a
.
├── a
├── .aa
├── .aa.bak
├── a.bck
├── b
├── .dir1
│   └── bb1.bak
├── dir2
│   └── bb2.bak
├── b.bak
├── c
├── c~
├── Documents
│   └── Documents.bak
├── exclude.txt
├── foo.backup
└── Music
    └── Music.bak

If I understood the example you posted correctly, you want to move a.bck, .aa.bak, b.bak, c~, foo.backup and dir2/bb2.bak to the trash and leave .dir1/bb1.bak, Documents/Documents.bak and Music/Music.bak where they are. I have, therefore, created the file exclude.txt with the following contents (you can add as many patterns as you want):

$ cat exclude.txt 
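^\./\.[^/]*/
^\./Music/
^\./Documents/

grep reads each line of the file as a regular expression, so the literal dots are escaped and every pattern is anchored with ^; an unanchored ./.*/ would also match ./dir2/bb2.bak. I use ^\./\.[^/]*/ because I understood your original find to mean that you want to move hidden backup files (.foo) that are in the current directory but exclude any backup files that are in hidden directories (.foo/bar). So, I can now run the find command and use grep to exclude unwanted files: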

$ find . -type f \( -name '*.bck' -o -name '*.bak' -o -name '*~' -o -name '*.backup' \) -print0 |
    grep -zvf exclude.txt | xargs -0 --no-run-if-empty trash-put

find selects only the backup files and prints them NUL-terminated (-print0); grep -z keeps them NUL-terminated on the way to xargs -0. Note that it is -z, not -Z, that makes grep read and write NUL-terminated records.

Grep options:

   -v, --invert-match
          Invert  the  sense  of matching, to select non-matching
          lines.  (-v is specified by POSIX.)
   -f FILE, --file=FILE
          Obtain patterns from FILE, one  per  line.   The  empty
          file  contains  zero  patterns,  and  therefore matches
          nothing.  (-f is specified by POSIX.)
   -z, --null-data
          Treat input and output data as sequences of lines, each
          terminated by a zero byte (the  ASCII  NUL  character)
          instead of a newline.  Like the -Z  or  --null  option,
          this option can be used with commands like sort  -z  to
          process arbitrary file names.
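
Before anything is actually moved to the trash, the same pipeline can be tried as a dry run. The following is only a sketch: it assumes a grep that supports -z (GNU grep does), recreates part of the tree above under a temporary directory, writes the exclude list as anchored regular expressions, and prints the surviving names instead of passing them to trash-put. The variable names top and kept are made up for the illustration.

```shell
#!/bin/sh
# Scratch copy of (part of) the example tree; nothing outside $top is touched.
top=$(mktemp -d)
mkdir "$top/.dir1" "$top/dir2" "$top/Music" "$top/Documents"
touch "$top/a" "$top/.aa.bak" "$top/a.bck" "$top/b.bak" "$top/c~" \
      "$top/foo.backup" "$top/.dir1/bb1.bak" "$top/dir2/bb2.bak" \
      "$top/Music/Music.bak" "$top/Documents/Documents.bak"

# One regular expression per line, anchored at the start of the path.
cat > "$top/exclude.txt" <<'EOF'
^\./\.[^/]*/
^\./Music/
^\./Documents/
EOF

# Dry run: same find | grep pipeline, but the result is listed, not trashed.
kept=$(cd "$top" &&
  find . -type f \( -name '*.bck' -o -name '*.bak' -o -name '*~' \
                    -o -name '*.backup' \) -print0 |
  grep -zvf exclude.txt | tr '\0' '\n' | LC_ALL=C sort)

printf '%s\n' "$kept"
rm -rf "$top"
```

Once the list looks right, the tr and sort stage can be swapped back for xargs -0 --no-run-if-empty trash-put.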

Solution 2

With GNU find (i.e. under non-embedded Linux or Cygwin), you can use -regex to combine all these -path wildcards into a single regex.

find . -regextype posix-extended \
     -type d -regex '\./(\..*|Music|Documents)' -prune -o \
     -type f -regex '.*(\.(bck|bak|backup)|~)' -print0 |
xargs -0 --no-run-if-empty trash-put

On BSDs or macOS, use -E (placed before the starting directory, as in find -E . ...) instead of -regextype posix-extended.

You may also want to replace that -print0 | xargs -0 --no-run-if-empty trash-put (where --no-run-if-empty is GNU-specific¹) with its standard shorter equivalent: -exec trash-put {} +.


¹ its shorter -r form is however supported by some other implementations, even the default on some BSDs
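
As a rough illustration of the -exec ... {} + form, the sketch below runs the regex-based command on a scratch tree, with echo placed in front of trash-put so that it only prints what would be run. It assumes GNU find (for -regextype); the directory and file names are invented for the demonstration.

```shell
#!/bin/sh
# Scratch tree with one hidden directory, one excluded directory, one ordinary one.
top=$(mktemp -d)
mkdir "$top/.hidden" "$top/Music" "$top/dir2"
touch "$top/a.bck" "$top/c~" "$top/.hidden/h.bak" \
      "$top/Music/m.bak" "$top/dir2/d.backup"

# "echo trash-put" makes this a dry run; drop the echo to really trash the files.
# -exec ... {} + runs nothing at all when no file matches, so no xargs is needed.
out=$(cd "$top" && find . -regextype posix-extended \
        -type d -regex '\./(\..*|Music|Documents)' -prune -o \
        -type f -regex '.*(\.(bck|bak|backup)|~)' -exec echo trash-put {} +)

printf '%s\n' "$out"
rm -rf "$top"
```

The order of the file names on the printed line depends on the directory order and may vary between runs.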

Solution 3

Pass multiple -path ... -prune using -o (or) logic grouped together by parentheses \( ... \)

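find /somepath \( -path /somepath/a -prune -o \
                  -path /somepath/b -prune -o \
                  -path /somepath/c -prune \
               \) \
               -o -print

-path is matched against the whole path as find builds it from the starting point, so each pattern has to begin with /somepath. The example will not print or descend into /somepath/a, /somepath/b, and /somepath/c, or anything under them.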
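
A small self-contained check of the grouped -prune expression, offered as a sketch rather than a recipe: it builds a throwaway tree (the names a, b and keep are arbitrary) and lists only the regular files that survive the pruning. It assumes the temporary directory name contains no glob characters, since -path takes a pattern.

```shell
#!/bin/sh
# Throwaway tree: two directories to prune, one to keep.
top=$(mktemp -d)
mkdir "$top/a" "$top/b" "$top/keep"
touch "$top/a/1" "$top/b/2" "$top/keep/3"

# -path is compared with the full path find builds from the starting point,
# so every pattern includes "$top".
found=$(find "$top" \( -path "$top/a" -prune -o \
                       -path "$top/b" -prune \
                    \) -o -type f -print)

printf '%s\n' "$found"
rm -rf "$top"
```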


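Here is a contrived example using multiple expressions and a complex -exec action. This prints the file path and checksum for plain files on a Linux host, pruning paths that hold mostly transient files or character devices. The file name is handed to sh as $1 rather than embedded in the script text, so names containing quotes or $ cannot be misread as shell code: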

$ find / \( -path /dev -prune -o \
            -path /proc -prune -o \
            -path /sys -prune \
         \) \
         -o -type f \
            -printf '%p ' -exec sh -c 'md5sum -- "$1" | cut -f1 -d" "' sh {} \;
/etc/services 00060e37207f950bf0ebfd25810c19b9
/etc/lsb-release f317530ede1f20079f73063065c1684e
/etc/protocols bb9c019d6524e913fd72441d58b68216
/etc/rsyslog.conf 8f03326e3d7284ef50ac6777ef8a4fb8
...
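
The same idea can be written so that one sh handles many files at a time. This is a variation, not the command above: it is a sketch that swaps in the POSIX cksum utility for md5sum, passes the file names to the inline script as arguments via -exec ... {} +, and runs on a scratch directory with invented file names.

```shell
#!/bin/sh
# Two files with identical content; one has quotes and spaces in its name.
top=$(mktemp -d)
printf 'hello\n' > "$top/plain.txt"
printf 'hello\n' > "$top/with \"quote\".txt"

# File names arrive as "$@" in the inline script, never as part of its text.
out=$(find "$top" -type f -exec sh -c '
  for f do
    printf "%s %s\n" "$f" "$(cksum < "$f" | cut -d" " -f1)"
  done' sh {} +)

printf '%s\n' "$out"
rm -rf "$top"
```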

Solution 4

This seems to be more a shell question than a find question. With a file containing ( -name dir1 -o -name dir2 ) -prune (no "\"!) you can simply do this:

find ... $(< /path/to/file)

Unless you change the find call itself (to eval find, or by changing $IFS), this works only with names that contain no whitespace or glob characters, though.

If you want to keep the file simpler you can write a script.

# file content
dir1
dir2
dir3

# script content
#!/bin/bash
file=/path/to/file
# file may be checked for whitespace here
# skip blank lines, then join the remaining names with -o
grep '[^[:space:]]' "$file" | { empty=yes
  while IFS= read -r dir; do
    if [ yes = "$empty" ]; then
      printf '( '
      empty=no
    else
      printf ' -o '
    fi
    printf '%s' "-name ${dir}"
  done
  if [ no = "$empty" ]; then
    printf ' ) -prune'
  fi; }

And use

find ... $(/path/to/script)

instead.
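
The whitespace limitation comes from the unquoted command substitution. One possible way around it, sketched here under the assumption that the exclude file holds one directory name per line, is to collect the expression as separate arguments in the shell's positional parameters (a bash array would work the same way) and pass them to find quoted. The file and directory names below are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical exclude list; the second name contains a space on purpose.
list=$(mktemp)
printf '%s\n' 'dir1' '' 'my music' > "$list"

# Build: ( -name dir1 -o -name 'my music' ) -prune -o
set --
while IFS= read -r dir; do
  case $dir in *[![:space:]]*) ;; *) continue ;; esac   # skip blank lines
  if [ "$#" -gt 0 ]; then set -- "$@" -o; fi
  set -- "$@" -name "$dir"
done < "$list"
if [ "$#" -gt 0 ]; then set -- '(' "$@" ')' -prune -o; fi

# Scratch tree to try it on.
top=$(mktemp -d)
mkdir "$top/dir1" "$top/my music" "$top/other"
touch "$top/dir1/x.bak" "$top/my music/y.bak" "$top/other/z.bak"

# "$@" expands to one argument per element, so the space survives.
found=$(find "$top" "$@" -type f -name '*.bak' -print)

printf '%s\n' "$found"
rm -rf "$top" "$list"
```

Because each name is passed to -name as its own argument, it is still interpreted as a pattern by find, just no longer split or globbed by the shell.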

Author: chandra

iSTAR: Independent scholar, thinker, author, and researcher. Retired academic and biomedical engineer, interested in computing, mathematics, language, and the art of learning.

Updated on September 18, 2022

Comments

  • chandra
    chandra almost 2 years

    I am using find to locate and delete backup files but wish to exclude certain directories from the search. The backup filenames could end in .bck, .bak, ~, or .backup.

    The Minimal Working Example (MWE) code with only three directories to exclude is:

    #! /bin/bash
    find . -type d \( -path "./.*" -o -path "./Music" -o -path "./Documents" \) -prune -o -type f \( -name "*.bck" -o -name "*.bak" -o -name "*~" -o -name "*.backup" \) -print0 | xargs -0 --no-run-if-empty trash-put
    

    The syntax \( -path "dir1" -o -path "dir2" ... -o -path "dirN" \) -prune seems a little clunky, especially if there are around ten directories to exclude, although I have shown only three in the MWE.

    Is there a more elegant way using either an input file, with the list of excluded directories, or an array- or list-like construct, that could be pressed into service?

    I am sorry for not being more explicit when I wrote my original question.

    NB: trash-put is a utility that moves the files to the Trashcan instead of deleting them [1].

    [1]. https://github.com/andreafrancia/trash-cli

  • chandra
    chandra about 11 years
    I am so sorry for not being explicit. Kindly see revised question which I hope is clearer.
  • Hauke Laging
    Hauke Laging about 11 years
    @chandra I neither see how your question is clearer nor do I understand what could be a problem with my solution (except for the trivial replacement of -name by -path).
  • chandra
    chandra about 11 years
    My script above works and does what I want it to. I simply wanted to know whether there was a neater way than \( -path "dir1" -o -path "dir2" ... -o -path "dirN" \) -prune to exclude certain directories from the recursive search that find does. I am not searching for anything within files but rather deleting certain files and avoiding certain directories in my search path. I do not understand what your script is trying to do either. So, it appears we have a miscommunication. Sorry. Let us leave it at that.
  • terdon
    terdon about 11 years
    @chandra see updated answer, same general idea, different details.
  • chandra
    chandra about 11 years
    Thank you. You have answered my question very clearly and perfectly for my purpose. I have accepted your answer.
  • chandra
    chandra about 11 years
    Thank you for an excellent alternative answer. It is a shame that I cannot accept two answers.
  • they
    they over 2 years
    In the specific example that you show at the end, it would probably be easier to just use -xdev to keep find from wandering into other mounted filesystems.
  • Admin
    Admin about 2 years
    To exclude every node_modules folders, even in sub-folders, I had to use -o -path '*/node_modules*' -prune instead