Checking if a file exists in several directories

13,859

Solution 1

You don't mention whether you need to keep the files (perhaps removing duplicates?), hardlink them, or do something else with them.

So, depending on your intention, the best solution would be to use a program such as rdfind (non-interactive), fdupes (more interactive, letting you choose which files to keep), duff (which only reports duplicate files), or one of many others.

If you want something fancier with a GUI that will let you choose what to keep via a point-and-click interface, then fslint (via its fslint-gui command) would be my recommended choice.

All of the above are available in Debian's repository and, by extension, I think they are also in Ubuntu's or Linux Mint's repositories, if that's what you are using.

Solution 2

Traversing /downloads or /media again for every file name could be very slow. Instead, traverse each hierarchy only once, store the lists of file names, and then process the lists.

For simplicity, I assume that your file names don't contain any newlines.

find /downloads -type f | sed 's!^.*/\(.*\)$!\1/&!' |
  sort -t / -k1,1 >/tmp/downloads.find
find /media/tv /media/music /media/movie -type f |
  sed 's!^.*/\(.*\)$!\1/&!' |
  sort -t / -k1,1 >/tmp/media.find
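To make the sed step concrete, here is what it does to one made-up path (the path is only an illustration). The greedy `^.*/` consumes everything up to the last slash, `\(.*\)` captures the file name, and the replacement `\1/&` writes the name, a slash, and then the whole original path. Because the path already begins with `/`, the name and the path end up separated by `//`, which matters when the join output is cleaned up afterwards.

```shell
#!/bin/sh
# Apply the same sed expression as above to a single example path:
# the part after the last "/" is copied to the front, followed by "/".
line=$(printf '%s\n' /downloads/tv/show/ep1.mkv | sed 's!^.*/\(.*\)$!\1/&!')
printf '%s\n' "$line"   # prints: ep1.mkv//downloads/tv/show/ep1.mkv
```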

At this point, the two .find files contain lists of file paths, with the name of the file prepended, sorted by file name. Join the files on the first /-separated field, and clean up the result a bit.

join -j 1 -t / /tmp/downloads.find /tmp/media.find |
  sed -e 's![^/]*/!!' -e 's!//! has the same name as /!'
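The asker's comment below says the real goal is to find downloads that have not yet been copied to the media directories. Assuming that "copied" can be judged by file name alone, join's `-v 1` option, which prints only the lines of the first file that have no partner in the second, turns the same two lists into that report. This is a sketch: the helper name `names_not_in_media` is hypothetical, it uses `mktemp -d` instead of fixed names in /tmp, and it runs sort and join with `LC_ALL=C` so that both agree on the ordering.

```shell
#!/bin/bash
# names_not_in_media DOWNLOADS MEDIA_DIR...   (hypothetical helper)
# Print each file under DOWNLOADS whose name does not occur anywhere under
# the MEDIA_DIRs. Only names are compared, not contents, and file names
# must not contain newlines (the same assumption as above).
names_not_in_media() {
    local downloads=$1
    shift
    local tmp
    tmp=$(mktemp -d) || return 1

    # Build the same "name/path" lists as above, sorted on the name field.
    # LC_ALL=C makes sort and join use the same byte-wise ordering.
    find "$downloads" -type f | sed 's!^.*/\(.*\)$!\1/&!' |
        LC_ALL=C sort -t / -k1,1 >"$tmp/downloads.find"
    find "$@" -type f | sed 's!^.*/\(.*\)$!\1/&!' |
        LC_ALL=C sort -t / -k1,1 >"$tmp/media.find"

    # -v 1: print only lines of the first file with no matching name in the
    # second; then strip the prepended "name/" to leave the original path.
    LC_ALL=C join -v 1 -t / "$tmp/downloads.find" "$tmp/media.find" |
        sed 's![^/]*/!!'

    rm -rf "$tmp"
}
```

For the directories in the question it would be called as `names_not_in_media /downloads /media/tv /media/movie /media/music`.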

Solution 3

This lists every file under the specified /media subdirectories that has the same name as a file in /downloads:

find /downloads -type f | while IFS= read -r file ; do
    bn=$(basename "$file")
    find /media/tv /media/movie /media/music -type f -name "$bn"
done

And this will just print the path of each downloaded file that was found in one of those /media subdirectories:

find /downloads -type f | while IFS= read -r file ; do
    bn=$(basename "$file")

    count=$(find /media/tv /media/movie /media/music -type f -name "$bn" | wc -l)

    [ "$count" -gt 0 ] && printf "found %s\n" "$file"
done
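Along the same lines, a variant can print a line for every downloaded file, saying whether a match was found or not, and stop each search at the first match. This sketch relies on find's `-quit` action, which GNU find provides; `report_downloads` is a hypothetical helper name. As in the loops above, `-name` treats `*`, `?` and `[` in a file name as wildcards, and names containing newlines are not supported.

```shell
#!/bin/bash
# report_downloads DOWNLOADS MEDIA_DIR...   (hypothetical helper)
# For every file under DOWNLOADS, print "found PATH" if a file with the same
# name exists anywhere under the MEDIA_DIRs, otherwise "missing PATH".
report_downloads() {
    local downloads=$1
    shift
    local file bn
    find "$downloads" -type f | while IFS= read -r file; do
        bn=$(basename "$file")
        # -print -quit stops the search at the first match; the test only
        # cares whether find printed anything at all.
        if [ -n "$(find "$@" -type f -name "$bn" -print -quit)" ]; then
            printf 'found %s\n' "$file"
        else
            printf 'missing %s\n' "$file"
        fi
    done
}
```

`report_downloads /downloads /media/tv /media/movie /media/music | grep '^missing '` would then list only the files that still need copying.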

If there are many files in /downloads, running find once for each file will be very slow. If you are using GNU find, this can be avoided by building a single regular expression that contains all the file names you want to search for and passing it to find's -regex or -iregex option.

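# bash: read from a process substitution so the loop runs in the current
# shell; in a pipeline the loop would run in a subshell and REGEXP would be
# lost when it ends.
REGEXP="^.*/\("
while IFS= read -r file ; do
    # Backslash-escape characters that are special in find's default regex syntax
    bn=$(basename "$file" | sed -e 's/[][\.*^$+?]/\\&/g')
    REGEXP="$REGEXP\|$bn"
done < <(find /downloads -type f)
REGEXP="$REGEXP\)$"

find /media/tv /media/movie /media/music -type f -iregex "$REGEXP"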

And here's another version that avoids the shell read loop, and with it the basename and sed processes started for every file, so it should be much faster:

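# Strip the directories, backslash-escape characters that are special in
# find's default regex syntax, and put \| between names (not after the last).
REGEXP=$(find /downloads -type f | sed -e 's/^.*\/// ; s/[][\.*^$+?]/\\&/g ;
    $!s/$/\\|/' | tr -d '\n')
find /media/tv /media/movie /media/music -type f -iregex "^.*/\($REGEXP\)$"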

Both of these regexp versions pass the whole regular expression to find as a single argument, so they are limited by the maximum argument size the system allows: with too many files, find will fail to run.


NOTE: like most other answers here, these examples do not cope with filenames that have newlines (\n) in them. Any other character, including space, is fine.
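If newlines in file names do matter, one way around the limitation, in bash 4 or later, is to read NUL-separated find output with `read -d ''` and remember the media file names as keys of an associative array. This also walks each directory tree only once, like the join approach. It is a sketch: `missing_from_media` is a hypothetical helper name, and it compares names only, not contents. Results are printed NUL-terminated so they can be passed on safely.

```shell
#!/bin/bash
# missing_from_media DOWNLOADS MEDIA_DIR...   (hypothetical helper)
# Print, NUL-terminated, each file under DOWNLOADS whose name does not occur
# under any MEDIA_DIR. NUL-separated input means names may contain spaces or
# newlines. Requires bash 4.0+ for associative arrays.
missing_from_media() {
    local downloads=$1
    shift
    local -A seen=()
    local path

    # One pass over the media trees: remember every base name as a key.
    while IFS= read -r -d '' path; do
        seen[${path##*/}]=1
    done < <(find "$@" -type f -print0)

    # One pass over the downloads tree: print names never seen above.
    while IFS= read -r -d '' path; do
        [[ -n ${seen[${path##*/}]} ]] || printf '%s\0' "$path"
    done < <(find "$downloads" -type f -print0)
}
```

To view the result as lines: `missing_from_media /downloads /media/tv /media/movie /media/music | tr '\0' '\n'`.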

Solution 4

Here is an implementation in bash using brace expansion. It checks a single file name, and only at the top level of each /media subdirectory:

the_file=foo.mp3
# Brace expansion yields /media/tv/foo.mp3, /media/movie/foo.mp3 and /media/music/foo.mp3
for file in /media/{tv,movie,music}/"$the_file"; do
   if [[ -e $file ]]; then
      printf '%s found in %s\n' "$the_file" "${file%/*}"
   fi
done
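Because the loop above only looks at the top level of each directory, a bash 4+ variant can turn on `globstar` so that `**/` also matches any number of subdirectory levels, and `nullglob` so that a pattern with no matches expands to nothing instead of to itself. The function body is a subshell, so these shell options do not leak into the caller. `find_in_media` is a hypothetical helper name.

```shell
#!/bin/bash
# find_in_media NAME DIR...   (hypothetical helper)
# Report every place under the given directories where a file with the
# given name exists, at any depth. Requires bash 4.0 or later.
find_in_media() (
    # globstar: "**/" matches zero or more directory levels.
    # nullglob: a pattern that matches nothing expands to nothing.
    # The ( ) body runs in a subshell, so these settings stay local.
    shopt -s globstar nullglob
    the_file=$1
    shift
    for dir in "$@"; do
        for file in "$dir"/**/"$the_file"; do
            printf '%s found in %s\n' "$the_file" "${file%/*}"
        done
    done
)
```

Brace expansion can still supply the directories: `find_in_media foo.mp3 /media/{tv,movie,music}`.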

Author: andrew.vh

Updated on September 18, 2022

Comments

  • andrew.vh
    andrew.vh almost 2 years

    I need a script that will look at the files in a directory and see whether each one exists in one of several directories.

    I need something like this:

    for files in /downloads/ #may or may not be in a sub-directory
    do
       print if file exists in /media/tv, /media/movie, or /media/music
    done
    

    The files will not be in the root of the directory. I can't just search /media, because I don't want to search in cd-rom or videos.

    I am using the latest version of Ubuntu server.

  • Bernhard
    Bernhard over 11 years
    As I understand it, /media/ is not a subfolder of /downloads/
  • user1106106
    user1106106 over 11 years
    The original poster didn't mention it, but your method only looks at the file names, not at the file contents. Which of the two the original poster intends is not clear, though.
  • andrew.vh
    andrew.vh over 11 years
    Also, I didn't realize you could compare files based on their contents. That would be nice, as sometimes I would like to rename the files.
  • andrew.vh
    andrew.vh over 11 years
    The goal is to find files that I have downloaded but have not yet copied to my media directory. Although finding duplicates would be a nice way to clean up my collection, it's not what I'm looking to do in this script.
  • user1106106
    user1106106 over 11 years
    @andrew.vh, any of the utilities above would catch files with the same names, as they, more generally, will look for the contents of the files, not just their names, as you already noticed.
  • IronSummitMedia
    IronSummitMedia over 11 years
    This is the join command that worked for me: join -j 1 -t / /tmp/downloads.find /tmp/media.find | sed -e 's![^/]*/!!' -e 's!//! has the same name as /!' (tested on two computers running SLES 10 and 11.)
  • vonbrand
    vonbrand about 11 years
    That only checks in the current directory...