Search for duplicate file names within folder hierarchy?
Solution 1
FSlint is a versatile duplicate finder that includes a function for finding duplicate names:
The FSlint package for Ubuntu emphasizes the graphical interface, but as is explained in the FSlint FAQ a command-line interface is available via the programs in /usr/share/fslint/fslint/. Use the --help option for documentation, e.g.:
$ /usr/share/fslint/fslint/fslint --help
File system lint.
A collection of utilities to find lint on a filesystem.
To get more info on each utility run 'util --help'.
findup -- find DUPlicate files
findnl -- find Name Lint (problems with filenames)
findu8 -- find filenames with invalid utf8 encoding
findbl -- find Bad Links (various problems with symlinks)
findsn -- find Same Name (problems with clashing names)
finded -- find Empty Directories
findid -- find files with dead user IDs
findns -- find Non Stripped executables
findrs -- find Redundant Whitespace in files
findtf -- find Temporary Files
findul -- find possibly Unused Libraries
zipdir -- Reclaim wasted space in ext2 directory entries
$ /usr/share/fslint/fslint/findsn --help
find (files) with duplicate or conflicting names.
Usage: findsn [-A -c -C] [[-r] [-f] paths(s) ...]
If no arguments are supplied the $PATH is searched for any redundant
or conflicting files.
-A reports all aliases (soft and hard links) to files.
If no path(s) specified then the $PATH is searched.
If only path(s) specified then they are checked for duplicate named
files. You can qualify this with -C to ignore case in this search.
Qualifying with -c is more restictive as only files (or directories)
in the same directory whose names differ only in case are reported.
I.E. -c will flag files & directories that will conflict if transfered
to a case insensitive file system. Note if -c or -C specified and
no path(s) specifed the current directory is assumed.
Example usage:
$ /usr/share/fslint/fslint/findsn /usr/share/icons/ > icons-with-duplicate-names.txt
$ head icons-with-duplicate-names.txt
-rw-r--r-- 1 root root 683 2011-04-15 10:31 Humanity-Dark/AUTHORS
-rw-r--r-- 1 root root 683 2011-04-15 10:31 Humanity/AUTHORS
-rw-r--r-- 1 root root 17992 2011-04-15 10:31 Humanity-Dark/COPYING
-rw-r--r-- 1 root root 17992 2011-04-15 10:31 Humanity/COPYING
-rw-r--r-- 1 root root 4776 2011-03-29 08:57 Faenza/apps/16/DC++.xpm
-rw-r--r-- 1 root root 3816 2011-03-29 08:57 Faenza/apps/22/DC++.xpm
-rw-r--r-- 1 root root 4008 2011-03-29 08:57 Faenza/apps/24/DC++.xpm
-rw-r--r-- 1 root root 4456 2011-03-29 08:57 Faenza/apps/32/DC++.xpm
-rw-r--r-- 1 root root 7336 2011-03-29 08:57 Faenza/apps/48/DC++.xpm
-rw-r--r-- 1 root root 918 2011-03-29 09:03 Faenza/apps/16/Thunar.png
Solution 2
find . -mindepth 1 -printf '%h %f\n' | sort -t ' ' -k 2,2 | uniq -f 1 --all-repeated=separate | tr ' ' '/'
As the comment states, this will find folders as well. Here is the command to restrict it to files:
find . -mindepth 1 -type f -printf '%p %f\n' | sort -t ' ' -k 2,2 | uniq -f 1 --all-repeated=separate | cut -d' ' -f1
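The pipeline above uses a space as its field delimiter, so path names containing whitespace can confuse it (this is also noted in the comments). A minimal Python sketch of the same idea, grouping files by their exact name, avoids the delimiter problem. The function name find_same_names is made up for illustration, and unlike find -type f, os.walk also lists symlinks to files.

```python
import os
from collections import defaultdict


def find_same_names(top):
    """Return {file name: sorted list of paths} for names that occur more than once."""
    paths_by_name = defaultdict(list)
    # os.walk yields every non-directory entry in `files`, including symlinks
    # to files, so this is slightly broader than `find -type f`.
    for root, _dirs, files in os.walk(top):
        for name in files:
            paths_by_name[name].append(os.path.join(root, name))
    # Keep only the names shared by at least two paths.
    return {
        name: sorted(paths)
        for name, paths in paths_by_name.items()
        if len(paths) > 1
    }
```

Calling find_same_names(".") returns a dictionary whose keys are the clashing names (extension included, case-sensitive, as in the pipeline) and whose values are the full relative paths.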
Solution 3
Save this to a file named duplicates.py
#!/usr/bin/env python
# Syntax: duplicates.py DIRECTORY
import os, sys

top = sys.argv[1]
d = {}
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        fn = os.path.join(root, name)
        basename, extension = os.path.splitext(name)
        basename = basename.lower()  # ignore case
        if basename in d:
            print(d[basename])
            print(fn)
        else:
            d[basename] = fn
Then make the file executable:
chmod +x duplicates.py
Run it like this, for example:
./duplicates.py ~/images
It should output pairs of files that have the same basename(1), ignoring the extension and letter case. Since it is written in Python, you should be able to modify it.
Solution 4
I'm assuming you only need to see these "duplicates" and will then handle them manually. If so, I think this bash 4 code should do what you want.
declare -A array=() dupes=()
while IFS= read -r -d '' file; do
    base=${file##*/} base=${base%.*}
    if [[ ${array[$base]} ]]; then
        dupes[$base]+=" $file"
    else
        array[$base]=$file
    fi
done < <(find /the/dir -type f -print0)

for key in "${!dupes[@]}"; do
    echo "$key: ${array[$key]}${dupes[$key]}"
done
See http://mywiki.wooledge.org/BashGuide/Arrays#Associative_Arrays and/or the bash manual for help on the associative array syntax.
Solution 5
This is bname:
#!/bin/bash
#
# find for jpg/png/gif more files of same basename
#
# echo "processing ($1) $2"
bname=$(basename "$1" .$2)
find -name "$bname.jpg" -or -name "$bname.png"
Make it executable:
chmod a+x bname
Invoke it:
for ext in jpg png jpeg gif tiff; do find -name "*.$ext" -exec ./bname "{}" $ext ";" ; done
Pro:
- It's straightforward and simple, therefore extensible.
- Handles blanks, tabs, linebreaks and pagefeeds in filenames, afaik. (Assuming no such thing in the extension-name).
Con:
- It always finds the file itself, and if it finds a.gif for a.jpg, it will find a.jpg for a.gif too. So for 10 files with the same basename, it reports 100 matches in the end.
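The quadratic output described in the last point comes from running a new search for every file. A single pass that collects paths per basename reports each group once. The sketch below is one possible way to do that, not part of the original answer; the function name group_images_by_stem and the IMAGE_EXTENSIONS set (copied from the loop in the invocation above) are assumptions for illustration.

```python
import os
from collections import defaultdict

# Assumption: the same extensions as the shell loop (jpg png jpeg gif tiff).
IMAGE_EXTENSIONS = {".jpg", ".png", ".jpeg", ".gif", ".tiff"}


def group_images_by_stem(top, extensions=IMAGE_EXTENSIONS):
    """Return {basename without extension: sorted paths} for image names seen more than once."""
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(top):
        for name in files:
            stem, ext = os.path.splitext(name)
            # The extension is compared case-insensitively here, which differs
            # from the case-sensitive `find -name` in the script above.
            if ext.lower() in extensions:
                groups[stem].append(os.path.join(root, name))
    # Each group of clashing files appears exactly once in the result.
    return {stem: sorted(paths) for stem, paths in groups.items() if len(paths) > 1}
```

With 10 files sharing a basename, this yields one entry listing all 10 paths instead of 100 matches.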
JD Isaacks
Updated on September 18, 2022
Comments
-
JD Isaacks almost 2 years
I have a folder called img. This folder has many levels of sub-folders, all of which contain images. I am going to import them into an image server.
Normally images (or any files) can have the same name as long as they are in a different directory path or have a different extension. However, the image server I am importing them into requires all the image names to be unique (even if the extensions are different).
For example, the images background.png and background.gif would not be allowed because even though they have different extensions they still have the same file name. Even if they are in separate sub-folders, they still need to be unique.
So I am wondering if I can do a recursive search in the img folder to find a list of files that have the same name (excluding extension).
Is there a command that can do this?
-
Eliah Kagan almost 7 years
@DavidFoerster You're right! I have no idea why I had thought this might be a duplicate of How to find (and delete) duplicate files, but clearly it is not.
-
JD Isaacks about 13 years
How do I execute a command like that in a terminal? Is this something I need to save to a file first and execute the file?
-
geirha about 13 years
@John Isaacks You can copy/paste it into the terminal or you can put it in a file and run it as a script. Either case will achieve the same.
-
JD Isaacks about 13 years
Thanks this worked. Some of the results are in purple and some are in green. Do you know off hand what the different colors mean?
-
RusGraf about 13 years
@John It looks like FSlint is using ls -l to format its output. This question should explain what the colors mean.
-
Navin over 8 years
FSlint has a lot of dependencies.
-
David Foerster almost 7 years
I changed the solution so that it returns the full (relative) path of all duplicates. Unfortunately it assumes that path names don’t contain white-space because uniq doesn’t provide a feature to select a different field delimiter.
-
cp.engr over 6 years
@DavidFoerster, your rev 6 was an improvement, but regarding your comment there, since when is sed obsolete? Arcane? Sure. Obsolete? Not that I'm aware of. (And I just searched to check.)
-
David Foerster over 6 years
@cp.engr: sed isn't obsolete. Its invocation became obsolete after another change of mine.
-
cp.engr over 6 years
@DavidFoerster, obsolete doesn't seem like the right word to me, then. I think "obviated" would be a better fit. Regardless, thanks for clarifying.
-
David Foerster over 6 years
@cp.engr: Thanks for the suggestion! I didn't know that word but it appears to fit the situation better.
-
Rolf over 5 years
It doesn't seem to work properly. It detects P001.ORF and P001 (1).ORF as duplicates, and it also seems to think that 60% of my files are duplicates, which I'm pretty sure is wrong. fslint found a realistic number of duplicate filenames, which is close to 3%.
-
Dennis Golomazov almost 5 years
Doesn't work on mac:
find: -printf: unknown primary or operator
-
T'n'E over 4 years
While this is on the AskUbuntu exchange, this will work on mac (from previous edits):
find src -exec basename {} \; | sed 's/\(.*\)\..*/\1/' | sort | uniq -c | grep -v "^[ \t]*1 "
-
ojblass almost 3 years
I do not like how complicated my simple answer has morphed to over the years.