uniq command not working properly?
Solution 1
You need to use sort before uniq:
find . -type f -exec md5sum {} ';' | sort | uniq -w 33
uniq only removes repeated lines that are adjacent. It does not reorder the lines to look for repeats; sort does that part.
This is documented in man uniq:
Note: uniq does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use sort -u without uniq.
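The adjacency rule can be checked with the sample lines from the question. This is a minimal sketch assuming GNU uniq (the -w option may not be available in other implementations); the variable names are only for illustration.

```shell
# Sample md5sum output from the question; duplicate hashes are not all adjacent.
input='657cf4512a77bf47c39a0482be8e41e0 ./dupes2.txt
657cf4512a77bf47c39a0482be8e41e0 ./dupes.txt
8d60a927ce0f411ec94ac26a4785f749 ./derpina.txt
15f63928b8a1d5337137c38b5d66eed3 ./foo.txt
8d60a927ce0f411ec94ac26a4785f749 ./derp.txt'

# Without sort: only the adjacent 657c... pair collapses.
unsorted_count=$(printf '%s\n' "$input" | uniq -w 33 | wc -l | tr -d ' ')

# With sort: equal hashes become adjacent, so each hash appears once.
sorted_count=$(printf '%s\n' "$input" | sort | uniq -w 33 | wc -l | tr -d ' ')

echo "without sort: $unsorted_count lines, with sort: $sorted_count lines"
```

The unsorted pipeline still lists the 8d60... hash twice, which is the behaviour described in the question.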
Solution 2
The input for uniq needs to be sorted. So for the example case,
find . -type f -exec md5sum '{}' ';' | sort | uniq -w 33
would work. The -w (--check-chars=N) option compares only the first N characters of each line, so here the lines are made unique by the hash in the first column. This option works for this case, but the possibilities for telling uniq which parts of the line are relevant are limited. For example, there is no option to compare on columns 3 and 5 while ignoring column 4.
The command sort has its own option for unique output lines, and the lines are unique with respect to the keys used for sorting. This means we can use the powerful key syntax of sort to define which part of the line should be unique.
For the example,
find . -type f -exec md5sum '{}' ';' | sort -k 1,1 -u
gives the same set of hashes, one line per hash, but the sort part is more flexible for other uses.
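As a sketch of that flexibility, the case mentioned above (unique on columns 3 and 5, ignoring column 4) could look like the following. The sample records are made up for illustration, and LC_ALL=C is set only to make the ordering predictable.

```shell
# Hypothetical whitespace-separated records (not from the question).
records='a b X c 1
d e X f 1
g h Y i 1
j k X l 2'

# -k 3,3 and -k 5,5 restrict the sort keys to columns 3 and 5;
# -u then keeps one line per distinct (column 3, column 5) pair.
result=$(printf '%s\n' "$records" | LC_ALL=C sort -u -k 3,3 -k 5,5)

printf '%s\n' "$result"
```

Here the first two records share the key (X, 1), so only one of them is kept; column 4 plays no part in the comparison.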
user2127726
Updated on September 18, 2022
Comments
-
user2127726 almost 2 years
So I'm checking the md5 hash of my files with this as my output:
657cf4512a77bf47c39a0482be8e41e0 ./dupes2.txt
657cf4512a77bf47c39a0482be8e41e0 ./dupes.txt
8d60a927ce0f411ec94ac26a4785f749 ./derpina.txt
15f63928b8a1d5337137c38b5d66eed3 ./foo.txt
8d60a927ce0f411ec94ac26a4785f749 ./derp.txt
However, after running
find . -type f -exec md5sum '{}' ';' | uniq -w 33
to find the unique hashes I get this:
657cf4512a77bf47c39a0482be8e41e0 ./dupes2.txt
8d60a927ce0f411ec94ac26a4785f749 ./derpina.txt
15f63928b8a1d5337137c38b5d66eed3 ./foo.txt
8d60a927ce0f411ec94ac26a4785f749 ./derp.txt
From my understanding, only one of either derpina.txt or derp.txt should be showing up since their hashes are the same. Am I missing something? Can anyone enlighten me as to why it outputs like this?
-
Admin over 9 years
Figured it out. Apparently uniq does not detect repeated lines unless they are adjacent. Link to the answer that helped me: stackoverflow.com/questions/23114677/…
-
-
Devaroop about 5 years
uniq should be aliased as sort -u by default on all systems, if it always needs "sort" to work properly.
-
John1024 about 5 years
That change would lessen some confusion. On the other hand, uniq has many features not available with sort -u. Also, there are cases where one wants to use uniq without sort.