Finding duplicate entries in Collection
Solution 1
It depends on the semantic of the criterion:
If your criterion is always the same for a given class, and is inherent to the underlying concept, you should just implement equals
and hashCode
and use a set.
If your criterion depend on the context, org.apache.commons.collections.CollectionUtils.select(java.util.Collection, org.apache.commons.collections.Predicate) might be the right solution for you.
Solution 2
If you want to find duplicates, rather than just removing them, one approach would be to throw the Collection into an array, sort the array via a Comparator that implements your criteria, then linearly walk through the array, looking for adjacent duplicates.
Here's a sketch (not tested):
MyComparator myComparator = new MyComparator();
MyType[] myArray = myList.toArray();
Arrays.sort( myArray, myComparator );
for ( int i = 1; i < myArray.length; ++i ) {
if ( 0 == myComparator.compare( myArray[i - 1], myArray[i] )) {
// Found a duplicate!
}
}
Edit: From your comment, you just want to know if there are duplicates. The approach above works for this too. But you could more simply just create a java.util.SortedSet with a custom Comparator. Here's a sketch:
MyComparator myComparator = new MyComparator();
TreeSet treeSet = new TreeSet( myComparator );
treeSet.addAll( myCollection );
boolean containsDuplicates = (treeSet.size() != myCollection.size());
Solution 3
You can adapt a Java set to search for duplicates among objects of an arbitrary type: wrap your target class in a private wrapper that evaluates equality based on your criteria, and construct a set of wrappers.
Here is a somewhat lengthy example that illustrates the technique. It considers two people with the same first name to be equal, and so it detects three duplicates in the array of five objects.
import java.util.*;
import java.lang.*;
class Main {
static class Person {
private String first;
private String last;
public String getFirst() {return first;}
public String getLast() {return last;}
public Person(String f, String l) {
first = f;
last = l;
}
public String toString() {
return first+" "+last;
}
}
public static void main (String[] args) throws java.lang.Exception {
List<Person> people = new ArrayList<Person>();
people.add(new Person("John", "Smith"));
people.add(new Person("John", "Scott"));
people.add(new Person("Jack", "First"));
people.add(new Person("John", "Walker"));
people.add(new Person("Jack", "Black"));
Set<Object> seen = new HashSet<Object>();
for (Person p : people) {
final Person thisPerson = p;
class Wrap {
public int hashCode() { return thisPerson.getFirst().hashCode(); }
public boolean equals(Object o) {
Wrap other = (Wrap)o;
return other.wrapped().getFirst().equals(thisPerson.getFirst());
}
public Person wrapped() { return thisPerson; }
};
Wrap wrap = new Wrap();
if (seen.add(wrap)) {
System.out.println(p + " is new");
} else {
System.out.println(p + " is a duplicate");
}
}
}
}
You can play with this example on ideone [link].
Solution 4
You could use a map and while iterating over the collection put the elements into the map (the predicates would form the key) and if there's already an entry you've found a duplicate.
For more information see here: Finding duplicates in a collection
Admin
Updated on July 18, 2022Comments
-
Admin almost 2 years
Is there a tool or library to find duplicate entries in a Collection according to specific criteria that can be implemented?
To make myself clear: I want to compare the entries to each other according to specific criteria. So I think a
Predicate
returning justtrue
orfalse
isn't enough.
I can't use
equals
. -
Admin almost 12 yearsI want to compare the entries between each other, not to arbitrary criteria.
-
Sergey Kalinichenko almost 12 years@dragon66 If your hash function is good, the efficiency is the same as with any hash table, which is
O(1)
for each item, orO(N)
for the entire collection. -
dragon66 almost 12 yearsdasblinkenlight: I am a bit concerned about the wrap object creation although I know they will be gone outside the loop.
-
Sergey Kalinichenko almost 12 years@dragon66 Java is quite efficient at creating small objects (and these objects are tiny). Unfortunately, Java does not have a concept parallel to .NET's equality comparer - that would allow for a solution that avoids temporary objects altogether.
-
Christian Hujer about 9 yearsAs the OP says, he can't use
equals()
. AHashSet
useshashCode()
andequals()
. Therefore he cannot use aHashSet
.