Efficiently compute Intersection of two Sets in Java?

java performance optimization set

62,513

Solution 1

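Just use Google Guava's Sets#intersection(Set, Set) method. It returns an unmodifiable view rather than a copy, so neither input set is modified; calling size() on the view iterates over the first set, so pass the smaller set first.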

Solution 2

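You can avoid all manual work by using the Set method retainAll(). Note that it modifies the set it is called on, so call it on a copy if the original must stay intact.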

From docs:

s1.retainAll(s2) — transforms s1 into the intersection of s1 and s2. (The intersection of two sets is the set containing only the elements common to both sets.)

Solution 3

Can the members of the sets be easily mapped into a relatively small range of integers? If so, consider using BitSets. Intersection is then just a bitwise AND, processing 64 potential members at a time (BitSet stores its bits in long words).
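A minimal sketch of the BitSet idea, assuming each member has already been mapped to a small non-negative int index (that mapping is not shown, and the class and method names here are illustrative only):

```java
import java.util.BitSet;

final class BitSetIntersection {
    // Returns the number of indices set in both a and b without modifying either.
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone(); // and() mutates its receiver, so work on a copy
        copy.and(b);                      // word-wise AND over the underlying long[] words
        return copy.cardinality();        // count of bits that survived the AND
    }
}
```

The clone is there only because the original sets must not be modified; if one scratch BitSet can be reused across calls, the allocation could probably be avoided.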

Solution 4

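Using a Java 8 stream (this collects the common elements; use count() instead of collect(...) if only the size is needed):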

set1.stream().filter(s -> set2.contains(s)).collect(Collectors.toList());
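Since the question only asks for the size, a variant that counts instead of collecting might look like the sketch below. It streams over the smaller set and probes the larger one, which assumes contains() is cheap on the larger set (as with HashSet). The class and method names are illustrative.

```java
import java.util.Set;

final class StreamIntersection {
    // Counts elements present in both sets; neither set is modified or copied.
    static <T> long intersectionSize(Set<T> set1, Set<T> set2) {
        Set<T> smaller = set1.size() <= set2.size() ? set1 : set2;
        Set<T> larger = (smaller == set1) ? set2 : set1;
        // Iterate the smaller set so the number of contains() probes is minimal.
        return smaller.stream().filter(larger::contains).count();
    }
}
```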

Solution 5

If both sets are sorted (for example, both are TreeSets), walking both iterators in parallel could be a faster way to count the number of shared objects.
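One way the two-iterator walk could be written is sketched below. It assumes both sets use the natural ordering of their elements (no custom comparator); the class and method names are illustrative.

```java
import java.util.Iterator;
import java.util.SortedSet;

final class SortedIntersection {
    // Merge-style walk: advance whichever iterator points at the smaller element.
    static <T extends Comparable<? super T>> int intersectionSize(SortedSet<T> a, SortedSet<T> b) {
        Iterator<T> itA = a.iterator();
        Iterator<T> itB = b.iterator();
        if (!itA.hasNext() || !itB.hasNext()) {
            return 0; // an empty set shares nothing
        }
        T x = itA.next();
        T y = itB.next();
        int count = 0;
        while (true) {
            int cmp = x.compareTo(y);
            if (cmp == 0) {
                count++; // element present in both sets
                if (!itA.hasNext() || !itB.hasNext()) break;
                x = itA.next();
                y = itB.next();
            } else if (cmp < 0) {
                if (!itA.hasNext()) break;
                x = itA.next();
            } else {
                if (!itB.hasNext()) break;
                y = itB.next();
            }
        }
        return count;
    }
}
```

The walk touches each element at most once, so its cost is linear in the combined size of the two sets. Whether that beats hash lookups on a given workload would need measuring.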

If you perform this operation often, it may pay off to wrap the sets so that you can cache the result of the intersection operation, keeping a dirty flag to track whether the cached result is still valid and recalculating only when needed.
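A rough sketch of such a wrapper follows. It assumes all mutations go through the wrapper so the dirty flag stays accurate; the class name, its methods, and the choice of HashSet are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class CachedIntersection<T> {
    private final Set<T> left = new HashSet<>();
    private final Set<T> right = new HashSet<>();
    private boolean dirty = true; // true whenever the cached size may be stale
    private int cachedSize;

    // Mutators mark the cache stale only if the underlying set actually changed.
    void addLeft(T value)  { if (left.add(value))  dirty = true; }
    void addRight(T value) { if (right.add(value)) dirty = true; }

    int intersectionSize() {
        if (dirty) {
            Set<T> smaller = left.size() <= right.size() ? left : right;
            Set<T> larger = (smaller == left) ? right : left;
            int count = 0;
            for (T v : smaller) {
                if (larger.contains(v)) count++;
            }
            cachedSize = count;
            dirty = false;
        }
        return cachedSize; // repeated calls between mutations reuse this value
    }
}
```

This only helps when the same pair of sets is queried repeatedly between modifications; a real wrapper would also need to cover removals.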

Author by

Ina

@tiedyeina

Updated on September 13, 2020

Comments

Ina almost 4 years
What is the most efficient way to find the size of the intersection of two non-sparse Sets in Java? This is an operation I will be calling on large sets a very large number of times, so optimisation is important. I cannot modify the original sets.

I have looked at Apache Commons CollectionUtils.intersection, which appears to be quite slow. My current approach is to take the smaller of the two sets, clone it, and then call retainAll on the clone, passing in the larger of the two sets.
```
public static int getIntersection(Set<Long> set1, Set<Long> set2) {
    boolean set1IsLarger = set1.size() > set2.size();
    Set<Long> cloneSet = new HashSet<Long>(set1IsLarger ? set2 : set1);
    cloneSet.retainAll(set1IsLarger ? set1 : set2);
    return cloneSet.size();
}
```