Finding "maximum" overlapping interval pair in O(nlog(n))
First, sort the intervals: first by left endpoint in increasing order, then — as a secondary criterion — by right endpoint in decreasing order. For the rest of this answer, I'll assume that the intervals are already in sorted order.
Now, there are two possibilities for what the maximum possible overlap might be:
- it may be between an interval and a later interval that it completely covers.
- it may be between an interval and the very next interval that it doesn't completely cover.
We can cover both cases in O(n) time by iterating over the intervals, keeping track of the following:
- the greatest overlap we've seen so far, and the relevant pair of intervals.
- the latest interval we've seen, call it L, that wasn't completely covered by any of its predecessors. (For this, the key insight is that thanks to the ordering of the intervals, we can easily tell if an interval is completely covered by any of its predecessors — and therefore if we need to update L — by simply checking if it's completely covered by the current L. So we can keep L up-to-date in O(1) time.)
and computing each interval's overlap with L.
So:
result := []
max_overlap := 0
L := sorted_intervals[1]
for interval I in sorted_intervals[2..n]:
overlap := MIN(L.right, I.right) - I.left
if overlap >= max_overlap:
result := [L, I]
max_overlap := overlap
if I.right > L.right:
L := I
So the total cost is the cost of sorting the intervals, which is likely to be O(n log n) time but may be O(n) if you can use bucket-sort or radix-sort or similar.
user1751434
Updated on June 05, 2022Comments
-
user1751434 almost 2 years
Problem Statement
Input set of n intervals; {[s_1,t_1], [s_2,t_2], ... ,[s_n,t_n]}.
Output pair of intervals; {[s_i,t_i],[s_j,t_j]}, with the maximum overlap among all the interval pairs.
Example
input intervals : {[1, 10], [2, 6], [3,15], [5, 9]}
-> There are possible 6 interval pairs. Among those pairs, [1,10] & [3,15] has the largest possible overlap of 7.
output : {[1,10],[3,15]}
A naive algorithm will be a brute force method where all n intervals get compared to each other, while the current maximum overlap value being tracked. The time complexity would be O(n^2) for this case.
I was able to find many procedures regarding interval trees, maximum number of overlapping intervals and maximum set of non-overlapping intervals, but nothing on this problem. Maybe I would be able to use the ideas given in the above algorithms, but I wasn't able to come up with one.
I spent many hours trying to figure out a nice solution, but I think I need some help at this point.
Any suggestions will help!
-
user1751434 over 7 yearsYou are a genius.
-
user3886907 over 6 yearsDoesn't works for intervals (1,6),(3,6),(5,8). As per your logic, we will ignore (3,6) since it is covered by its predecessor (1,6). We are left with (1,6),(5,8) , overlap between them =1. This is wrong since max overlap is between (1,6),(3,6) = 3.
-
ruakh over 6 years@user3886907: Whoops, you are quite right, thanks! Will fix . . .
-
Jonathan over 5 yearsI believe this is still not fully correct. Consider (1,6),(2,5),(5,8). This algorithm returns (1,6),(2,5), overlap between them =4. But the right answer is (1,6),(2,5) = 3.
-
dixitk13 almost 5 yearsis this algorithm possible in lesser than linear time?
-
ruakh almost 5 years@dixitk13: Definitely not, because any algorithm for this problem will need to examine all of the intervals.
-
dixitk13 almost 5 yearshow about if you use augmented interval trees the membership check becomes K + log N, where K is the number of overlapping intervals with the query interval, and then finding the max should be an order of log N
-
ruakh almost 5 years@dixitk13: I think you're not taking into account the time to build the augmented interval tree?
-
dixitk13 almost 5 yearsbuilding the tree itself will cost another O(n log n) which is similar to sorting it as you mentioned, but should be able to reap the benefits when the query happens. I was just thinking out loud - not trying to devalue anything here.