Finding "maximum" overlapping interval pair in O(nlog(n))

12,251

First, sort the intervals: first by left endpoint in increasing order, then — as a secondary criterion — by right endpoint in decreasing order. For the rest of this answer, I'll assume that the intervals are already in sorted order.

Now, there are two possibilities for what the maximum possible overlap might be:

  • it may be between an interval and a later interval that it completely covers.
  • it may be between an interval and the very next interval that it doesn't completely cover.

We can cover both cases in O(n) time by iterating over the intervals, keeping track of the following:

  • the greatest overlap we've seen so far, and the relevant pair of intervals.
  • the latest interval we've seen, call it L, that wasn't completely covered by any of its predecessors. (For this, the key insight is that thanks to the ordering of the intervals, we can easily tell if an interval is completely covered by any of its predecessors — and therefore if we need to update L — by simply checking if it's completely covered by the current L. So we can keep L up-to-date in O(1) time.)

and computing each interval's overlap with L.

So:

result := []
max_overlap := 0
L := sorted_intervals[1]
for interval I in sorted_intervals[2..n]:
    overlap := MIN(L.right, I.right) - I.left
    if overlap >= max_overlap:
        result := [L, I]
        max_overlap := overlap
    if I.right > L.right:
        L := I

So the total cost is the cost of sorting the intervals, which is likely to be O(n log n) time but may be O(n) if you can use bucket-sort or radix-sort or similar.

Share:
12,251
user1751434
Author by

user1751434

Updated on June 05, 2022

Comments

  • user1751434
    user1751434 almost 2 years

    Problem Statement

    Input set of n intervals; {[s_1,t_1], [s_2,t_2], ... ,[s_n,t_n]}.

    Output pair of intervals; {[s_i,t_i],[s_j,t_j]}, with the maximum overlap among all the interval pairs.

    Example

    input intervals : {[1, 10], [2, 6], [3,15], [5, 9]}

    -> There are possible 6 interval pairs. Among those pairs, [1,10] & [3,15] has the largest possible overlap of 7.

    output : {[1,10],[3,15]}

    A naive algorithm will be a brute force method where all n intervals get compared to each other, while the current maximum overlap value being tracked. The time complexity would be O(n^2) for this case.

    I was able to find many procedures regarding interval trees, maximum number of overlapping intervals and maximum set of non-overlapping intervals, but nothing on this problem. Maybe I would be able to use the ideas given in the above algorithms, but I wasn't able to come up with one.

    I spent many hours trying to figure out a nice solution, but I think I need some help at this point.

    Any suggestions will help!

  • user1751434
    user1751434 over 7 years
    You are a genius.
  • user3886907
    user3886907 over 6 years
    Doesn't works for intervals (1,6),(3,6),(5,8). As per your logic, we will ignore (3,6) since it is covered by its predecessor (1,6). We are left with (1,6),(5,8) , overlap between them =1. This is wrong since max overlap is between (1,6),(3,6) = 3.
  • ruakh
    ruakh over 6 years
    @user3886907: Whoops, you are quite right, thanks! Will fix . . .
  • Jonathan
    Jonathan over 5 years
    I believe this is still not fully correct. Consider (1,6),(2,5),(5,8). This algorithm returns (1,6),(2,5), overlap between them =4. But the right answer is (1,6),(2,5) = 3.
  • dixitk13
    dixitk13 almost 5 years
    is this algorithm possible in lesser than linear time?
  • ruakh
    ruakh almost 5 years
    @dixitk13: Definitely not, because any algorithm for this problem will need to examine all of the intervals.
  • dixitk13
    dixitk13 almost 5 years
    how about if you use augmented interval trees the membership check becomes K + log N, where K is the number of overlapping intervals with the query interval, and then finding the max should be an order of log N
  • ruakh
    ruakh almost 5 years
    @dixitk13: I think you're not taking into account the time to build the augmented interval tree?
  • dixitk13
    dixitk13 almost 5 years
    building the tree itself will cost another O(n log n) which is similar to sorting it as you mentioned, but should be able to reap the benefits when the query happens. I was just thinking out loud - not trying to devalue anything here.