Efficient date range overlap calculation in python?
Solution 1
- Determine the latest of the two start dates and the earliest of the two end dates.
- Compute the timedelta by subtracting them.
- If the delta is positive, that is the number of days of overlap.
Here is an example calculation:
>>> from datetime import datetime
>>> from collections import namedtuple
>>> Range = namedtuple('Range', ['start', 'end'])
>>> r1 = Range(start=datetime(2012, 1, 15), end=datetime(2012, 5, 10))
>>> r2 = Range(start=datetime(2012, 3, 20), end=datetime(2012, 9, 15))
>>> latest_start = max(r1.start, r2.start)
>>> earliest_end = min(r1.end, r2.end)
>>> delta = (earliest_end - latest_start).days + 1
>>> overlap = max(0, delta)
>>> overlap
52
Solution 2
Function calls are more expensive than arithmetic operations.
The fastest way of doing this involves 2 subtractions and 1 min():
min(r1.end - r2.start, r2.end - r1.start).days + 1
compared with the next best which needs 1 subtraction, 1 min() and a max():
(min(r1.end, r2.end) - max(r1.start, r2.start)).days + 1
Of course with both expressions you still need to check for a positive overlap.
Solution 3
I implemented a TimeRange class as you can see below.
The get_overlapped_range first negates all the non overlapped options by a simple condition, and then calculate the overlapped range by considering all the possible options.
To get the amount of days you'll need to take the TimeRange value that was returned from get_overlapped_range and divide the duration by 60*60*24.
class TimeRange(object):
def __init__(self, start, end):
self.start = start
self.end = end
self.duration = self.end - self.start
def is_overlapped(self, time_range):
if max(self.start, time_range.start) < min(self.end, time_range.end):
return True
else:
return False
def get_overlapped_range(self, time_range):
if not self.is_overlapped(time_range):
return
if time_range.start >= self.start:
if self.end >= time_range.end:
return TimeRange(time_range.start, time_range.end)
else:
return TimeRange(time_range.start, self.end)
elif time_range.start < self.start:
if time_range.end >= self.end:
return TimeRange(self.start, self.end)
else:
return TimeRange(self.start, time_range.end)
def __repr__(self):
return '{0} ------> {1}'.format(*[time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(d))
for d in [self.start, self.end]])
Solution 4
You can use the datetimerange package: https://pypi.org/project/DateTimeRange/
from datetimerange import DateTimeRange
time_range1 = DateTimeRange("2015-01-01T00:00:00+0900", "2015-01-04T00:20:00+0900")
time_range2 = DateTimeRange("2015-01-01T00:00:10+0900", "2015-01-04T00:20:00+0900")
tem3 = time_range1.intersection(time_range2)
if tem3.NOT_A_TIME_STR == 'NaT': # No overlap
S_Time = 0
else: # Output the overlap seconds
S_Time = tem3.timedelta.total_seconds()
"2015-01-01T00:00:00+0900" inside the DateTimeRange() can also be datetime format, like Timestamp('2017-08-30 20:36:25').
Solution 5
Building on the solution of @Raymond Hettinger, since python 3.6 you can now use NamedTuple
from the typing
module.
from datetime import datetime
from typing import NamedTuple
class Range(NamedTuple):
start: datetime
end: datetime
>>> r1 = Range(start=datetime(2012, 1, 15), end=datetime(2012, 5, 10))
>>> r2 = Range(start=datetime(2012, 3, 20), end=datetime(2012, 9, 15))
>>> latest_start = max(r1.start, r2.start)
>>> earliest_end = min(r1.end, r2.end)
>>> delta = (earliest_end - latest_start).days + 1
>>> overlap = max(0, delta)
>>> overlap
52
Andreas Jung
Updated on July 10, 2021Comments
-
Andreas Jung almost 3 years
I have two date ranges where each range is determined by a start and end date (obviously, datetime.date() instances). The two ranges can overlap or not. I need the number of days of the overlap. Of course I can pre-fill two sets with all dates within both ranges and the perform a set intersection but this is possibly inefficient...is there a better way apart from another solution using a long if-elif section covering all cases ?
-
darkless over 10 years+1 very nice solution. Though, this doesn't quite work on dates that are fully contained in the other. For simplicity in integers: Range(1,4) and Range(2,3) returns 1
-
Raymond Hettinger over 9 years@darkless Actually, it returns 2 which is correct. Try these inputs
r1 = Range(start=datetime(2012, 1, 1), end=datetime(2012, 1, 4)); r2 = Range(start=datetime(2012, 1, 2), end=datetime(2012, 1, 3))
. I think you missed the+1
in the overlap calculation (necessary because the interval is closed on both ends). -
tkyass almost 8 yearsThis method will not return correct answer always. e.g.
Range = namedtuple('Range', ['start', 'end']) r1 = Range(start=datetime(2016, 6, 15), end=datetime(2016, 6, 15)) r2 = Range(start=datetime(2016, 6, 11), end=datetime(2016, 6, 18)) print min(r1.end - r2.start, r2.end - r1.start).days + 1
will print 4 where it supposded to print 1 -
Eric over 5 yearsWhat if you want to calculate 2 times instead of 2 dates ? @RaymondHettinger
-
Arthur D. Howland over 4 yearsI get an ambiguous series error using the first equation. Do I need a particular library?
-
Deep almost 4 yearsThanks, Just had a look at the documentation for
DateTimeRange
package and it seems they supportis_intersection
which natively returns a boolean value (True or False) depending on whether or not there is an intersection between two date ranges. So, for your example:time_range1.is_intersection(time_range2)
would returnTrue
if they intersect elseFalse
-
ErikXIII over 3 yearsIf you use datetime objects with times you could instead of .days write .total_seconds().
-
Mazhar Ali almost 3 yearsintersection() returns a DateTimeRange object and its property NOT_A_TIME_STR always equals 'NaT', so the if condition will always be true. Better approach will be to use the is_intersection which returns True or False.