Parsing date with timezone from an email?
Solution 1
email.utils has a parsedate() function for the RFC 2822 format, which as far as I know is not deprecated.
>>> import email.utils
>>> import time
>>> import datetime
>>> email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')
(2009, 11, 16, 13, 32, 2, 0, 1, -1)
>>> time.mktime((2009, 11, 16, 13, 32, 2, 0, 1, -1))
1258378322.0
>>> datetime.datetime.fromtimestamp(1258378322.0)
datetime.datetime(2009, 11, 16, 13, 32, 2)
Please note, however, that parsedate() does not take the time zone into account, and time.mktime() always expects a local time tuple.
>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) ==
...  time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')))
True
So you'll still need to parse out the time zone and take into account the local time difference, too:
>>> REMOTE_TIME_ZONE_OFFSET = +9 * 60 * 60
>>> # time.timezone is seconds west of UTC, so subtract it to undo mktime's local-time assumption
>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) -
...  time.timezone - REMOTE_TIME_ZONE_OFFSET)
1258345922.0
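The offset does not have to be hard-coded. As a small sketch (the variable names are only illustrative), email.utils.parsedate_tz() returns the same fields as parsedate() plus the UTC offset in seconds as a tenth element:

```python
import email.utils

east = 'Mon, 16 Nov 2009 13:32:02 +0900'
west = 'Mon, 16 Nov 2009 13:32:02 +0100'

# parsedate() discards the offset, so both strings give the same tuple
same_without_tz = email.utils.parsedate(east) == email.utils.parsedate(west)

# parsedate_tz() keeps the offset (seconds east of UTC) at index 9
offset_east = email.utils.parsedate_tz(east)[9]
offset_west = email.utils.parsedate_tz(west)[9]

print(same_without_tz, offset_east, offset_west)
# -> True 32400 3600
```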
Solution 2
Use email.utils.parsedate_tz(date):
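import datetime
import email
import email.utils

# file_name is the path to the saved message
with open(file_name) as f:
    msg = email.message_from_file(f)

date = None
date_str = msg.get('date')
if date_str:
    date_tuple = email.utils.parsedate_tz(date_str)
    if date_tuple:
        # mktime_tz() applies the UTC offset; fromtimestamp() gives naive local time
        date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
if date:
    ...  # valid date found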
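Without a tz argument, fromtimestamp() produces a naive datetime in the machine's local time. A minimal sketch for Python 3, assuming you would rather keep an aware UTC value, passes tz explicitly. Here date_str is a stand-in for the header value read from the message.

```python
import datetime
import email.utils

date_str = 'Mon, 16 Nov 2009 13:32:02 +0100'  # stand-in for msg.get('date')

date_tuple = email.utils.parsedate_tz(date_str)
if date_tuple is None:
    raise ValueError('unparseable date: %r' % date_str)

# mktime_tz() returns a POSIX timestamp with the +0100 offset already applied
timestamp = email.utils.mktime_tz(date_tuple)

# tz=timezone.utc yields an aware datetime instead of naive local time
aware_utc = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
print(aware_utc.isoformat())
# -> 2009-11-16T12:32:02+00:00
```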
Solution 3
For Python 3.3+ you can use the parsedate_to_datetime function:
>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 +0100')
datetime.datetime(2009, 11, 16, 13, 32, 2, tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
Official documentation:
The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding a timezone tzinfo. New in version 3.3.
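To illustrate the -0000 case described in the quoted documentation, here is a short sketch. Treating the naive result as UTC is an assumption taken from that description, and the names aware, naive and naive_as_utc are only illustrative.

```python
import datetime
from email.utils import parsedate_to_datetime

aware = parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 +0100')
naive = parsedate_to_datetime('Mon, 16 Nov 2009 12:32:02 -0000')

print(aware.tzinfo is not None)  # -> True (aware, offset +01:00)
print(naive.tzinfo is None)      # -> True (naive, no tzinfo attached)

# Assumption: a -0000 date represents UTC, so attach UTC explicitly
naive_as_utc = naive.replace(tzinfo=datetime.timezone.utc)

# Both values now describe the same instant
print(naive_as_utc == aware)     # -> True
```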
Solution 4
In Python 3.3+, an email message can parse the headers for you:
import email
import email.policy
headers = email.message_from_file(file, policy=email.policy.default)
print(headers.get('date').datetime)
# -> 2009-11-16 13:32:02+01:00
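For a version that runs without a file on disk, the sketch below builds a message from a string. The raw text is made up for illustration; with email.policy.default the Date header value carries a datetime attribute. Note that msg['Date'] would be None if the header were missing.

```python
import email
import email.policy

# Hypothetical message text, used only for this example
raw = (
    "Date: Mon, 16 Nov 2009 13:32:02 +0100\n"
    "Subject: example\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(raw, policy=email.policy.default)
date_header = msg['Date']

print(date_header.datetime)
# -> 2009-11-16 13:32:02+01:00
print(date_header.datetime.utcoffset())
# -> 1:00:00
```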
Since Python 3.2, the strptime() call from the question works if you replace %Z with %z:
>>> from datetime import datetime
>>> datetime.strptime("Mon, 16 Nov 2009 13:32:02 +0100",
... "%a, %d %b %Y %H:%M:%S %z")
datetime.datetime(2009, 11, 16, 13, 32, 2,
tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
Or using the email package (Python 3.3+):
>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 +0100")
datetime.datetime(2009, 11, 16, 13, 32, 2,
tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
If the UTC offset is specified as -0000, it returns a naive datetime object that represents time in UTC; otherwise it returns an aware datetime object with the corresponding tzinfo set.
To parse an RFC 5322 date-time string on earlier Python versions (2.6+):
from calendar import timegm
from datetime import datetime, timedelta, tzinfo
from email.utils import parsedate_tz
ZERO = timedelta(0)
time_string = 'Mon, 16 Nov 2009 13:32:02 +0100'
tt = parsedate_tz(time_string)
#NOTE: mktime_tz is broken on Python < 2.7.4,
# see https://bugs.python.org/issue21267
timestamp = timegm(tt) - tt[9] # local time - utc offset == utc time
naive_utc_dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
aware_utc_dt = naive_utc_dt.replace(tzinfo=FixedOffset(ZERO, 'UTC'))
aware_dt = aware_utc_dt.astimezone(FixedOffset(timedelta(seconds=tt[9])))
print(aware_utc_dt)
print(aware_dt)
# -> 2009-11-16 12:32:02+00:00
# -> 2009-11-16 13:32:02+01:00
where FixedOffset is based on the tzinfo subclass from the datetime documentation:
class FixedOffset(tzinfo):
    """Fixed UTC offset: `time = utc_time + utc_offset`."""

    def __init__(self, offset, name=None):
        self.__offset = offset
        if name is None:
            seconds = abs(offset).seconds
            assert abs(offset).days == 0
            hours, seconds = divmod(seconds, 3600)
            if offset < ZERO:
                hours = -hours
            minutes, seconds = divmod(seconds, 60)
            assert seconds == 0
            #NOTE: the last part is to remind about deprecated POSIX
            #  GMT+h timezones that have the opposite sign in the
            #  name; the corresponding numeric value is not used e.g.,
            #  no minutes
            self.__name = '<%+03d%02d>GMT%+d' % (hours, minutes, -hours)
        else:
            self.__name = name

    def utcoffset(self, dt=None):
        return self.__offset

    def tzname(self, dt=None):
        return self.__name

    def dst(self, dt=None):
        return ZERO

    def __repr__(self):
        return 'FixedOffset(%r, %r)' % (self.utcoffset(), self.tzname())
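On Python 3 a hand-written tzinfo subclass should not be necessary, since datetime.timezone already provides fixed offsets. The following is a sketch of the same conversion under that assumption, not a drop-in replacement for the Python 2 code above:

```python
from calendar import timegm
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_tz

time_string = 'Mon, 16 Nov 2009 13:32:02 +0100'
tt = parsedate_tz(time_string)
utc_offset = tt[9]  # seconds east of UTC, 3600 for this input

timestamp = timegm(tt) - utc_offset  # local time - utc offset == utc time

# Build the aware UTC value from the epoch, then shift to the sender's offset
aware_utc_dt = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=timestamp)
aware_dt = aware_utc_dt.astimezone(timezone(timedelta(seconds=utc_offset)))

print(aware_utc_dt)
print(aware_dt)
# -> 2009-11-16 12:32:02+00:00
# -> 2009-11-16 13:32:02+01:00
```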
Solution 5
Have you tried
rfc822.parsedate_tz(date) # ?
More on RFC822, http://docs.python.org/library/rfc822.html
It's deprecated, though (parsedate_tz is now available as email.utils.parsedate_tz), and the rfc822 module was removed in Python 3.
But maybe these answers help:
gruszczy
Updated on July 02, 2020
Comments
-
gruszczy almost 4 years
I am trying to retrieve date from an email. At first it's easy:
message = email.parser.Parser().parse(file)
date = message['Date']
print(date)
and I receive:
'Mon, 16 Nov 2009 13:32:02 +0100'
But I need a nice datetime object, so I use:
datetime.strptime('Mon, 16 Nov 2009 13:32:02 +0100', '%a, %d %b %Y %H:%M:%S %Z')
which raises ValueError, since %Z isn't the format for +0100. But I can't find the proper format for the timezone in the documentation; there is only %Z for zone. Can someone help me with that?
-
gruszczy over 14 years
Yeah, I've seen it, but it's deprecated.
-
gruszczy over 14 years
Yep, those functions seem to have been moved to utils and email is fine to use. Thanks.
-
Eric Pruitt over 12 years
That won't yield an accurate value. time.mktime assumes a local time tuple, and the parsedate function does not take into account the time zone: time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) == time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')) returns True. Tagging @gruszczy in case he's relying on this method.
-
SamB over 12 years
This function is now known as email.utils.parsedate_tz(), FWIW.
-
jfs about 10 years
mktime + timezone may produce wrong values for past dates or if the timezone has DST transitions: time.timezone != time.altzone. Use tt = parsedate_tz(date_str); timestamp = calendar.timegm(tt) - tt[9] instead.
-
jfs about 10 years
mktime_tz may fail on Python before 2.7.4 if the local timezone had a different UTC offset at date_tuple. Use calendar.timegm() directly in this case.
-
jfs over 9 years
It does not answer the question. You use a different time format. Note: the time format in the question is defined in RFC 5322 (and its predecessors); it can be parsed using email.utils.parsedate_tz on Python 2.7. Your format looks like RFC 3339. Both can be parsed using dateutil.parser.parse() on Python 2. See "Convert timestamps with offset to datetime obj using strptime".
-
dnozay over 9 years
@J.F.Sebastian, had you not deleted my answer on one of the duplicate questions, I would not have posted my answer here. My problem was "strptime does not handle %z format"; I believe this is the same problem.
-
jfs over 9 years
I can't delete someone else's answer by myself. Could you link to the corresponding question?
-
mgilbert over 5 years
In more recent versions of Python you can also use email.utils.parsedate_to_datetime
-
jtbr over 5 years
This returns a naive datetime in UTC. To make it aware, you could provide a time zone as the second parameter to fromtimestamp. In Python 3, that's easy: datetime.timezone.utc. In Python 2.7, you'd need to implement a UTC tzinfo class and provide that.
-
klapshin over 5 years
In Python 3.7, parsedate_tz did not apply the tz shift for the datetime '2019-03-14 20:43:56 +0300' and just returned a naive '2019-03-14 20:43:56'. However, email.utils.parsedate_to_datetime from @jfs's answer solved the problem and returned a tz-aware object.