Parsing date with timezone from an email?


Solution 1

email.utils has a parsedate() function for the RFC 2822 format, which as far as I know is not deprecated.

>>> import email.utils
>>> import time
>>> import datetime
>>> email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')
(2009, 11, 16, 13, 32, 2, 0, 1, -1)
>>> time.mktime((2009, 11, 16, 13, 32, 2, 0, 1, -1))
1258378322.0
>>> datetime.datetime.fromtimestamp(1258378322.0)
datetime.datetime(2009, 11, 16, 13, 32, 2)

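Please note, however, that parsedate() ignores the time zone in the string, and time.mktime() always interprets its argument as a local time tuple (see the time.mktime documentation). The timestamp above is therefore correct only if the machine's local UTC offset happens to match the offset in the header.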

>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) ==
...  time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')))
True

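So you'll still need to parse out the time zone yourself and also cancel the local offset that time.mktime applied. Since time.timezone is the non-DST local offset (in seconds west of UTC), this is only right when the local zone is not observing DST on that date:

>>> REMOTE_TIME_ZONE_OFFSET = +9 * 60 * 60
>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) -
...  time.timezone - REMOTE_TIME_ZONE_OFFSET)
1258345922.0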
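
The comments below suggest avoiding time.mktime altogether. A minimal sketch of that idea: email.utils.parsedate_tz returns the UTC offset in seconds as the tenth tuple element, and calendar.timegm reads the tuple as UTC, so the local zone never enters the calculation. The helper name rfc2822_to_timestamp is made up for this illustration, and treating a missing zone as UTC is an assumption.

```python
import calendar
import email.utils


def rfc2822_to_timestamp(date_str):
    """Return the POSIX timestamp for an RFC 2822 date string (hypothetical helper)."""
    tt = email.utils.parsedate_tz(date_str)
    if tt is None:
        raise ValueError('unparseable date: %r' % (date_str,))
    # tt[9] is the UTC offset in seconds, or None if the string had no zone;
    # falling back to 0 (UTC) is an assumption of this sketch.
    offset = tt[9] or 0
    # timegm() reads the tuple as UTC, so subtract the offset to get real UTC.
    return calendar.timegm(tt) - offset


print(rfc2822_to_timestamp('Mon, 16 Nov 2009 13:32:02 +0900'))  # 1258345922
print(rfc2822_to_timestamp('Mon, 16 Nov 2009 13:32:02 +0100'))  # 1258374722
```

Unlike the mktime version, the two offsets now give different timestamps, and the result does not depend on the machine's time zone or DST rules.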

Solution 2

Use email.utils.parsedate_tz(date):

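import datetime
import email
import email.utils

with open(file_name) as f:
    msg = email.message_from_file(f)
date = None
date_str = msg.get('date')
if date_str:
    date_tuple = email.utils.parsedate_tz(date_str)
    if date_tuple:
        # mktime_tz() applies the parsed UTC offset; fromtimestamp() then
        # returns a naive datetime in the machine's local time zone.
        date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
if date:
    ...  # valid date found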
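
As one of the comments below points out, fromtimestamp() without a time zone gives a naive datetime. A small sketch for Python 3, assuming an aware UTC result is what you want: pass datetime.timezone.utc as the tz argument. The raw message here is a made-up sample, parsed from a string so the example is self-contained.

```python
import datetime
import email
import email.utils

# Made-up sample message, used instead of a file for illustration.
raw = 'Date: Mon, 16 Nov 2009 13:32:02 +0100\n\nbody\n'
msg = email.message_from_string(raw)

date = None
date_str = msg.get('date')
if date_str:
    date_tuple = email.utils.parsedate_tz(date_str)
    if date_tuple:
        timestamp = email.utils.mktime_tz(date_tuple)
        # Passing tz= makes the result aware instead of naive local time.
        date = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)

print(date)  # 2009-11-16 12:32:02+00:00
```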

Solution 3

On Python 3.3+ you can use the parsedate_to_datetime function:

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 +0100')
datetime.datetime(2009, 11, 16, 13, 32, 2, tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))

Official documentation:

The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding a timezone tzinfo. New in version 3.3.
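
Because of the -0000 special case, the result may be naive or aware. A minimal sketch of one way to normalize both cases to an aware UTC datetime; the helper name parse_date_as_utc is hypothetical, and treating every naive result as UTC is an assumption of the sketch.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_date_as_utc(date_str):
    """Parse an RFC 2822 date and return an aware datetime in UTC (hypothetical helper)."""
    dt = parsedate_to_datetime(date_str)
    if dt.tzinfo is None:
        # "-0000" yields a naive datetime that, per the documentation,
        # represents UTC. Assumption: any naive result is treated as UTC.
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


print(parse_date_as_utc('Mon, 16 Nov 2009 13:32:02 +0100'))  # 2009-11-16 12:32:02+00:00
print(parse_date_as_utc('Mon, 16 Nov 2009 13:32:02 -0000'))  # 2009-11-16 13:32:02+00:00
```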

Solution 4

In Python 3.3+, the email package can parse the headers for you if you pass a policy such as email.policy.default:

import email
import email.policy

headers = email.message_from_file(file, policy=email.policy.default)
print(headers.get('date').datetime)
# -> 2009-11-16 13:32:02+01:00
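
A self-contained variant of the snippet above, as a sketch: it parses a made-up message from a string instead of a file and checks that the header is present before reading its datetime attribute.

```python
import email
import email.policy

# Made-up sample message for illustration.
raw = 'Date: Mon, 16 Nov 2009 13:32:02 +0100\nSubject: test\n\nbody\n'
msg = email.message_from_string(raw, policy=email.policy.default)

date_header = msg['Date']  # None if the message has no Date header
if date_header is not None:
    dt = date_header.datetime
    print(dt)              # 2009-11-16 13:32:02+01:00
    print(dt.utcoffset())  # 1:00:00
```

The datetime attribute is only available with the newer policies; with the default compat32 policy, msg['Date'] is a plain string.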

Since Python 3.2, the strptime call from the question works if you replace %Z with %z:

>>> from datetime import datetime
>>> datetime.strptime("Mon, 16 Nov 2009 13:32:02 +0100", 
...                   "%a, %d %b %Y %H:%M:%S %z")
datetime.datetime(2009, 11, 16, 13, 32, 2,
                  tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))

Or using email package (Python 3.3+):

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 +0100")
datetime.datetime(2009, 11, 16, 13, 32, 2,
                  tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))

If the UTC offset is specified as -0000, it returns a naive datetime object that represents time in UTC; otherwise it returns an aware datetime object with the corresponding tzinfo set.

To parse an RFC 5322 date-time string on earlier Python versions (2.6+):

from calendar import timegm
from datetime import datetime, timedelta, tzinfo
from email.utils import parsedate_tz

ZERO = timedelta(0)
time_string = 'Mon, 16 Nov 2009 13:32:02 +0100'
tt = parsedate_tz(time_string)
#NOTE: mktime_tz is broken on Python < 2.7.4,
#  see https://bugs.python.org/issue21267
timestamp = timegm(tt) - tt[9] # local time - utc offset == utc time
naive_utc_dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
aware_utc_dt = naive_utc_dt.replace(tzinfo=FixedOffset(ZERO, 'UTC'))
aware_dt = aware_utc_dt.astimezone(FixedOffset(timedelta(seconds=tt[9])))
print(aware_utc_dt)
print(aware_dt)
# -> 2009-11-16 12:32:02+00:00
# -> 2009-11-16 13:32:02+01:00

where FixedOffset is based on the tzinfo subclass from the datetime documentation:

class FixedOffset(tzinfo):
    """Fixed UTC offset: `time = utc_time + utc_offset`."""
    def __init__(self, offset, name=None):
        self.__offset = offset
        if name is None:
            seconds = abs(offset).seconds
            assert abs(offset).days == 0
            hours, seconds = divmod(seconds, 3600)
            if offset < ZERO:
                hours = -hours
            minutes, seconds = divmod(seconds, 60)
            assert seconds == 0
            #NOTE: the last part is to remind about deprecated POSIX
            #  GMT+h timezones that have the opposite sign in the
            #  name; the corresponding numeric value is not used e.g.,
            #  no minutes
            self.__name = '<%+03d%02d>GMT%+d' % (hours, minutes, -hours)
        else:
            self.__name = name
    def utcoffset(self, dt=None):
        return self.__offset
    def tzname(self, dt=None):
        return self.__name
    def dst(self, dt=None):
        return ZERO
    def __repr__(self):
        return 'FixedOffset(%r, %r)' % (self.utcoffset(), self.tzname())
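
On Python 3.2+ the standard library's datetime.timezone class can stand in for FixedOffset. A sketch of the same timegm-based calculation under that assumption:

```python
from calendar import timegm
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_tz

tt = parsedate_tz('Mon, 16 Nov 2009 13:32:02 +0100')
timestamp = timegm(tt) - tt[9]  # wall-clock time - utc offset == utc time
aware_utc_dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
# datetime.timezone takes the fixed offset as a timedelta, like FixedOffset.
aware_dt = aware_utc_dt.astimezone(timezone(timedelta(seconds=tt[9])))
print(aware_utc_dt)  # 2009-11-16 12:32:02+00:00
print(aware_dt)      # 2009-11-16 13:32:02+01:00
```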

Solution 5

Have you tried

rfc822.parsedate_tz(date) # ?

More on RFC822, http://docs.python.org/library/rfc822.html

The rfc822 module is deprecated, though (and was removed in Python 3); parsedate_tz now lives in email.utils.


Author by

gruszczy

I lead and manage the Google Assistant on Speakers team.

Updated on July 02, 2020

Comments

  • gruszczy
    gruszczy almost 4 years

    I am trying to retrieve date from an email. At first it's easy:

    message = email.parser.Parser().parse(file)
    date = message['Date']
    print date
    

    and I receive:

    'Mon, 16 Nov 2009 13:32:02 +0100'
    

    But I need a nice datetime object, so I use:

    datetime.strptime('Mon, 16 Nov 2009 13:32:02 +0100', '%a, %d %b %Y %H:%M:%S %Z')
    

    which raises ValueError, since %Z isn't the right format for +0100. But I can't find the proper format for a numeric timezone offset in the documentation; there is only %Z for the zone name. Can someone help me with that?

  • gruszczy
    gruszczy over 14 years
    Yeah, I've seen it, but it's deprecated.
  • gruszczy
    gruszczy over 14 years
    Yep, those functions seem to have been moved to utils and email is fine to use. Thanks.
  • Eric Pruitt
    Eric Pruitt over 12 years
    That won't yield an accurate value. time.mktime assumes a local time tuple, and the parsedate function does not take into account the time zone: time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) == time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')) returns True. Tagging @gruszczy in case he's relying on this method.
  • SamB
    SamB over 12 years
    This function is now known as email.utils.parsedate_tz(), FWIW.
  • jfs
    jfs about 10 years
    mktime + timezone may produce wrong values for past dates or if the timezone has DST transitions: time.timezone != time.altzone. Use tt = parsedate_tz(date_str); timestamp = calendar.timegm(tt) - tt[9] instead.
  • jfs
    jfs about 10 years
    mktime_tz may fail on Python before 2.7.4 if the local timezone had different UTC offset at date_tuple. Use calendar.timegm() directly in this case.
  • jfs
    jfs over 9 years
    It does not answer the question. You use a different time format. Note: the time format in the question is defined in RFC 5322 (and its predecessors); it can be parsed using email.utils.parsedate_tz on Python 2.7. Your format looks like RFC 3339. Both can be parsed using dateutil.parser.parse() on Python 2. See Convert timestamps with offset to datetime obj using strptime
  • dnozay
    dnozay over 9 years
    @J.F.Sebastian, had you not deleted my answer on one of the duplicate question, I would not have posted my answer here. My problem was strptime does not handle %z format, I believe this is the same problem.
  • jfs
    jfs over 9 years
    I can't delete someone else's answer by myself. Could you link to the corresponding question?
  • mgilbert
    mgilbert over 5 years
    In more recent versions of python you can also use email.utils.parsedate_to_datetime
  • jtbr
    jtbr over 5 years
    This returns a naive datetime in UTC. To make it aware, you could provide a time zone as the second parameter to fromtimestamp. In python 3, that's easy: datetime.timezone.utc. In python 2.7, you'd need to implement a UTC tzinfo class and provide that.
  • klapshin
    klapshin over 5 years
    In Python 3.7, parsedate_tz did not account for the tz shift in the datetime '2019-03-14 20:43:56 +0300' and just returned a naive '2019-03-14 20:43:56'. However, email.utils.parsedate_to_datetime from @jfs's answer solved the problem and returned a tz-aware object.