Python IMAP: =?utf-8?Q? in subject string
Solution 1
In MIME terminology, those encoded chunks are called encoded-words. You can decode them like this:
import email.header
text, encoding = email.header.decode_header('=?utf-8?Q?Subject?=')[0]
Check out the docs for email.header for more details.
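Here is a minimal Python 3 sketch of what the call above returns and how to finish the job. Each item of the list is a (value, charset) pair. When the header contains encoded-words the value is bytes, so it still has to be decoded with the charset it names. Taking only the [0] item, as above, assumes the header consists of a single encoded-word.

```python
import email.header

raw_subject = '=?utf-8?Q?Subject?='

# Each item is a (value, charset) pair; value is bytes when the
# input contains encoded-words.
pairs = email.header.decode_header(raw_subject)
print(pairs)  # [(b'Subject', 'utf-8')]

# Decode the first (and here, only) chunk with the charset it declares.
value, charset = pairs[0]
subject = value.decode(charset or 'ascii')
print(subject)  # Subject
```

If the header has no encoded-words at all, decode_header returns a plain str with charset None, and calling .decode on it fails. The isinstance check in the next solution handles that case.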
Solution 2
This is a MIME encoded-word. You can parse it with email.header:
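import email.header
def decode_mime_words(s):
    # In Python 3, decode_header() yields (bytes, charset) pairs for encoded-words,
    # but a single (str, None) pair when the header has no encoded-words at all.
    return u''.join(
        word.decode(encoding or 'utf8') if isinstance(word, bytes) else word
        for word, encoding in email.header.decode_header(s))
print(decode_mime_words(u'=?utf-8?Q?Subject=c3=a4?=X=?utf-8?Q?=c3=bc?='))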
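The same helper also covers base64 ("B") encoded-words and headers with no encoded-words at all. Here is a short Python 3 check. The sample inputs are made up for illustration.

```python
import email.header

def decode_mime_words(s):
    # Join every chunk, decoding the bytes ones with their declared charset.
    return ''.join(
        word.decode(encoding or 'utf8') if isinstance(word, bytes) else word
        for word, encoding in email.header.decode_header(s))

# Base64 ("B") encoded-word for the UTF-8 text 'Subjectä'.
print(decode_mime_words('=?utf-8?B?U3ViamVjdMOk?='))  # Subjectä

# No encoded-words: decode_header returns [('Plain subject', None)].
print(decode_mime_words('Plain subject'))  # Plain subject
```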
Solution 3
The text is encoded as a MIME encoded-word. This is a mechanism defined in RFC 2047 for encoding headers that contain non-ASCII text such that the encoded output contains only ASCII characters.
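To see the mechanism in both directions, a small sketch can encode a non-ASCII string with email.header.Header (the legacy header API) and decode it again. It needs Python 3.7+ for str.isascii. The exact encoded form is left to the library, for example whether Q or base64 ("B") encoding is chosen. The sketch therefore only checks that the result is pure ASCII and round-trips.

```python
from email.header import Header, decode_header, make_header

original = 'Pepé Le Pew'

# Encode as RFC 2047 encoded-word(s) using the UTF-8 charset.
encoded = Header(original, 'utf-8').encode()
print(encoded)               # something like =?utf-8?b?...?=
print(encoded.isascii())     # True: only ASCII characters remain

# Decode back to text.
decoded = str(make_header(decode_header(encoded)))
print(decoded == original)   # True
```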
In Python 3.3+, the parsing classes and functions in email.parser automatically decode "encoded words" in headers if their policy argument is set to policy.default:
>>> import email
>>> from email import policy
>>> msg = email.message_from_file(open('message.txt'), policy=policy.default)
>>> msg['from']
'Pepé Le Pew <[email protected]>'
The parsing classes and functions are:
- email.parser.BytesParser
- email.parser.Parser
- email.message_from_bytes
- email.message_from_binary_file
- email.message_from_string
- email.message_from_file
Confusingly, up to at least Python 3.8, the default policy for these parsing functions is not policy.default, but policy.compat32, which does not decode "encoded words".
>>> msg = email.message_from_file(open('message.txt'))
>>> msg['from']
'=?utf-8?q?Pep=C3=A9?= Le Pew <[email protected]>'
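Below is a self-contained variant of the two sessions above. It parses an in-memory message with email.message_from_bytes instead of reading message.txt. The sample header is made up for illustration.

```python
import email
from email import policy

# A tiny raw message whose Subject is an RFC 2047 encoded-word.
raw = b'Subject: =?utf-8?q?Pep=C3=A9?= Le Pew\n\nHello\n'

# policy.default decodes encoded-words when the header is accessed.
modern = email.message_from_bytes(raw, policy=policy.default)
print(modern['subject'])   # Pepé Le Pew

# Without a policy argument, compat32 returns the raw header text.
legacy = email.message_from_bytes(raw)
print(legacy['subject'])   # =?utf-8?q?Pep=C3=A9?= Le Pew
```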
Solution 4
Try Imbox, because imaplib is a very low-level library and returns results that are hard to work with.
Installation
pip install imbox
Usage
from imbox import Imbox
with Imbox('imap.gmail.com',
           username='username',
           password='password',
           ssl=True,
           ssl_context=None,
           starttls=False) as imbox:
    all_inbox_messages = imbox.messages()
    for uid, message in all_inbox_messages:
        print(message.subject)  # the subject as decoded text
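For comparison, here is a rough standard-library-only sketch of the same loop. It combines imaplib with the policy.default parsing from Solution 3. The helper names parse_subject and fetch_subjects, the default mailbox, and the choice to fetch only the Subject field are assumptions for illustration. Only parse_subject runs without a server.

```python
import imaplib
from email import policy
from email.parser import BytesParser

def parse_subject(header_bytes):
    """Return the decoded Subject from raw header bytes ('' if absent)."""
    # policy.default decodes RFC 2047 encoded-words on access.
    msg = BytesParser(policy=policy.default).parsebytes(header_bytes, headersonly=True)
    subject = msg['subject']
    return str(subject) if subject is not None else ''

def fetch_subjects(host, username, password, mailbox='INBOX'):
    """Hypothetical helper: yield the decoded subject of every message."""
    with imaplib.IMAP4_SSL(host) as imap:   # logs out on exit
        imap.login(username, password)
        imap.select(mailbox, readonly=True)
        typ, data = imap.search(None, 'ALL')
        for num in data[0].split():
            # BODY.PEEK fetches the header without setting the \Seen flag.
            typ, msg_data = imap.fetch(num, '(BODY.PEEK[HEADER.FIELDS (SUBJECT)])')
            # msg_data[0] is a (response_info, header_bytes) tuple.
            yield parse_subject(msg_data[0][1])
```

With real credentials, usage would look like `for s in fetch_subjects('imap.example.com', 'user', 'password'): print(s)`, where the host name is a placeholder.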
Solution 5
In Python 3, decoding this to an approximated string is as easy as:
from email.header import decode_header, make_header
decoded = str(make_header(decode_header("=?utf-8?Q?Subject?=")))
See the documentation of decode_header and make_header.
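Here are a few Python 3 inputs run through the same decode_header/make_header combination, including a header with no encoded-words. The helper name decode_subject and the sample strings are made up for illustration.

```python
from email.header import decode_header, make_header

def decode_subject(raw):
    # make_header() rebuilds a Header from the decoded chunks; str() renders it.
    return str(make_header(decode_header(raw)))

print(decode_subject('=?utf-8?Q?Subject?='))     # Subject
print(decode_subject('=?utf-8?q?Pep=C3=A9?='))   # Pepé
print(decode_subject('Plain ASCII subject'))     # Plain ASCII subject
```

Unlike indexing [0], this joins every chunk, so headers mixing several encoded-words and plain text are handled in one call.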
janeh
Updated on July 09, 2022
Comments
-
janeh almost 2 years
I am displaying new email with IMAP, and everything looks fine, except that one message subject shows as: =?utf-8?Q?Subject?=
How can I fix it?
-
phihag about 8 years
In both Python 2 and Python 3, email.header.decode_header (with lower-case m) is the generic name. In addition, in your code, text is not actually text but a bytes object.
-
Anatoly Alekseev over 5 years
+1, truly this is for humans. Indeed, imbox was able to decode the subject and other fields on the fly, which are otherwise base64-encoded (in imaplib and the like). However, be aware that if a field is missing, a KeyError will be thrown.
-
wbg over 5 years
Could you rewrite that in a more Pythonic fashion?
-
phihag over 5 years
@wbg What's not Pythonic about this code? What would you change? Looking at it now, it seems rather well-written to me, and a paragon of Python's expressiveness. Maybe the generator expression is tripping up @deterjan? If you're just targeting Python 3, you can skip the if isinstance(word, bytes) else word and the u before the '; this code has been engineered to work on both Python 2 and 3.