How to convert an email subject from "=?UTF-8?...?=" to a readable string?
Solution 1
The part between =?UTF-8?B? and ?= is a Base64-encoded string. Extract that part, then decode it.
import base64
#My buggy SSH account needs this to write unicode output, you hopefully won't
import sys
import codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
encoded = '=?UTF-8?B?0J/RgNC+0LLQtdGA0LrQsA==?='
prefix = '=?UTF-8?B?'
suffix = '?='
#extract the data part of the string
middle = encoded[len(prefix):len(encoded)-len(suffix)]
print "Middle: %s" % middle
#decode the bytes
decoded = base64.b64decode(middle)
#decode the utf-8
decoded = unicode(decoded, 'utf8')
print "Decoded: %s" % decoded
Output:
Middle: 0J/RgNC+0LLQtdGA0LrQsA==
Decoded: Проверка
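On Python 3, where unicode() and the print statement no longer exist, the same steps might look like the sketch below. The helper name decode_b64_word and the regular expression are illustrative assumptions, not part of the original answer. The sketch handles only a single Base64 ("B") encoded word, and it reads the charset from the header instead of hard-coding UTF-8.

```python
import base64
import re

# Hypothetical pattern: one encoded word of the form =?charset?B?data?=
_ENCODED_WORD = re.compile(r'=\?([^?]+)\?[Bb]\?([^?]*)\?=')


def decode_b64_word(encoded):
    """Decode one Base64 encoded word; return the input unchanged if it does not match."""
    match = _ENCODED_WORD.fullmatch(encoded)
    if match is None:
        return encoded
    charset, data = match.groups()
    raw = base64.b64decode(data)   # Base64 text -> raw bytes
    return raw.decode(charset)     # raw bytes -> str, using the declared charset


subject = decode_b64_word('=?UTF-8?B?0J/RgNC+0LLQtdGA0LrQsA==?=')
print(subject)
```

Real headers can mix several encoded words, plain text, and the quoted-printable ("Q") encoding, so this is only meant to show what the format contains.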
Solution 2
You could use the decode_header function from the email.header module: http://docs.python.org/library/email.header.html#email.header.decode_header
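A minimal sketch of that approach, assuming Python 3 (the variable names are illustrative). decode_header returns a list of (decoded, charset) pairs, and make_header from the same module can turn that list back into readable text:

```python
from email.header import decode_header, make_header

raw_subject = '=?UTF-8?B?0J/RgNC+0LLQtdGA0LrQsA==?='

# Each pair is (decoded, charset); encoded words come back as bytes
parts = decode_header(raw_subject)

# make_header reassembles the pairs, and str() yields the readable text
subject = str(make_header(parts))
print(subject)
```

Going through make_header avoids decoding each pair by hand, which matters when a subject consists of several parts with different charsets.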
Author: anton
Updated on June 11, 2022
Comments
-
anton almost 2 years
Possible Duplicate: string encode / decode
Now the subject looks like: =?UTF-8?B?0J/RgNC+0LLQtdGA0LrQsA==?=
-
anton about 13 years
Thanks for replying, but the result is not good: [('\xd0\x9f\xd1\x80\xd0\xbe\xd0\xb2\xd0\xb5\xd1\x80\xd0\xba\xd0\xb0', 'utf-8')]
-
gnud about 13 years
You can convert that result into a unicode string by using unicode(*result[0]).
-
Ignacio Vazquez-Abrams about 13 years
So much work to replace 2 lines of correct code...
-
gnud about 13 years
Yes, using email.header.decode_header seems like a better start than my substring mess. I still explained what was going on, though, and how to convert the result from decode_header to a unicode string.
-
Gert van den Berg almost 8 years
What standard would these UTF-8 subjects be based on?