Decoding UTF-8 email subject?
Solution 1
In MIME terminology, those encoded chunks are called encoded-words. Check out javax.mail.internet.MimeUtility.decodeText in JavaMail. The decodeText method will decode all the encoded-words in a string.
You can grab it from Maven with
<dependency>
  <groupId>javax.mail</groupId>
  <artifactId>mail</artifactId>
  <version>1.4.4</version>
</dependency>
Solution 2
MimeUtility.decodeText is working for me, e.g.:
MimeUtility.decodeText("=?UTF-8?B?4K6q4K+N4K6q4K+K4K604K6/4K614K+BIQ==?=");
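To see what that particular string carries without pulling in JavaMail, here is a minimal sketch that uses only the JDK. The class and method names (EncodedWordPayload, decodeUtf8Base64Payload) are made up for illustration, and it assumes a single well-formed encoded-word with UTF-8 charset and "B" encoding; it is not a replacement for decodeText.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of JavaMail.
class EncodedWordPayload {

    // Assumes the input is exactly one "=?UTF-8?B?<payload>?=" encoded-word.
    static String decodeUtf8Base64Payload(String encodedWord) {
        // Splitting on '?' yields: "=", charset, encoding, payload, "="
        String[] parts = encodedWord.split("\\?");
        byte[] raw = Base64.getDecoder().decode(parts[3]);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decodeUtf8Base64Payload(
                "=?UTF-8?B?4K6q4K+N4K6q4K+K4K604K6/4K614K+BIQ==?=");
        // Print code points instead of glyphs so the console encoding does not matter.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

The payload is Tamil text followed by an exclamation mark, which is why the code points fall in the U+0B80 block.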
Solution 3
javax.mail.internet.MimeUtility.decodeWord()
On the other hand, if you use JavaMail for decoding your emails, you don't have to care about either subject parsing or MIME body (attachments) parsing at all.
BTW, the encoding does not have to be Base64 (common with Apple's clients); it can also be Quoted-Printable (common with the MS Outlook client).
Thunderbird uses whichever format is shorter (Base64 for Japanese, QP for most European languages).
If you really want to implement it yourself, have a look at RFC 2047 and RFC 2184 (since obsoleted by RFC 2231). You have to, because there are a few subtleties, such as a header split into encoded-words in two different character sets, or adjacent encoded-words separated only by folding white space that must be merged.
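For anyone going the do-it-yourself route, the following is a rough JDK-only sketch of the points above: both "B" (Base64) and "Q" (Quoted-Printable style) encodings, a different charset per encoded-word, and dropping the white space between adjacent encoded-words. The class name Rfc2047Sketch and its methods are hypothetical. It is deliberately simplified: it leaves malformed words untouched, does not enforce the length limits from the RFC, and does not handle RFC 2231 parameter continuations, so treat it as a starting point rather than a compliant decoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JavaMail: a simplified RFC 2047 decoder.
class Rfc2047Sketch {

    // =?charset?encoding?encoded-text?=  (an optional "*language" suffix is skipped)
    private static final Pattern ENCODED_WORD = Pattern.compile(
            "=\\?([^?*\\s]+)(?:\\*[^?\\s]*)?\\?([bBqQ])\\?([^?\\s]*)\\?=");

    static String decode(String header) {
        StringBuilder out = new StringBuilder();
        Matcher m = ENCODED_WORD.matcher(header);
        int last = 0;
        boolean previousWasEncoded = false;
        while (m.find()) {
            String gap = header.substring(last, m.start());
            // White space that only separates two encoded-words is dropped.
            if (!(previousWasEncoded && gap.trim().isEmpty())) {
                out.append(gap);
            }
            String decoded;
            try {
                decoded = decodeWord(m.group(1), m.group(2), m.group(3));
            } catch (IllegalArgumentException e) {
                // Unknown charset or malformed payload: keep the raw text.
                decoded = m.group();
            }
            out.append(decoded);
            last = m.end();
            previousWasEncoded = true;
        }
        out.append(header.substring(last));
        return out.toString();
    }

    private static String decodeWord(String charset, String encoding, String text) {
        byte[] bytes = encoding.equalsIgnoreCase("B")
                ? Base64.getDecoder().decode(text)
                : decodeQ(text);
        return new String(bytes, Charset.forName(charset));
    }

    // "Q" encoding: '_' means space, "=XX" is a hex-encoded byte, the rest is literal.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                int hi = Character.digit(text.charAt(i + 1), 16);
                int lo = Character.digit(text.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad Q escape");
                }
                buf.write((hi << 4) | lo);
                i += 2;
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Two encoded-words in different charsets, separated by a space.
        System.out.println(
                decode("Re: =?ISO-8859-1?Q?Andr=E9?= =?UTF-8?B?IFBpcmFyZA==?="));
    }
}
```

Decoding each encoded-word to a string on its own, before concatenating, is what lets the two words use different charsets. In practice MimeUtility.decodeText remains the safer choice, since it has already dealt with the edge cases this sketch skips.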
Stefan Kendall
Updated on June 04, 2022
Comments
-
Stefan Kendall almost 2 years
I have a string in this form:
=?utf-8?B?zr...
And I want to get the name of the file in proper UTF-8 encoding. Is there a library method somewhere in maven central that will do this decoding for me, or will I need to test the pattern and decode base64 manually?
-
Drizzt321 almost 7 years
I'll add that I had this problem with a name (similar to mathi's answer below) in the MIME part header "Content-Disposition: attachment; filename=": the filename had the "=?utf-8?B?" prefix, which marks one of these special encoded-words.