Decoding UTF-8 email subject?

Solution 1

In MIME terminology, those encoded chunks are called encoded-words. Check out javax.mail.internet.MimeUtility.decodeText in JavaMail. The decodeText method will decode all the encoded-words in a string.

You can get it from Maven Central with this dependency:

 <dependency>
  <groupId>javax.mail</groupId>
  <artifactId>mail</artifactId>
  <version>1.4.4</version>
 </dependency>

Solution 2

MimeUtility.decodeText works for me, for example:

MimeUtility.decodeText("=?UTF-8?B?4K6q4K+N4K6q4K+K4K604K6/4K614K+BIQ==?=");

Solution 3

Use javax.mail.internet.MimeUtility.decodeWord() to decode a single encoded-word.

On the other hand, if you use JavaMail to parse the whole message, you don't have to handle subject decoding or MIME body (attachment) parsing yourself at all.

By the way, the encoding does not have to be Base64 (marked by ?B?, common with Apple's clients); it can also be Quoted-Printable (the "Q" encoding, marked by ?Q?, common with the MS Outlook client).

Thunderbird uses whichever format is shorter (Base64 for Japanese, QP for most European languages).
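The "whichever is shorter" idea can be illustrated with a small sketch. This is not Thunderbird's actual code; the class and method names (EncodingLengthSketch, bLength, qLength, shorter) are made up for illustration, and the Q estimate is deliberately conservative: only ASCII letters, digits and spaces are counted as one character, every other byte as "=XX".

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class EncodingLengthSketch {
    // Length of the Base64 ("B") payload for the UTF-8 bytes of the text.
    static int bLength(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(bytes).length();
    }

    // Length of a conservative "Q" payload: ASCII letters and digits stay
    // literal, a space becomes '_', and every other byte becomes "=XX".
    static int qLength(String text) {
        int length = 0;
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean oneChar = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9') || v == ' ';
            length += oneChar ? 1 : 3;
        }
        return length;
    }

    // Returns "B" when Base64 is strictly shorter, otherwise "Q".
    static String shorter(String text) {
        return bLength(text) < qLength(text) ? "B" : "Q";
    }

    public static void main(String[] args) {
        // French subject with a single accented letter (U+00E9).
        System.out.println(shorter("R\u00e9union de projet"));
        // Japanese greeting, five characters of three UTF-8 bytes each.
        System.out.println(shorter("\u3053\u3093\u306b\u3061\u306f"));
    }
}
```

Mostly-ASCII text pays the "=XX" cost only for the occasional accented letter, so Q tends to win; text where every character is multi-byte triples in size under Q, so Base64 wins.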

If you really want to implement it yourself, have a look at RFC 2047 and RFC 2184 (since obsoleted by RFC 2231). You have to, because there are a few subtleties, such as a header split into encoded-words in two different character sets, or adjacent encoded-words separated only by folding white space, which must be merged.
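For a sense of what a hand-rolled decoder involves, here is a minimal sketch using only the Java standard library. It is an illustration under simplifying assumptions, not a complete RFC 2047 implementation: the names EncodedWordSketch, decode and decodeQ are hypothetical, the RFC 2231 language suffix is not handled, and malformed input simply throws. It does cover the two subtleties mentioned above: each encoded-word is decoded with its own character set, and white space between adjacent encoded-words is dropped.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodedWordSketch {
    // Shape of an encoded-word: =?charset?encoding?payload?=
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([bBqQ])\\?([^?\\s]*)\\?=");

    static String decode(String header) {
        StringBuilder out = new StringBuilder();
        Matcher m = ENCODED_WORD.matcher(header);
        int last = 0;
        boolean seenEncodedWord = false;
        while (m.find()) {
            String gap = header.substring(last, m.start());
            // White space between two adjacent encoded-words is dropped;
            // any other text between them is kept as-is.
            if (!(seenEncodedWord && gap.trim().isEmpty())) {
                out.append(gap);
            }
            // Each encoded-word carries its own charset and encoding.
            Charset charset = Charset.forName(m.group(1));
            byte[] bytes = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3))
                    : decodeQ(m.group(3));
            out.append(new String(bytes, charset));
            last = m.end();
            seenEncodedWord = true;
        }
        out.append(header.substring(last));
        return out.toString();
    }

    // "Q" encoding: '_' is a space, "=XX" is a hex-encoded byte,
    // everything else is a literal ASCII character.
    static byte[] decodeQ(String payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < payload.length(); i++) {
            char c = payload.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=' && i + 2 < payload.length()) {
                bytes.write(Integer.parseInt(payload.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode("=?UTF-8?B?SGVsbG8=?="));
        // Two adjacent encoded-words with different charsets and encodings.
        System.out.println(decode("=?UTF-8?B?SGVs?= =?ISO-8859-1?Q?lo_caf=E9?="));
    }
}
```

In practice MimeUtility.decodeText already handles these cases, so a sketch like this is mainly useful for understanding the format or when JavaMail is not available.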

Author by

Stefan Kendall

Updated on June 04, 2022

Comments

  • Stefan Kendall almost 2 years

    I have a string in this form: =?utf-8?B?zr...

    I want to get the file name as properly decoded UTF-8 text. Is there a library method somewhere in Maven Central that will do this decoding for me, or will I need to match the pattern and decode the Base64 manually?

  • Drizzt321 almost 7 years
    I'll add that I had this problem with a file name (similar to mathi's answer below): on the MIME part header "Content-Disposition attachment;filename=", the filename value has the "=?utf-8?B?" prefix, which marks it as one of these special encoded-words.