Decode a UTF-8 email header
Solution 1
The encoded-word tokens (as per RFC 2047) can occur in the values of some headers. They are parsed as follows:
=?<charset>?<encoding>?<data>?=
The charset is UTF-8 in this case, and the encoding is B, which means base64 (the other option is Q, which is a variant of quoted-printable).
To read it, first decode the base64, then interpret the resulting bytes as UTF-8.
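The two steps can be sketched by hand in Perl using only core modules. This is a minimal illustration, not a full RFC 2047 parser: decode_encoded_word() is a hypothetical helper name, it handles a single encoded-word only, and the sample strings are made up for the example.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode encode);

# decode_encoded_word() is a hypothetical helper, not a library function.
# It accepts exactly one token of the form =?charset?encoding?data?=
# and returns a string of Perl characters, or undef if the input
# is not an encoded-word.
sub decode_encoded_word {
    my ($word) = @_;

    # Split the token into its three fields.
    my ($charset, $encoding, $data) =
        $word =~ /\A=\?([^?]+)\?([BbQq])\?([^?]*)\?=\z/
        or return;

    my $bytes;
    if (uc $encoding eq 'B') {
        # B: the data field is plain base64.
        $bytes = decode_base64($data);
    }
    else {
        # Q: "_" stands for a space and "=XX" is a hex-encoded byte.
        ($bytes = $data) =~ tr/_/ /;
        $bytes =~ s/=([0-9A-Fa-f]{2})/chr hex $1/ge;
    }

    # Interpret the raw bytes in the declared charset.
    return decode($charset, $bytes);
}

my $subject = decode_encoded_word('=?UTF-8?B?SGVsbG8gV8O2cmxk?=');

# Convert back to UTF-8 bytes for output.
print encode('UTF-8', $subject), "\n";
```

Real headers can mix several encoded-words with plain text, so for production use a module that implements the whole RFC is the safer choice.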
Also read the various Internet Mail RFCs for more detail, mainly RFC 2047.
Since you are using Perl, Encode::MIME::Header could be of use:
SYNOPSIS
use Encode qw/encode decode/;
$utf8   = decode('MIME-Header', $header);
$header = encode('MIME-Header', $utf8);
ABSTRACT
This module implements RFC 2047 Mime Header Encoding. There are 3 variant encoding names; MIME-Header, MIME-B and MIME-Q. The difference is described below.
              decode()          encode()
MIME-Header   Both B and Q      =?UTF-8?B?....?=
MIME-B        B only; Q croaks  =?UTF-8?B?....?=
MIME-Q        Q only; B croaks  =?UTF-8?Q?....?=
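As a rough illustration of the synopsis, the sketch below runs a string through encode() and decode() with the MIME-Header name and checks that it survives the round trip. The sample text is an assumption made up for the example; the exact shape of the encoded header may vary between Encode versions, so the code does not rely on it.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample text for illustration only: "Gruesse, 10 EUR" written with
# u-umlaut, sharp s and the euro sign as escapes.
my $text = "Gr\x{fc}\x{df}e, 10 \x{20ac}";

# Characters -> RFC 2047 header value (ASCII only).
my $header = encode('MIME-Header', $text);

# Header value -> characters again.
my $roundtrip = decode('MIME-Header', $header);

print "$header\n";
print $roundtrip eq $text ? "round trip ok\n" : "round trip mismatch\n";
```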
Solution 2
I think that the Encode module handles that with the MIME-Header encoding, so try this:
use Encode qw(decode);
my $decoded = decode("MIME-Header", $encoded);
Solution 3
Check out RFC 2047. The 'B' means that the part between the last two '?'s is base64-encoded. The 'utf-8' means that the decoded bytes should be interpreted as UTF-8.
Solution 4
MIME::Words from MIME-tools works well for this too. I ran into some issues with Encode and found that MIME::Words succeeded on some strings where Encode did not.
use MIME::Words qw(:all);
$decoded = decode_mimewords(
'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <[email protected]>',
);
Solution 5
This is a standard extension for charset labeling of headers, specified in RFC2047.
CoffeeMonster
Updated on July 09, 2022
Comments
-
CoffeeMonster almost 2 years
I have an email subject of the form:
=?utf-8?B?T3.....?=
The body of the email is utf-8 base64 encoded - and has decoded fine. I am currently using Perl's Email::MIME module to decode the email.
What is the meaning of the =?utf-8 delimiter and how do I extract information from this string?
-
kagali-san over 13 years
That was helpful, thanks. Btw, I also used print encode('utf-8', $headers_decoded) to display decoded headers properly, in case someone else is reading this while writing a mail script.
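For anyone wondering why that extra encode() is needed: decode('MIME-Header', ...) returns Perl characters, not UTF-8 bytes, so the string has to be turned back into bytes on output. A small sketch of two ways to do that, assuming a made-up sample header (the variable name follows the comment above):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample header for illustration only.
my $raw = 'Subject: =?UTF-8?B?SGVsbG8gV8O2cmxk?=';

# decode() returns a string of Perl characters, not UTF-8 bytes.
my $headers_decoded = decode('MIME-Header', $raw);

# Option 1: convert to UTF-8 bytes explicitly at the point of output.
my $bytes = encode('UTF-8', $headers_decoded);
print $bytes, "\n";

# Option 2: put an encoding layer on the filehandle once,
# then print character strings directly from here on.
binmode(STDOUT, ':encoding(UTF-8)');
print $headers_decoded, "\n";
```

Mixing the two on the same filehandle would encode the text twice, which is why option 1 is printed before the binmode() call here.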