Japanese email subject encoding

Solution 1

I've been dealing with Japanese encodings for almost 20 years and so I can sympathize with your difficulties. Websites that I've worked on send hundreds of emails daily to Japanese customers so I can share with you what's worked for us.

  • First of all, do not use Shift-JIS. I personally receive tons of Japanese emails and almost never are they encoded using Shift-JIS. I think an old (circa Win 98?) version of Outlook Express encoded outgoing mail using Shift-JIS, but nowadays you just don't see it.

  • As you've figured out, you need to use ISO-2022-JP as your encoding for at least anything that goes in the mail header. This includes the Subject, To line, and CC line. UTF-8 will also work in most cases, but it will not work on Yahoo Japan mail, and as you can imagine, many Japanese users use Yahoo Japan mail.

  • You can use UTF-8 in the body of the email, but it is recommended that you base64 encode the UTF-8 encoded Japanese text and put that in the body instead of raw UTF-8 text. However, in practice, I believe that raw UTF-8 text will work fine these days, for the body of the email.

  • As I alluded to above, you need to at least test on Outlook (Exchange), Outlook Express (IMAP/POP3), and Yahoo Japan web mail. Yahoo Japan is the trickiest because I believe they use EUC for the encoding of their web pages, and so you need to follow the correct standards for your emails or they won't work (ISO-2022-JP is the standard for sending Japanese emails).

  • Also, your subject line should not exceed 75 characters per line. That is, 75 characters after you've encoded in ISO-2022-JP and base64, not 75 characters before conversion. If you exceed 75 characters, you need to break your encoded subject into multiple lines, starting with "=?iso-2022-jp?B?" and ending with "?=" on each line. If you don't do this, your subject might get truncated (depending on the email reader, and also the content of your subject text). According to RFC 2047:

"An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used."

  • Here's some sample PHP code to encode the subject:

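 // Convert Japanese subject to ISO-2022-JP (JIS is essentially ISO-2022-JP).
 // This assumes $subject is currently Shift-JIS; change "SJIS" to match your source encoding.
 $subject = mb_convert_encoding($subject, "JIS", "SJIS");

 // Now, base64 encode the subject
 $subject = base64_encode($subject);

 // Add the encoding markers to the subject.
 // This produces a single encoded-word, so it only suits subjects that stay
 // within 75 characters after encoding; longer ones must be split as described above.
 $subject = "=?iso-2022-jp?B?" . $subject . "?=";

 // Now, $subject can be placed as-is into the raw mail header.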
  • See RFC 2047 for a complete description of how to encode your email header.
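
For subjects too long to fit in one encoded-word, the folding rule quoted above can be sketched in Python. This is a minimal sketch rather than the only way to do it: `encode_word` and `encode_subject` are names made up for this example, and it assumes every character in the subject exists in ISO-2022-JP (Python raises `UnicodeEncodeError` otherwise). Each chunk is encoded on its own so that every encoded-word starts and ends in ASCII mode and can be decoded independently. RFC 2047 also limits each header line containing an encoded-word to 76 characters, so the first word is kept shorter to leave room for "Subject: ".

```python
import base64

PREFIX = "=?iso-2022-jp?B?"
SUFFIX = "?="
MAX_WORD_LEN = 75  # RFC 2047 limit for one encoded-word, delimiters included
MAX_LINE_LEN = 76  # RFC 2047 limit for a header line containing encoded-words


def encode_word(text):
    """Encode one chunk of text as a single base64 encoded-word."""
    raw = text.encode("iso-2022-jp")  # escape sequences are added per chunk
    return PREFIX + base64.b64encode(raw).decode("ascii") + SUFFIX


def encode_subject(subject, header_name="Subject"):
    """Split the subject greedily into encoded-words that respect both limits."""
    words = []
    chunk = ""
    lead = len(header_name) + 2  # "Subject: " precedes the first word
    for ch in subject:
        limit = min(MAX_WORD_LEN, MAX_LINE_LEN - lead)
        # Measure the length after encoding, not before.
        if chunk and len(encode_word(chunk + ch)) > limit:
            words.append(encode_word(chunk))
            chunk = ch
            lead = 1  # continuation lines start with a single space
        else:
            chunk += ch
    if chunk:
        words.append(encode_word(chunk))
    # Consecutive encoded-words are separated by CRLF SPACE.
    return "\r\n ".join(words)


if __name__ == "__main__":
    print("Subject: " + encode_subject("日本語の件名テスト" * 5))
```

The greedy loop re-encodes the chunk for every character, which is wasteful but keeps the sketch short; for subject-sized strings that cost is negligible.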

Solution 2

Check http://en.wikipedia.org/wiki/MIME#Encoded-Word for a description of how to encode header fields in MIME-compliant messages. You seem to be missing a "?=" at the end of your subject.
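
A quick way to catch a missing terminator like this before sending is to check each encoded-word against the `=?charset?encoding?text?=` shape. The Python sketch below is only a rough sanity check: the pattern is a simplification of the RFC 2047 grammar, and `check_encoded_word` is a hypothetical helper name. It does catch a missing "?=" and a word that exceeds the 75-character limit.

```python
import re

# Simplified shape of an RFC 2047 encoded-word: =?charset?B-or-Q?text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def check_encoded_word(word):
    """Return a list of problems found in one encoded-word (empty if none)."""
    problems = []
    if ENCODED_WORD.fullmatch(word) is None:
        problems.append("not of the form =?charset?encoding?text?=")
    if len(word) > 75:
        problems.append("longer than 75 characters")
    return problems


if __name__ == "__main__":
    print(check_encoded_word("=?ISO-2022-JP?B?GyRCRnxLXDhsGyhC?="))  # well-formed
    print(check_encoded_word("=?ISO-2022-JP?B?GyRCRnxLXDhsGyhC"))    # missing "?="
```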

Solution 3

=?ISO-2022-JP?B?TEXTTEXT...

ISO-2022-JP means that the string is encoded in the ISO-2022-JP code page (i.e. not Unicode); B means that the string is base64 encoded.

In your example, you should just supply your string in ISO-2022-JP instead of Unicode.
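
To make the difference concrete, here is a small Python sketch (an illustration, not the asker's C# code). It relies on `Encoding.Unicode` in .NET being UTF-16 little-endian, which is `utf-16-le` in Python; `make_word` and `decode_subject` are helper names invented for the example. The header only decodes back to the original text when the bytes inside really are ISO-2022-JP.

```python
import base64
from email.header import decode_header


def make_word(text, byte_encoding):
    """Label the bytes as ISO-2022-JP, whatever encoding they are really in."""
    payload = base64.b64encode(text.encode(byte_encoding)).decode("ascii")
    return "=?ISO-2022-JP?B?" + payload + "?="


def decode_subject(header_value):
    """Decode a header value the way a mail client would, trusting the label."""
    parts = []
    for raw, charset in decode_header(header_value):
        if isinstance(raw, bytes):
            parts.append(raw.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(raw)
    return "".join(parts)


if __name__ == "__main__":
    subject = "日本語"
    wrong = make_word(subject, "utf-16-le")    # UTF-16 bytes, wrong for this label
    right = make_word(subject, "iso-2022-jp")  # bytes match the label
    print(decode_subject(wrong))  # garbage
    print(decode_subject(right))  # the original subject
```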

Solution 4

Something like this should get the job done in Python:


#!/usr/bin/env python3
import smtplib
import sys
from email.header import Header
from email.mime.text import MIMEText
from email.utils import formatdate


def send_from_gmail(from_addr, to_addr, subject, body, password, encoding="iso-2022-jp"):
    # MIMEText encodes the str body with the given charset and sets Content-Type.
    msg = MIMEText(body, 'plain', encoding)
    # Header turns the subject into RFC 2047 encoded-words in that charset.
    msg['Subject'] = Header(subject, encoding)
    msg['From'] = from_addr
    msg['To'] = to_addr
    msg['Date'] = formatdate()

    s = smtplib.SMTP('smtp.gmail.com', 587)
    s.ehlo()
    s.starttls()
    s.ehlo()

    s.login(from_addr, password)
    s.sendmail(from_addr, to_addr, msg.as_string())
    s.close()
    return "Sent mail to: %s" % to_addr


if __name__ == "__main__":
    # Python 3 already decodes command-line arguments to str.
    if len(sys.argv) == 6:
        print(send_from_gmail(*sys.argv[1:6]))
    elif len(sys.argv) == 7:
        print(send_from_gmail(*sys.argv[1:6], encoding=sys.argv[6]))
    else:
        sys.exit("SYNTAX: %s <from_addr> <to_addr> <subject> <body> <password> [encoding]" % sys.argv[0])

Blatantly stolen/adapted from:

http://mtokyo.blog9.fc2.com/blog-entry-127.html
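
If you want to check the result without actually sending anything, you can build the message and re-parse it locally. The following is a rough Python 3 sketch under a few assumptions: `build_message` and `decode_mime_words` are hypothetical helper names, the address is a placeholder, and the display name in the To line is encoded with `email.utils.formataddr`.

```python
import email
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText
from email.utils import formataddr, parseaddr


def build_message(subject, body, to_name, to_addr, encoding="iso-2022-jp"):
    """Build a message whose body, Subject and To display name use `encoding`."""
    msg = MIMEText(body, "plain", encoding)
    msg["Subject"] = Header(subject, encoding)
    # formataddr encodes a non-ASCII display name as an encoded-word.
    msg["To"] = formataddr((to_name, to_addr), charset=encoding)
    return msg


def decode_mime_words(value):
    """Decode any encoded-words in a header value back to text."""
    return str(make_header(decode_header(value)))


if __name__ == "__main__":
    msg = build_message("日本語の件名", "本文です。", "山田太郎", "taro@example.com")
    raw = msg.as_string()
    print(raw)  # inspect the raw headers that would go on the wire

    # Re-parse the raw text, roughly as a receiving mail client would.
    parsed = email.message_from_string(raw)
    print(decode_mime_words(parsed["Subject"]))
    name, addr = parseaddr(parsed["To"])
    print(decode_mime_words(name), addr)
    print(parsed.get_payload(decode=True).decode(encoding := "iso-2022-jp"))
```

A local round trip like this only shows that the message is well-formed; it does not replace testing in the actual mail clients your recipients use.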

Author by

danijels

Senior developer, C# http://stackoverflow.com/jobs/companies/confirmit-as

Updated on June 01, 2022

Comments

  • danijels
    danijels almost 2 years

    Apparently, encoding Japanese emails is somewhat challenging, which I am slowly discovering myself. In case there are any experts (even those with limited experience will do), can I please have some guidelines as to how to do it, how to test it and how to verify it?

    Bear in mind that I've never set foot anywhere near Japan, it is simply that the product I'm developing is used there, among other places.

    What (I think) I know so far is the following:
    - Japanese emails should be encoded in ISO-2022-JP (Japanese JIS, code page 50220) or possibly SHIFT_JIS (code page 932)
    - Email transfer encoding should be set to Base64 for plain text and 7Bit for HTML
    - Email subject should be encoded separately to start with "=?ISO-2022-JP?B?" (don't know what this is supposed to mean). I've tried encoding the subject with

    "=?ISO-2022-JP?B?" + Convert.ToBase64String(Encoding.Unicode.GetBytes(subject))
    

    which basically gives the encoded string as expected, but it doesn't get presented as any Japanese text in an email program
    - I've tested in Outlook 2003, Outlook Express and GMail

    Any help would be greatly appreciated


    Ok, so to post a short update, thanks to the two helpful answers, I've managed to get the right format and encoding. Now, Outlook gives something that resembles the correct subject:
    =?iso-2022-jp?B?6 Japanese test に各々の視点で語ってもらった。 6相当の防水?=

    However, the exact same email in Outlook Express gives subject like this:
    =?iso-2022-jp?B?6 Japanese test 縺ォ蜷・・・隕也せ縺ァ隱槭▲縺ヲ繧ゅi縺」縺溘・ 6逶ク蠖薙・髦イ豌エ?=

    Furthermore, when viewed in the Inbox view in Outlook Express, the email subject is even more weird, like this:
    =?iso-2022-jp?B?6 Japanese test ??????????????? 6???????=

    Gmail seems to behave in a similar fashion to Outlook, which looks correct.

    I just can't get my head around this one.

  • Elijah
    Elijah about 15 years
    Great reply! Not only does Yahoo mail have trouble supporting UTF-8, but most Japanese cell phones still do not support receiving email in UTF-8, so you are stuck with iso-2022-jp
  • Amit Patil
    Amit Patil about 15 years
    UTF-8 is definitely a second-class citizen in Japan — depressingly, as their own standards are bloody terrible. Cell phone web browsers have only just caught up to supporting it and there are still web mail providers that can't understand incoming UTF-8 mail. It's absolutely pathetic.
  • Amit Patil
    Amit Patil about 15 years
    Actually I've found the webmail services and cellphones support Shift-JIS fine. It's the most compact of the available encodings, so we go for that and haven't had any problems yet.
  • si28719e
    si28719e about 15 years
    definitely best to stick with iso-2022-jp, at least for the subject, as this is by far the most widely supported, especially in the case of cell phones. in most cases new phones (especially softbank) now support utf-8, but anything more than a year old will almost certainly not support utf-8. also be careful to stay away from iso-2022-jp-ext. it's almost the same as iso-2022-jp, but my experience is that the extended characters are very often not supported by many cell phones.
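
On the garbled Outlook Express subject quoted in the question: the subject shown there has readable text between "=?iso-2022-jp?B?" and "?=", so the payload was not base64 encoded, and text like 縺ォ蜷 is the pattern that typically appears when UTF-8 bytes are displayed as Shift-JIS. That would suggest the subject went out as raw UTF-8, but this is a guess from the symptoms, not something confirmed in the thread. A small Python sketch reproduces the effect, using `cp932` (Python's codec for Windows code page 932) and a made-up helper name `misread`:

```python
def misread(text, sent_as="utf-8", shown_as="cp932"):
    """Show what `text` looks like when its bytes are decoded with the wrong codec."""
    return text.encode(sent_as).decode(shown_as, errors="replace")


if __name__ == "__main__":
    original = "に"
    garbled = misread(original)
    print(garbled)  # no longer the original character
    # Reversing the two steps recovers the text when no bytes were lost.
    print(garbled.encode("cp932").decode("utf-8"))
```

Pure ASCII text survives this kind of mix-up unchanged, which is why the "6 Japanese test" part of the subject still reads correctly.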