Should plus be encoded in mailto: hyperlinks?


Solution 1

The plus is used to encode spaces in URLs (specifically in form-encoded query strings), not in HTML and not in SMTP (RFC 2821). However, since mailto:[email protected] is a URI (it has a scheme, the scheme separator, and a scheme-specific address), it should be treated as a URI and percent-encoded.

Therefore, it is up to the client to resolve accurately the encoded representation and to decode it as far as is appropriate. Here is Microsoft's official take on the matter.

You should apply URL encoding to mailto: URLs embedded in HTML if characters in the email address are URI reserved characters. This ensures that you are doing the correct thing. It is up to the client to decode the URI appropriately when it receives it. Yes, [email protected] is a valid email address; yes, this%[email protected] is also valid. Yes, those two are different, but whether they'll be treated differently is up to the client...
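To illustrate why the outcome depends on the client, here is a minimal Python sketch. The address user+tag@example.com is a made-up example, and the two helper functions are hypothetical: they only model two plausible client behaviours (plain percent-decoding versus form-style decoding). No real mail client is claimed to work exactly this way.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical example addresses, not taken from the question.
RAW = "mailto:user+tag@example.com"
ENCODED = "mailto:user%2Btag@example.com"


def decode_strict(uri: str) -> str:
    """Model a client that only undoes percent-encoding."""
    return unquote(uri)


def decode_form_style(uri: str) -> str:
    """Model a client that also treats a bare '+' as an encoded space."""
    return unquote_plus(uri)


print(decode_strict(RAW))          # mailto:user+tag@example.com
print(decode_strict(ENCODED))      # mailto:user+tag@example.com
print(decode_form_style(RAW))      # mailto:user tag@example.com
print(decode_form_style(ENCODED))  # mailto:user+tag@example.com
```

Under these assumptions, only the %2B form survives both decoding styles with the plus intact, which is the practical argument for encoding it.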

As you previously noted, not all clients render this correctly. I suggest finding the client your users are most likely to use (Gmail? Browser-based clients? Outlook?) and doing what that client does. You said you tested on Gmail? How did you test it? With a browser-based mailto: client (such as the add-ons offered for Firefox and Gmail), the URI is most likely not being decoded as it should be.

Solution 2

A strict reading of the relevant RFC says that the "+" should be encoded.

Section 2, top of page 2 on https://www.rfc-editor.org/rfc/rfc2368 says:

"Note that all URL reserved characters in "to" must be encoded: in particular, parentheses, commas, and the percent sign ("%"), which commonly occur in the "mailbox" syntax."

The RFC for URIs (https://www.rfc-editor.org/rfc/rfc3986#section-2.2) lists "+" as a reserved character.
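As a quick way to check which characters of an address fall into the RFC 3986 reserved set, here is a small Python sketch. The gen-delims and sub-delims sets are copied from RFC 3986 section 2.2; the function name reserved_chars and the sample address are hypothetical, chosen only for illustration.

```python
# Reserved characters as listed in RFC 3986, section 2.2.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def reserved_chars(text: str) -> list[str]:
    """Return the distinct RFC 3986 reserved characters found in text."""
    return sorted({ch for ch in text if ch in RESERVED})


# Hypothetical address used only for illustration.
print(reserved_chars("user+tag@example.com"))  # ['+', '@']
```

Note that this only reports membership in the reserved set; it does not by itself decide whether a given character has to be percent-encoded in a particular URI component.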

That said, what is "correct" is not necessarily what will work in all browsers. Some browsers will inevitably handle correct input as if it were wrong and incorrect input as if it were right.

Edit: As for RFC 6068 and its "MAY", I would read that as context-dependent. If you are writing the URL to be read as text, then a literal "+" makes more sense; if you are writing it in HTML, the stricter interpretation of RFC 3986 is more in line with "valid HTML" ideas, so anything using the value should expect it to be encoded.

Solution 3

You MAY encode +, but you don't have to.

First, we need to agree that mailto is an example of a generic URI, specified by RFC 2396. (This is what XHTML and HTML 4 use).

Now let us find out the list of reserved characters in RFC 2396.

reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
              "$" | ","

URI splits into absolute and relative:

URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]

And because scheme mailto: is specified this is an absolute URI:

absoluteURI   = scheme ":" ( hier_part | opaque_part )

And since both patterns for hier_part start with /, the part following mailto: is an opaque_part.

opaque_part   = uric_no_slash *uric

uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                "&" | "=" | "+" | "$" | ","

uric          = reserved | unreserved | escaped

So the restriction is that you have to escape / if it is the first character, but after that you can use reserved characters, including + and @.
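The grammar above can be turned into a regular expression to test the claim. This is a rough Python sketch, assuming the unreserved set (alphanumerics plus the mark characters - _ . ! ~ * ' ( )) and the escaped rule ("%" followed by two hex digits) as defined in RFC 2396; the name is_opaque_part and the sample strings are hypothetical.

```python
import re

# Character sets from the RFC 2396 grammar.
UNRESERVED = r"A-Za-z0-9\-_.!~*'()"   # alphanum | mark
ESCAPED = r"%[0-9A-Fa-f]{2}"          # "%" hex hex

# uric_no_slash: unreserved | escaped | ; ? : @ & = + $ ,
URIC_NO_SLASH = rf"(?:[{UNRESERVED};?:@&=+$,]|{ESCAPED})"
# uric: reserved | unreserved | escaped (reserved additionally allows "/")
URIC = rf"(?:[{UNRESERVED};/?:@&=+$,]|{ESCAPED})"

# opaque_part = uric_no_slash *uric
OPAQUE_PART = re.compile(rf"{URIC_NO_SLASH}{URIC}*")


def is_opaque_part(text: str) -> bool:
    """Check whether text matches the RFC 2396 opaque_part production."""
    return OPAQUE_PART.fullmatch(text) is not None


# Hypothetical sample values.
print(is_opaque_part("user+tag@example.com"))    # True: bare + is allowed
print(is_opaque_part("user%2Btag@example.com"))  # True: escaped form is allowed too
print(is_opaque_part("/user@example.com"))       # False: leading slash
```

Both the bare and the escaped plus match, which is consistent with the reading that encoding is permitted but not required by this grammar.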

Here's another RFC to support this. RFC 6068, the latest RFC for the mailto scheme (published in 2010), says:

Software creating 'mailto' URIs likewise has to be careful to encode any reserved characters that are used. HTML forms are one kind of software that creates 'mailto' URIs. Current implementations encode a space as '+', but this creates problems because such a '+' standing for a space cannot be distinguished from a real '+' in a 'mailto' URI. When producing 'mailto' URIs, all spaces SHOULD be encoded as %20, and '+' characters MAY be encoded as %2B. Please note that '+' characters are frequently used as part of an email address to indicate a subaddress, as for example in <bill+ietf@example.org>.

Solution 4

Per the newer RFC 6068 (https://www.rfc-editor.org/rfc/rfc6068#section-5):

  ... '+' MAY BE encoded as %2B

So I guess the answer is don't, but maybe?

Solution 5

From RFC 1738:

3.5. MAILTO

The mailto URL scheme is used to designate the Internet mailing address of an individual or service. No additional information other than an Internet mailing address is present or implied.

A mailto URL takes the form:

    mailto:<rfc822-addr-spec>

where <rfc822-addr-spec> is (the encoding of an) addr-spec, as specified in RFC 822. Within mailto URLs, there are no reserved characters.

Note that the percent sign ("%") is commonly used within RFC 822 addresses and must be encoded.

Unlike many URLs, the mailto scheme does not represent a data object to be accessed directly; there is no sense in which it designates an object. It has a different use than the message/external-body type in MIME.

Since there are no reserved characters it should be encoded.


McDowell
Author by

McDowell


Updated on September 18, 2022

Comments

  • McDowell
    McDowell over 1 year

    When placing an email address with an address tag (aka sub-addressing) in a mailto hyperlink …

    <a href="mailto:[email protected]">mail us now!</a>

    … should the plus in the email be URL encoded?

    <a href="mailto:username%[email protected]">mail us now!</a>

    I can't figure this out, and the documentation is conflicting. Our real world tests have produced mixed results as well, making it even more confusing.

    • Admin
      Admin almost 13 years
      Can you be more specific on the methods and results of your real-world tests? Do some email clients/services treat it properly and others choke? Can you be more specific?
    • Admin
      Admin almost 13 years
      @bryson I know the "send using gmail" chrome extension has had issues with unencoded plus in the mailto: for example, but perhaps that's a bug.
    • Admin
      Admin almost 13 years
      Just use whichever one works with chrome.
  • McDowell
    McDowell almost 13 years
    true, good point that there is some variance on email sub-addressing -- but the emails in this case are gmail hosted so I know the plus is correct and will work when received by the server, assuming the email gets through the client.
  • jcolebrand
    jcolebrand almost 13 years
    The problem is the application parsing the URI request. If it expects to receive URLEncoded data then it will decode the data, but that is neither fair to you (to falsely encode) nor to the client (to make assumptions). The protocol does not dictate the encoding expected, the client does. See the further edits I make to the A by @Wez
  • Campbeln
    Campbeln almost 13 years
    Does anyone have any actual data on what works where?
  • McDowell
    McDowell almost 13 years
    and yet per tools.ietf.org/html/rfc6068 "When producing 'mailto' URIs, all spaces SHOULD be encoded as %20, and '+' characters MAY be encoded as %2B"
  • jcolebrand
    jcolebrand almost 13 years
    Since there are no reserved characters it should be encoded. ummmm that doesn't make any sense.
  • cypherabe
    cypherabe almost 13 years
    @jcolebrand '+' is a special character in the URL scheme and thus must be encoded when it does not have a special role, i.e. when it is not reserved.
  • cypherabe
    cypherabe almost 13 years
    @Jeff Indeed - my bad for living in an older RFC world. Then tools.ietf.org/html/rfc2119 basically tells you to do what you feel fits you best.
  • jcolebrand
    jcolebrand almost 13 years
    that seems .... backwards in spirit to the way I read the instructions initially.
  • jcolebrand
    jcolebrand almost 13 years
    well I did make a specific note of what Microsoft affirms works...
  • jcolebrand
    jcolebrand almost 13 years
    I am not entirely familiar with that grammar, however, it lists the characters as separate from the unreserved pool, which indicates that + is a reserved character. It does not indicate that it must be encoded. Microsoft says to encode it. C'est la vie, I wait to see.
  • Eugene Yokota
    Eugene Yokota almost 13 years
    When a part does not start with /, + no longer becomes a reserved character.
  • jcolebrand
    jcolebrand almost 13 years
    I disagree. "email addresses" are very peculiarly defined, and must be treated with some care in the first place. That standard is very confusing. Fortunately, we get to disagree here.
  • Matthew Read
    Matthew Read almost 13 years
    This is spot on. Gmail doesn't handle them correctly, but since Google ignores user bug reports there's not much you can do about it.
  • Eugene Yokota
    Eugene Yokota almost 13 years
    If you have to encode + in a URI, @ also needs to be encoded because it's also a reserved character. If you read the RFC carefully, you will find that in an opaque part, + is legal.
  • Eugene Yokota
    Eugene Yokota almost 13 years
    In RFC 3986, mailto would be treated as path-rootless, which allows sequence of pchar defined by (unreserved / pct-encoded / sub-delims / ":" / "@"). + is part of sub-delims. So strict reading says + does not require percent encoding.
  • Maciej Piechotka
    Maciej Piechotka almost 13 years
    I may be wrong but isn't it reserved to separate username from host (like in [email protected]/path )? Then it would make its place in the address as it does separate the username from host.
  • RachitSharma
    RachitSharma about 11 years
    At this time, Lotus Notes (no comment) does not like an unencoded plus in an address (it will substitute an underscore) but does work with %2B. I.e., a+b@example.com does not work (it will compose to a_b@example.com) but a%2Bb@example.com does work. Either of those will work in Gmail, but if you encode the @ (like a%2Bb%40example.com), Gmail will not fill the To address at all.