Should plus be encoded in mailto: hyperlinks?
Solution 1
The plus sign is used to encode spaces in URLs, not in HTML and not in SMTP (RFC 2821). However, since mailto:[email protected] is a URI (it has a protocol, a protocol separator, and a protocol address), it should be treated as a URI and percent-encoded.
Therefore, it is up to the client to resolve the encoded representation accurately and to decode it as appropriate. Here is Microsoft's official take on the matter.
You should apply URL encoding to mailto: URLs embedded in HTML if the characters in the email address are URI reserved. This ensures that you are doing the correct thing. It is up to the client to decode the URI appropriately when it receives it. Yes, [email protected] is a perfectly valid email address; yes, this%[email protected] is also valid. Yes, those two are different, but whether they'll be treated differently is up to the client...
As you previously noted, not all clients render this correctly. I suggest finding the client your users are most likely to use (Gmail? Browser-based clients? Outlook?) and doing what that client does. You said you tested on Gmail? How did you test it? With a browser-based mailto: client (such as the add-ons available for Firefox and Gmail), the URI is most likely not being decoded (as it should be).
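As a rough illustration of the "percent-encode it" approach, here is a minimal Python sketch. The helper name encode_mailto and the address user+tag@example.com are made up for the example; it assumes you want "@" left readable and everything else outside the unreserved set encoded.

```python
from urllib.parse import quote


def encode_mailto(address: str) -> str:
    """Build a mailto: href value, percent-encoding the address.

    Hypothetical helper for illustration only. safe="@" keeps the
    local-part/domain separator literal; "+" is not in the safe set,
    so quote() turns it into %2B, and a literal "%" becomes %25.
    """
    return "mailto:" + quote(address, safe="@")


print(encode_mailto("user+tag@example.com"))  # mailto:user%2Btag@example.com
```

Whether a given mail client decodes %2B back to "+" is still up to that client, so test against the clients you care about.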
Solution 2
A strict reading of the relevant RFC says that the "+" should be encoded.
Section 2, top of page 2 on https://www.rfc-editor.org/rfc/rfc2368 says:
"Note that all URL reserved characters in "to" must be encoded: in particular, parentheses, commas, and the percent sign ("%"), which commonly occur in the "mailbox" syntax."
The RFC for URIs (https://www.rfc-editor.org/rfc/rfc3986#section-2.2) lists "+" as a reserved character.
That said, what is "correct" is not necessarily what will work in all browsers. Some browsers will inevitably treat the correct form as if it were wrong and the incorrect form as if it were right.
Edit: As for RFC 6068 and its "MAY", I would read that as context dependent. If you are writing the URL to be read as text, a literal "+" makes more sense; if you are writing it in HTML, the stricter interpretation of RFC 3986 is more in line with the idea of "valid HTML", so anything consuming the value should expect it to be encoded.
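To make the "reserved character" claim concrete, here is a small Python sketch that lists which characters of an address fall into the reserved set of RFC 3986, section 2.2 (gen-delims plus sub-delims). The function name and the sample address are assumptions for the example, not anything taken from the RFCs.

```python
# Reserved characters as listed in RFC 3986, section 2.2.
GEN_DELIMS = frozenset(":/?#[]@")
SUB_DELIMS = frozenset("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def reserved_chars_in(address: str) -> list[str]:
    """Return the reserved characters found in address, in order of appearance."""
    return [ch for ch in address if ch in RESERVED]


print(reserved_chars_in("user+tag@example.com"))  # ['+', '@']
```

This only classifies characters; it does not by itself settle whether a particular reserved character has to be percent-encoded in a particular part of a URI.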
Solution 3
You MAY encode +, but you don't have to.
First, we need to agree that mailto is an example of a generic URI, specified by RFC 2396. (This is what XHTML and HTML 4 use.)
Now let us find out the list of reserved characters in RFC 2396.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
URI splits into absolute and relative:
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
And because the scheme mailto: is specified, this is an absolute URI:
absoluteURI = scheme ":" ( hier_part | opaque_part )
And since both patterns for hier_part start with /, mailto is an opaque part.
opaque_part = uric_no_slash *uric
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
"&" | "=" | "+" | "$" | ","
uric = reserved | unreserved | escaped
So the restriction is that you have to escape / if it is the first character, but after that you can use reserved characters, including + and @.
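The grammar above can be checked mechanically. The following Python sketch transcribes the RFC 2396 opaque_part production into a regular expression; the names and the sample strings are made up for the example, and it is only meant as a quick sanity check, not a full URI validator.

```python
import re

# RFC 2396: unreserved = alphanum | mark, where mark = - _ . ! ~ * ' ( )
UNRESERVED = r"A-Za-z0-9\-_.!~*'()"
# RFC 2396: escaped = "%" hex hex
ESCAPED = r"%[0-9A-Fa-f]{2}"
# uric_no_slash: every uric character except "/"
URIC_NO_SLASH = rf"(?:[{UNRESERVED};?:@&=+$,]|{ESCAPED})"
# uric = reserved | unreserved | escaped (reserved includes "/")
URIC = rf"(?:[{UNRESERVED};/?:@&=+$,]|{ESCAPED})"
# opaque_part = uric_no_slash *uric
OPAQUE_PART = re.compile(rf"{URIC_NO_SLASH}{URIC}*")


def is_opaque_part(text: str) -> bool:
    """True if text matches the RFC 2396 opaque_part production."""
    return OPAQUE_PART.fullmatch(text) is not None


print(is_opaque_part("user+tag@example.com"))    # True: literal "+" is allowed
print(is_opaque_part("user%2Btag@example.com"))  # True: the escaped form is allowed too
print(is_opaque_part("/user@example.com"))       # False: must not start with "/"
```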
Here's another RFC to support this. RFC 6068, the latest RFC for the mailto scheme (published in 2010), says:
Software creating 'mailto' URIs likewise has to be careful to encode any reserved characters that are used. HTML forms are one kind of software that creates 'mailto' URIs. Current implementations encode a space as '+', but this creates problems because such a '+' standing for a space cannot be distinguished from a real '+' in a 'mailto' URI. When producing 'mailto' URIs, all spaces SHOULD be encoded as %20, and '+' characters MAY be encoded as %2B. Please note that '+' characters are frequently used as part of an email address to indicate a subaddress, as for example in <[email protected]>.
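The ambiguity the RFC describes is easy to reproduce. In this Python sketch, unquote_plus stands in for a consumer that applies form-style decoding ("+" means space), and unquote for one that only decodes percent escapes. The helper name and the address are assumptions for the example.

```python
from urllib.parse import unquote, unquote_plus


def decodings(value: str) -> tuple[str, str]:
    """Return (percent-only decoding, form-style decoding) of value."""
    return unquote(value), unquote_plus(value)


# A literal "+" survives percent-decoding but turns into a space
# under form-style decoding.
print(decodings("user+tag@example.com"))
# ('user+tag@example.com', 'user tag@example.com')

# %2B decodes to "+" either way, so the encoded form is unambiguous.
print(decodings("user%2Btag@example.com"))
# ('user+tag@example.com', 'user+tag@example.com')
```

This is the practical argument for %2B: it comes out the same whichever decoding the client happens to apply.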
Solution 4
Per the newer RFC, https://www.rfc-editor.org/rfc/rfc6068#section-5:
... '+' MAY BE encoded as %2B
So I guess the answer is don't, but maybe?
Solution 5
The RFC1738
3.5. MAILTO
The mailto URL scheme is used to designate the Internet mailing address of an individual or service. No additional information other than an Internet mailing address is present or implied.
A mailto URL takes the form:
mailto:<rfc822-addr-spec>
where <rfc822-addr-spec> is (the encoding of an) addr-spec, as specified in RFC 822. Within mailto URLs, there are no reserved characters.
Note that the percent sign ("%") is commonly used within RFC 822 addresses and must be encoded.
Unlike many URLs, the mailto scheme does not represent a data object to be accessed directly; there is no sense in which it designates an object. It has a different use than the message/external-body type in MIME.
Since there are no reserved characters it should be encoded.
McDowell
Updated on September 18, 2022
Comments
-
McDowell over 1 year
When placing an email address with an address tag (aka sub-addressing) in a mailto hyperlink …
<a href="mailto:[email protected]">mail us now!</a>
… should the plus in the email be URL encoded?
<a href="mailto:username%[email protected]">mail us now!</a>
I can't figure this out, and the documentation is conflicting. Our real world tests have produced mixed results as well, making it even more confusing.
-
Admin almost 13 years
Can you be more specific about the methods and results of your real-world tests? Do some email clients/services treat it properly and others choke?
-
Admin almost 13 years
@bryson I know the "send using gmail" Chrome extension has had issues with an unencoded plus in the mailto:, for example, but perhaps that's a bug.
-
Admin almost 13 years
Just use whichever one works with Chrome.
-
McDowell almost 13 years
True, good point that there is some variance in email sub-addressing, but the emails in this case are Gmail hosted, so I know the plus is correct and will work when received by the server, assuming the email gets through the client.
-
jcolebrand almost 13 years
The problem is the application parsing the URI request. If it expects to receive URLEncoded data then it will decode the data, but that is neither fair to you (to falsely encode) nor to the client (to make assumptions). The protocol does not dictate the encoding expected, the client does. See the further edits I make to the A by @Wez
-
Campbeln almost 13 years
Does anyone have any actual data on what works where?
-
McDowell almost 13 years
And yet per tools.ietf.org/html/rfc6068: "When producing 'mailto' URIs, all spaces SHOULD be encoded as %20, and '+' characters MAY be encoded as %2B"
-
jcolebrand almost 13 years
"Since there are no reserved characters it should be encoded."
ummmm that doesn't make any sense.
-
cypherabe almost 13 years
@jcolebrand '+' is a special character in the URL scheme and thus must be encoded when it does not have a special role, i.e. when it is not reserved.
-
cypherabe almost 13 years
@Jeff Indeed, my bad for living in an older RFC world. Then tools.ietf.org/html/rfc2119 basically tells you to do what you feel fits you best.
-
jcolebrand almost 13 years
That seems .... backwards in spirit to the way I read the instructions initially.
-
jcolebrand almost 13 years
Well, I did make a specific note of what Microsoft affirms works...
-
jcolebrand almost 13 years
I am not entirely familiar with that grammar; however, it lists the characters as separate from the unreserved pool, which indicates that + is a reserved character. It does not indicate that it must be encoded. Microsoft says to encode it. C'est la vie, I wait to see.
-
Eugene Yokota almost 13 years
When a part does not start with /, + is no longer a reserved character.
-
jcolebrand almost 13 years
I disagree. "email addresses" are very peculiarly defined, and must be treated with some care in the first place. That standard is very confusing. Fortunately, we get to disagree here.
-
Matthew Read almost 13 years
This is spot on. Gmail doesn't handle them correctly, but since Google ignores user bug reports there's not much you can do about it.
-
Eugene Yokota almost 13 years
If you have to encode + in a URI, @ also needs to be encoded because it's also a reserved character. If you read the RFC carefully, you will find that in an opaque part, + is legal.
-
Eugene Yokota almost 13 years
In RFC 3986, mailto would be treated as path-rootless, which allows a sequence of pchar, defined as (unreserved / pct-encoded / sub-delims / ":" / "@"). + is part of sub-delims. So a strict reading says + does not require percent encoding.
-
Maciej Piechotka almost 13 years
I may be wrong, but isn't it reserved to separate the username from the host (like in [email protected]/path )? Then it would belong in the address, as it does separate the username from the host.
-
RachitSharma about 11 years
At this time, Lotus Notes (no comment) does not like an unencoded plus in an address (it will substitute an underscore) but does work with %2B, i.e. [email protected] does not work (it will compose to [email protected]) but a%[email protected] does work. Either of those will work in Gmail, but if you encode the @ (like a%2Bb%40example.com), Gmail will not fill the To address at all.