What characters are allowed in an email address?

1,058,201

Solution 1

See RFC 5322: Internet Message Format and, to a lesser extent, RFC 5321: Simple Mail Transfer Protocol.

RFC 822 also covers email addresses, but it deals mostly with their structure:

 addr-spec   =  local-part "@" domain        ; global address     
 local-part  =  word *("." word)             ; uninterpreted
                                             ; case-preserved
 
 domain      =  sub-domain *("." sub-domain)     
 sub-domain  =  domain-ref / domain-literal     
 domain-ref  =  atom                         ; symbolic reference

And as usual, Wikipedia has a decent article on email addresses:

The local-part of the email address may use any of these ASCII characters:

  • uppercase and lowercase Latin letters A to Z and a to z;
  • digits 0 to 9;
  • special characters !#$%&'*+-/=?^_`{|}~ (the semicolon ending this list item is not one of them);
  • dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
  • space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
  • comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.
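The unquoted rules in the list above can be sketched in Python. This is only a minimal sketch under stated assumptions: it covers the plain dot-separated form, ignores quoted strings, comments and non-ASCII characters, and does not check length. The name is_dot_atom_local_part is made up for illustration.

```python
import string

# Characters allowed in an unquoted local part, per the list above:
# letters, digits, and the special characters !#$%&'*+-/=?^_`{|}~
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom_local_part(local: str) -> bool:
    """Return True if `local` is a well-formed unquoted local part."""
    if not local:
        return False
    # Splitting on "." turns a leading, trailing, or doubled dot into an
    # empty piece, so rejecting empty pieces enforces all three dot rules.
    return all(piece and set(piece) <= ATEXT for piece in local.split("."))
```

Under these rules is_dot_atom_local_part("user+tag") is accepted while "John..Doe" is rejected, matching the example in the list.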

In addition to ASCII characters, as of 2012 you can use international characters above U+007F, encoded as UTF-8 as described in the RFC 6532 spec and explained on Wikipedia. Note that as of 2019, these standards are still marked as Proposed, but are being rolled out slowly. The changes in this spec essentially added international characters as valid alphanumeric characters (atext) without affecting the rules on allowed & restricted special characters like !# and @:.

For validation, see Using a regular expression to validate an email address.

The domain part is defined as follows:

The Internet standards (Request for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters a through z (in a case-insensitive manner), the digits 0 through 9, and the hyphen (-). The original specification of hostnames in RFC 952 mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or blank spaces are permitted.
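As a rough illustration of the label rules just quoted, here is a short Python sketch. It is not a complete domain validator: the 63-character cap on a label comes from RFC 1035 rather than from the text above, internationalized names are not handled, and the names LABEL_RE and is_hostname are invented for this example.

```python
import re

# One hostname label: ASCII letters, digits and hyphens, 1 to 63 characters,
# with no hyphen at either end. A leading digit is allowed (RFC 1123).
# The 63-character limit is an assumption taken from RFC 1035.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_hostname(domain: str) -> bool:
    """Return True if every dot-separated label of `domain` is well formed."""
    # An empty string or a doubled dot yields an empty label, which fails.
    return all(LABEL_RE.fullmatch(label) for label in domain.split("."))
```

With this sketch "3com.com" passes (leading digit), while "-bad.com", "bad-.com" and "exa_mple.com" are rejected.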

Solution 2

Watch out! There is a bunch of knowledge rot in this thread (stuff that used to be true and now isn't).

To avoid false-positive rejections of actual email addresses in the current and future world, and from anywhere in the world, you need to know at least the high-level concept of RFC 3490, "Internationalizing Domain Names in Applications (IDNA)". I know folks in US and A often aren't up on this, but it's already in widespread and rapidly increasing use around the world (mainly the non-English dominated parts).

The gist is that you can now use addresses like mason@日本.com and wildwezyr@fahrvergnügen.net. No, this isn't yet compatible with everything out there (as many have lamented above, even simple qmail-style +ident addresses are often wrongly rejected). But there is an RFC, there's a spec, it's now backed by the IETF and ICANN, and--more importantly--there's a large and growing number of implementations supporting this improvement that are currently in service.

I didn't know much about this development myself until I moved back to Japan and started seeing email addresses like hei@やる.ca and Amazon URLs like this:

http://www.amazon.co.jp/エレクトロニクス-デジタルカメラ-ポータブルオーディオ/b/ref=topnav_storetab_e?ie=UTF8&node=3210981

I know you don't want links to specs, but if you rely solely on the outdated knowledge of hackers on Internet forums, your email validator will end up rejecting email addresses that non-English-speaking users increasingly expect to work. For those users, such validation will be just as annoying as the commonplace brain-dead form that we all hate, the one that can't handle a + or a three-part domain name or whatever.

So I'm not saying it's not a hassle, but the full list of characters "allowed under some/any/none conditions" is (nearly) all characters in all languages. If you want to "accept all valid email addresses (and many invalid too)" then you have to take IDN into account, which basically makes a character-based approach useless (sorry), unless you first convert the internationalized email addresses to Punycode. (The converter originally linked here has been dead since September 2015; working alternatives exist.)

After doing that you can follow (most of) the advice above.
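The conversion step can be sketched with Python's standard library, whose built-in "idna" codec implements RFC 3490. This is a minimal sketch, not a full solution: the helper name to_ascii_address is hypothetical, only the domain is converted, and a non-ASCII local part is left untouched.

```python
def to_ascii_address(address: str) -> str:
    """Convert the domain of `address` to its ASCII (Punycode) form."""
    # Split at the last "@": a quoted local part may itself contain "@".
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no @ in address")
    # Python's built-in "idna" codec implements RFC 3490 (IDNA 2003).
    # ASCII-only domains pass through unchanged.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"
```

After this step the domain consists only of ASCII labels (the converted ones start with xn--), so the character-based checks for the domain can be applied to the result.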

Solution 3

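The format of an e-mail address is local-part@domain-part, with at most 64 characters for the local part and 255 for the domain. RFC 5321 also caps the whole path at 256 characters including the enclosing angle brackets, which leaves 254 for the address itself.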
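A length check along these lines might look like the sketch below. The function name lengths_ok and the constants are illustrative. The limits are counted in octets, and the default total of 254 is an assumption based on RFC 5321's 256-octet path limit, which includes the two angle brackets around the address.

```python
MAX_LOCAL = 64    # octets in the local part
MAX_DOMAIN = 255  # octets in the domain part
MAX_TOTAL = 254   # assumption: 256-octet SMTP path minus "<" and ">"


def lengths_ok(address: str, max_total: int = MAX_TOTAL) -> bool:
    """Return True if both parts and the whole address fit the limits."""
    # Split at the last "@": a quoted local part may itself contain "@".
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    # Count octets, not characters, since non-ASCII text is sent as UTF-8.
    return (
        len(local.encode("utf-8")) <= MAX_LOCAL
        and len(domain.encode("utf-8")) <= MAX_DOMAIN
        and len(address.encode("utf-8")) <= max_total
    )
```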

The local-part and domain-part have different sets of permitted characters, and there are further rules about where those characters may appear.

In general, the local part can have these ASCII characters:

  • lowercase Latin letters: abcdefghijklmnopqrstuvwxyz,
  • uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ,
  • digits: 0123456789,
  • special characters: !#$%&'*+-/=?^_`{|}~,
  • dot: . (not first or last character or repeated unless quoted),
  • space and punctuation such as: "(),:;<>@[\] (with some restrictions),
  • comments: () (text within parentheses, e.g. (comment)john.smith@example.com).

Domain part (one or more labels separated by dots; the character rules below apply to each label):

  • lowercase Latin letters: abcdefghijklmnopqrstuvwxyz,
  • uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ,
  • digits: 0123456789,
  • hyphen: - (not the first or last character of a label),
  • alternatively, the domain can be an IP address literal surrounded by square brackets: jsmith@[192.168.2.1] or jsmith@[IPv6:2001:db8::1].
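The bracketed IP form in the last item can be checked with Python's ipaddress module. This is a sketch under assumptions: the function name is_address_literal is invented here, and only the two forms shown above (a bare IPv4 address and an IPv6 address with the "IPv6:" tag) are recognized.

```python
import ipaddress


def is_address_literal(domain: str) -> bool:
    """Return True for a bracketed IP literal such as [192.168.2.1]."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return False
    inner = domain[1:-1]
    try:
        if inner[:5].lower() == "ipv6:":
            # IPv6 literals carry an "IPv6:" tag inside the brackets.
            ipaddress.IPv6Address(inner[5:])
        else:
            ipaddress.IPv4Address(inner)
    except ValueError:
        # Raised for anything that is not a well-formed address.
        return False
    return True
```

Both examples from the list are accepted, while something like [123.123.123.ZZ:25] is rejected because the bracket content is not a valid IPv4 address.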

These e-mail addresses are valid:

And these are examples of invalid addresses:

  • Abc.example.com (no @ character)
  • A@b@c@example.com (only one @ is allowed outside quotation marks)
  • a"b(c)d,e:f;g<h>i[j\k]l@example.com (none of the special characters in this local part are allowed outside quotation marks)
  • just"not"right@example.com (quoted strings must be dot separated or the only element making up the local part)
  • this is"not\allowed@example.com (spaces, quotes, and backslashes may only exist when within quoted strings and preceded by a backslash)
  • this\ still\"not\\allowed@example.com (even if escaped (preceded by a backslash), spaces, quotes, and backslashes must still be contained by quotes)
  • john..doe@example.com (double dot before @); (with caveat: Gmail lets this through)
  • john.doe@example..com (double dot after @)
  • a valid address with a leading space
  • a valid address with a trailing space

Source: Email address at Wikipedia


Perl's RFC2822 regex for validating emails:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)

The full regexp for RFC2822 addresses was a mere 3.7k.

See also: RFC 822 Email Address Parser in PHP.
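For comparison, a header-style address can be taken apart without a hand-written regex. The sketch below wraps Python's email.utils.parseaddr in a hypothetical helper called split_mailbox. It only separates a display name from the address and is lenient by design, so it should not be read as a validator.

```python
from email.utils import parseaddr


def split_mailbox(header_value: str) -> tuple[str, str]:
    """Split 'Display Name <local@domain>' into (display name, address)."""
    # parseaddr returns a (realname, email_address) pair; the name is an
    # empty string when the input carries only a bare address.
    return parseaddr(header_value)
```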


The formal definitions of e-mail addresses are in:

  • RFC 5322 (sections 3.2.3 and 3.4.1, obsoletes RFC 2822), RFC 5321, RFC 3696,
  • RFC 6531 (permitted characters).

Solution 4

Wikipedia has a good article on this, and the official spec is here. From Wikipedia:

The local-part of the e-mail address may use any of these ASCII characters:

  • Uppercase and lowercase English letters (a-z, A-Z)
  • Digits 0 to 9
  • Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.

Additionally, quoted strings (e.g. "John Doe"@example.com) are permitted, which allows characters that would otherwise be prohibited; however, they are rare in practice. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".
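A quoted local part can be checked with a small scanner. The sketch below follows the Quoted-string grammar of RFC 5321 as I read it (printable ASCII and space, with backslash and double quote needing a backslash in front); the function name is_quoted_local_part is made up, and length limits are not enforced.

```python
def is_quoted_local_part(local: str) -> bool:
    """Return True if `local` is a well-formed quoted-string local part."""
    if len(local) < 2 or local[0] != '"' or local[-1] != '"':
        return False
    body = local[1:-1]
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A backslash must escape a following printable character or space.
            if i + 1 >= len(body) or not (" " <= body[i + 1] <= "~"):
                return False
            i += 2
            continue
        # An unescaped double quote would end the string early, and
        # control or non-ASCII characters are not covered by this sketch.
        if ch == '"' or not (" " <= ch <= "~"):
            return False
        i += 1
    return True
```

For example, the local part of "John Doe"@example.com passes, while John Doe without the quotes does not.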

Solution 5

You can start from the Wikipedia article:

  • Uppercase and lowercase English letters (a-z, A-Z)
  • Digits 0 to 9
  • Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
WildWezyr
Author by

WildWezyr

Coder, System Architect, Team Leader, etc. Main areas of interest: Java; SQL; frameworks/APIs design and implementation; language design and implementation; language paradigms; code simplicity and efficiency; real-life applications of KISS, DRY and similar rules ;-)

Updated on April 03, 2022

Comments

  • WildWezyr
    WildWezyr about 2 years

    I'm not asking about full email validation.

    I just want to know which characters are allowed in the user-name and server parts of an email address. This may be oversimplified, maybe email addresses can take other forms, but I don't care. I'm asking about only this simple form: user-name@server (e.g. [email protected]) and allowed characters in both parts.

    • Dan Herbert
      Dan Herbert over 14 years
      The + is allowed. It drives me nuts when web sites don't allow it because my email has a + in it and so many sites don't allow it.
    • WildWezyr
      WildWezyr over 14 years
      I've just started a bounty. There are already good answers but they do not explain characters allowed in server part of email address. I will accept full answer for my questions (username and server parts explained).
    • Admin
      Admin over 12 years
      Maybe also RFC2821 and RFC2822.
    • John Y
      John Y almost 12 years
      Earlier question covering the same material: stackoverflow.com/questions/760150/. The sad thing is, even though that question is almost 8 months older than this one, the older question has much better answers. Almost all the answers below were already out of date when they were originally posted. See Wikipedia entry (and don't worry, it has relevant official references).
    • Geo
      Geo over 11 years
      According to PHP's filter_var() validation this email would be correct: _.-+~^*'`{GEO}`'*^[email protected]
    • user253751
      user253751 almost 10 years
      Contrary to several answers, spaces are allowed in the local part of email addresses, if quoted. "hello world"@example.com is valid.
    • Lara Ruffle Coles
      Lara Ruffle Coles almost 8 years
      Currently setting up a Google Dev Console email group, Google doesn't allow the + even though the email address must have been allowed when the person created the Gmail account. !!!!!
    • Kevin Fegan
      Kevin Fegan almost 8 years
      @LaraRuffleColes - For Gmail, when you create an email account, it doesn't allow you to create addresses containing a "+" sign. The "+" sign ("Plus-addressing") allows anyone with a Gmail address to add a "+" sign followed by a "string" to the end of their username to create an "alternate" ("alias") email address to use for their account. Example: "[email protected]", "[email protected]". A typical (and probably "Primary") use of this is to be able to create alias email addresses for your account which allow you to tag and filter incoming email messages, theoretically filtered by sender.
    • Patrick O'Hara
      Patrick O'Hara over 6 years
      Think the '+' drives you nuts? My last name has an apostrophe in it. Know how many websites I can still crash by entering my last name? Way too many, but on topic I gave up the email address Patrick.o'hara because almost no one allows it, though it is valid.
    • oligan
      oligan about 6 years
      @DanHerbert Maybe they don't want people easily abusing the system by using a single real email address to create multiple accounts.
    • Dan Herbert
      Dan Herbert about 6 years
      @Andrew The reverse is much more common. If a site can't be trusted to allow proper email addresses, I don't trust them to handle my personal information.
    • Amir Hassan Azimi
      Amir Hassan Azimi almost 3 years
      @DanHerbert because websites don't want 2 different users with the same email. Imagine they provide discounts for the first-time buyers and every time you shop you could claim you're a new customer just by adding a + gibberish after your email. Would you want that?
    • Dan Herbert
      Dan Herbert almost 3 years
      @HassanAzimi Trying to prevent abuse by blocking valid email address formats is not a great strategy and would stop an incredibly small number of bad actors who can get around that limitation quite easily. Plus, it isn't a universal rule that all email providers ignore everything after a + At the time of my original comment, it was something that only worked that way with Gmail. A lot of the larger providers now behave that way, but it's still not an effective way to stop bad behavior and is going to annoy more honest users than dishonest ones.
    • Amir Hassan Azimi
      Amir Hassan Azimi almost 3 years
      @DanHerbert I tried MSN, AOL, YAHOO and none of them let you add a plus anywhere in your email when creating a new email so yes it is not a valid email address.
    • Luther
      Luther over 2 years
      @DanHerbert a related pet peeve of mine is sites that don't allow a single character local part of the email, e.g. [email protected] is completely valid but many sites don't allow it including some airline sites.
    • Hakanai
      Hakanai over 2 years
      @DanHerbert yikes, I have been using the + trick since before Gmail even existed, on my own mail servers.
    • Christopher Cashell
      Christopher Cashell over 2 years
      @AmirHassanAzimi - That's incorrect and flawed logic. The fact that some websites don't let you create an account with a + character in it does not in any way mean that + "is not a valid e-mail address". They still accept, process, and work with e-mail that have + in it because it is valid. Sites that disallow it are adding a restriction in order to make special use of (valid) e-mail addresses containing +.
    • Amir Hassan Azimi
      Amir Hassan Azimi over 2 years
      @ChristopherCashell try to look at the company's perspective meaning having 1 email means you can forge multiple emails. It is up to you/company to accept that or not but I already explained why it's bad practice.
  • Dan Herbert
    Dan Herbert over 14 years
    @WildWezyr, It's not that simple. Email addresses have a lot of rules for what is allowed. It's simpler to refer to the spec than to list out all of them. If you want the complete Regex, check here to get an idea of why it's not so simple: regular-expressions.info/email.html
  • Admin
    Admin over 14 years
    there is no simple list, just because you want something simple doesn't mean it will be so. some characters can only be in certain locations and not in others. you can't have what you want all the time.
  • Mark Pim
    Mark Pim over 14 years
    @WildWezyr Well, the full-stop character is allowed in the local-part. But not at the start or end. Or with another full-stop. So the answer IS NOT as simple as just a list of allowed characters, there are rules as to how those characters may be used - [email protected] is not a valid email address, but [email protected] is, even though both use the same characters.
  • JensenDied
    JensenDied over 14 years
    @WildWezyr Valid hostnames, which could be an ip address, FQN, or something resolvable to an local network host.
  • Chinmay Kanchi
    Chinmay Kanchi over 14 years
    Also, remember that with internationalized domain names coming in, the list of allowed characters will explode.
  • WildWezyr
    WildWezyr over 14 years
    Are you sure that this extra characters are sent to and handled by servers? As far as I know internationalized domain names are handled by browsers (protocol clients not servers).
  • Mason
    Mason over 14 years
    Right; behind the scenes, the domain names are still just ASCII. But, if your web app or form accepts user-entered input, then it needs to perform the same job that the web browser or mail client does when the user inputs an IDN hostname: to convert the user input into DNS-compatible form. Then validate. Otherwise, these internationalized email addresses will not pass your validation. (Converters like the one I linked to only modify the non-ASCII characters they are given, so it is safe to use them on non-internationalized email addresses (those are just returned unmodified).)
  • ZacharyP
    ZacharyP over 12 years
    This is no longer the valid answer, due to internationalized addresses. See Mason's answer.
  • Mason
    Mason over 12 years
    Real-world anecdote: there are addresses out there that use consecutive dots (in violation of the RFC, I think). This just came up recently when I assisted with a technical audit of a corporate emergency notification system; in an annual drill, the system had silently failed to notify one employee. It turns out that NTT Docomo, Japan's largest cellular carrier, allows email address like "[email protected]". The system was choking on that address. (Docomo has more than 40 million customers.)
  • John Y
    John Y almost 12 years
    @ZacharyP: Actually Mason's answer doesn't go far enough either. UTF-8 is now officially allowed anywhere in the address. (See my comment on the main question.)
  • John Y
    John Y almost 12 years
    You're right that the other answers here have outdated information. And it's not only the domain, the whole address can be UTF-8. (See my comment on the main question for further references.)
  • wwaawaw
    wwaawaw over 11 years
    For Javascript devs, I'm now researching methods of doing this, and Punycode.js seems to be the most complete and polished solution.
  • Don Rhummy
    Don Rhummy over 11 years
    @AntonGogolev Don't the special characters have to appear within quotes in the local part to be valid? So john'[email protected] is INVALID but "john'doe"@place.com is VALID.
  • Fabián
    Fabián almost 11 years
    Those are very annoying characters to read in an email address indeed.
  • Admin
    Admin over 10 years
    Even . isn't strictly necessary; I've heard of at least one case of an email address at a top level domain (specifically ua). The address was <name>@ua -- no dot!
  • hardywang
    hardywang over 10 years
    RFC6530 tools.ietf.org/html/rfc6530 does support international characters. So allowed ones go beyond just standard ASCII.
  • IMSoP
    IMSoP about 10 years
    Note that Internationalized Email (as currently defined) does not convert non-ASCII addresses using punycode or similar, instead extending large portions of the SMTP protocol itself to use UTF8.
  • Piotr Kula
    Piotr Kula almost 10 years
    Not according to Google Mail. imgur.com/hX5W2T7 - I bet Gmail won't even accept emails with apostrophes in them and to be honest in 25 years, since the days of dial-up bulletin boards, I have not once seen an email with an apostrophe.
  • Piotr Kula
    Piotr Kula almost 10 years
    Yea this is a great answer about why Gmail does not allow you to CREATE emails with this. But you can send and receive emails from {john'doe}@my.server with no problem. Tested with hMail server too.
  • Piotr Kula
    Piotr Kula almost 10 years
    You can test your client by sending an email to {piotr'kula}@kula.solutions - If it works you will get a nice auto reply from it. Otherwise nothing will happen.
  • Nolan Amy
    Nolan Amy over 9 years
    @ppumkin A given email provider can be as restrictive as they want, but that has no bearing on providers generally.
  • radtek
    radtek over 9 years
    Python's smtplib is not allowing a "!" in the local-part of the address, [email protected] will throw SMTPRecipientsRefused (550, 'restricted characters in address')
  • Teemu Leisti
    Teemu Leisti over 9 years
    Gmail does follow RFC 6530 in the sense that every possible e-mail address allowed by Gmail is valid according to the RFC. Gmail just chooses to further restrict the set of allowable addresses with additional rules, and to make otherwise similar addresses with dots in the local part, optionally followed by "+" and alphanumeric characters, synonymous.
  • Noyo
    Noyo over 8 years
    Here's the newest definition, from the HTML5 spec (not an RFC): w3.org/TR/html5/forms.html#valid-e-mail-address .
  • glerYbo
    glerYbo almost 8 years
    What about <> and []? E.g. "()<>[]:,;@\\\"!#$%&'-/=?^_{}| ~.a"@example.org`?
  • Samuel Harmer
    Samuel Harmer over 7 years
    Am I missing something or does this fail to answer the question? I am reading 'the other answer is wrong, you need to accept more characters' but then fails to state which extra characters. I also couldn't (easily) see in that RFC whether it means all Unicode code points or just the BMP.
  • Mathieu K.
    Mathieu K. over 7 years
    Please cite sources. Without sources, this looks like conjecture.
  • Saiyaff Farouk
    Saiyaff Farouk over 7 years
    I was wondering about the '@' before the domain part. Can that be used?
  • Luke Madhanga
    Luke Madhanga over 7 years
    @SaiyaffFarouk according to the specification, yes. However, most mail providers likely won't allow it as part of their own validation
  • Sean
    Sean over 7 years
    This seems to be on the right track to being the correct answer. I bet it would get a lot more votes if you included specifics about reserved and allowed characters.
  • Mau
    Mau about 7 years
    This version improves the regex by checking the length of domain/subdomains. Enjoy! ^[\\w\\.\\!_\\%#\\$\\&\\'=\\?\*\\+\\-\\/\\^\`\\{\\|\\}\\~]+@(?:[\\w](?:[\\w\\-]{0,61}[\\w])?(?:\\.[\\w](?:[\\w\\-]{0,61}[\\w])?)*)$
  • Anentropic
    Anentropic almost 7 years
    Gmail doesn't like the comments john.smith(comment)@example.com
  • Chris Sobolewski
    Chris Sobolewski almost 7 years
    As an extra caution to would-be implementers of this regex: Don't. Just verify that it follows the format [email protected] and call it a day.
  • Jason Harrison
    Jason Harrison over 6 years
    This is out of date, and possibly was never correct.
  • Avamander
    Avamander over 6 years
    This is pretty much the easiest way not to mess up your validation, because almost everything is allowed, and if something isn't allowed, the recipient's server will let you know.
  • BradChesney79
    BradChesney79 over 6 years
    Google limits the account creation criteria... I imagine they scrub the incoming email account string of the extra "punctuation" and trailing plus prepended alias string sign so that the mail can be routed to the proper account. Easy peasy. In doing so, they effectively don't allow people to create just-bein-a-jerk email addresses so that valid addresses created will often pass simple and most complex validations.
  • unjankify
    unjankify over 6 years
    While something like this is not maintainable, it is a nice exercise to decode and actually figure out what it does
  • oligan
    oligan over 6 years
    This post, while useful, and probably old enough that it shouldn't be deleted, feels more like a comment than an answer.
  • John Woo
    John Woo about 6 years
    need to read Mason's answer first before implementing validation otherwise non-english email address will always be rejected. ex 夏明@域通联达。在线
  • Jasen
    Jasen about 6 years
    @ChrisSobolewski allow multiple somethings both sides of the '@'
  • vcarel
    vcarel about 6 years
    I think this answer is downvoted because this is an opinion, and it actually does not answer the question. Besides, users who get their email address silently sanitized will never get emails from you. You'd better inform them that their email address is not accepted.
  • scoobydoo
    scoobydoo almost 6 years
    I've tried to implement this in postfix via pcre access table under a check_recipient_access restriction, first turning the 3 long pcres (from the linked page) into one line each and topping and tailing thus: /^[...pcre..]$/ DUNNO, then adding a final line /.*/ REJECT, but it still allows through invalid email addresses. Postfix 3.3.0; perl 5, version 26, subversion 1 (v5.26.1).
  • HoldOffHunger
    HoldOffHunger almost 6 years
    I suspect the downvotes are because there are too many ideas here. The disallowed list, while these are useful unit tests, should be prefaced with what is allowed. The programming approach seems relatively fine, but, would probably fit better after you list the specs you're working with, etc.. Sections and mild copy-editing would help. Just my 2cents.
  • BradChesney79
    BradChesney79 almost 6 years
    @vcarel - Oh, absolutely. Front-end user side validation would inform them what rules (available from the tooltip) they were breaking. You are right-- it is an overall opinion. However, the question above is from someone that is asking X for a Y question for sure. This is guidance and it works... not only does it work, it works well. I don't let bullshit email addresses in my systems where I make the decisions.
  • BradChesney79
    BradChesney79 almost 6 years
    @HoldOffHunger I can see that the overall idea is not as coherently expressed as it could be, I may revise on another day where I have more time to better express that. Thanks for the insight.
  • mckenzm
    mckenzm over 5 years
    Quoted strings were essential for passing through a gateway, remember Banyan Vines?
  • mckenzm
    mckenzm over 5 years
    It's not just gmail, Some providers have "relaying filters" that reject certain quoted strings, particularly containing "=" as if they were delimiters. This is to block users from setting up gateways and nesting spam addresses in the private quoted string. "@" is valid but "=@=" is not (considered) valid.
  • ygoe
    ygoe about 5 years
    The link is gone. What content was there?
  • Laszlo Valko
    Laszlo Valko about 5 years
    "(double dot before @); (with caveat: Gmail lets this through)" - no longer true, Gmail now rejects double dot addresses :(
  • tomuxmon
    tomuxmon about 5 years
    Madness I say. Who would ever use it in production. There is a point where regular expression should no longer be used. It is far beyond that point.
  • jimmont
    jimmont almost 5 years
    @wwaawaw (this might or might not be useful) in Chrome 77 and Nodejs v10 creating a URL instance will convert to Punycode (no library required) automatically. For example new URL("https://はじめよう.みんな") outputs "xn--p8j9a0d9c9a.xn--q9jyb4c". I just started digging around and haven't figured out the reverse yet, if possible...simply noticed location.href prints out the correct punycode.
  • John Boe
    John Boe over 4 years
    @Anton Gogolev: you have a mistake in: "!#$%&'*+-/=?^_`{|}~;" the last character is forbidden in atext.
  • John Boe
    John Boe over 4 years
    Can anyone change the text to remove the ";" from the end of !#$%&'*+-/=?^_`{|}~;
  • TMWP
    TMWP over 4 years
    I am looking at an XML-based content management system that uses DITA. If you try to use "/" in even the first part of an email address (before the @), this creates DITA errors that can interfere with publishing the hyperlink to the email address. For compatibility with systems that may need to use the address, you might want to eliminate "/" as an allowed character.
  • wesinat0r
    wesinat0r about 4 years
    that blog lists Joe.\\[email protected] without quotes. Is this actually valid ? It doesn't seem clear given the answers here, but I'm asking because I have seen (very rare) cases of DNS SoA rname email strings that contain backslashes.
  • Synchro
    Synchro almost 4 years
    Something I see a lot is "validate according to RFC822". This isn't actually what's usually needed. RFC822 doesn't define addresses that can be sent to; it defines addresses that can appear in messages, which is not the same thing. Addresses that can be sent to are determined in RFC821 (SMTP) and follow-on standards. In particular this spec does not allow comments, excluding addresses like a@abc(bananas)def.com that are valid RFC822 addresses but can't be sent to. For this reason, many email validators are validating against the wrong thing.
  • paul23
    paul23 about 3 years
    Wait is somename@[123.123.123.ZZ:25] a legal address? According to the regex and fsm it is? Even though the final part of the IP are not numeric.
  • MilMike
    MilMike about 3 years
    @ygoe yeah site is down, Here is the archive version from ~2012 : web.archive.org/web/20120807105804/https://www.remote.org/…
  • ygoe
    ygoe about 3 years
    @MilMike Thank you, from there I found the new URL of that page and edited the answer.
  • rasputino
    rasputino over 2 years
    Is the symbol ª valid for an email address?
  • Maëlan
    Maëlan about 2 years
    The list of characters that this answer gives for the domain part is actually the list of characters allowed in each DNS label (and the constraint about hyphens - not being first or last applies to each label). The domain part is made of one or several DNS labels separated with dots . .