Regex to match only letters

991,645

Solution 1

Use a character set: [a-zA-Z] matches a single letter from A to Z, in either lowercase or uppercase. [a-zA-Z]+ matches one or more letters, and ^[a-zA-Z]+$ matches only strings that consist entirely of one or more letters (^ and $ mark the beginning and end of the string, respectively).

If you want to match letters other than A–Z, you can either add them to the character set, as in [a-zA-ZäöüßÄÖÜ], or use a predefined character class such as the Unicode character property class \p{L}, which matches any character that Unicode classifies as a letter.
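As a rough illustration of the anchored versus unanchored patterns, here is a minimal Python sketch. The function names are made up for this example, and `fullmatch` is used in place of ^...$ because in Python `$` also matches just before a trailing newline.

```python
import re

# ASCII letters only, as in the answer above.
ASCII_LETTERS = re.compile(r"[a-zA-Z]+")
# Extended set with the German letters mentioned above.
GERMAN_LETTERS = re.compile(r"[a-zA-ZäöüßÄÖÜ]+")


def is_ascii_letters(text: str) -> bool:
    """True if text is one or more ASCII letters and nothing else."""
    # fullmatch anchors the pattern at both ends, like ^...$.
    return ASCII_LETTERS.fullmatch(text) is not None


def contains_ascii_letters(text: str) -> bool:
    """True if text has at least one ASCII letter anywhere (unanchored)."""
    return ASCII_LETTERS.search(text) is not None


def is_german_letters(text: str) -> bool:
    """True if text consists only of letters from the extended set."""
    return GERMAN_LETTERS.fullmatch(text) is not None


print(is_ascii_letters("Hello"))           # True
print(is_ascii_letters("Hello123"))        # False
print(contains_ascii_letters("Hello123"))  # True: no anchors, so a partial match counts
print(is_ascii_letters("Grüße"))           # False
print(is_german_letters("Grüße"))          # True
```

Without the anchors, the pattern succeeds on any string that merely contains a letter, which is a common reason such a check "doesn't appear to be working".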

Solution 2

\p{L} matches any Unicode letter, which is useful if you're interested in alphabets beyond the Latin one. Not every regex flavor supports it.
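Python's standard `re` module is one of the flavors that does not accept \p{L}. A possible workaround, sketched here as an approximation rather than an exact equivalent, is the class [^\W\d_]: a word character that is neither a digit nor an underscore. The function name is hypothetical. Note that this class can still admit a few numeric characters that are not decimal digits (superscript digits, for instance), so it is slightly broader than \p{L}.

```python
import re

# Word characters minus digits and underscore: an approximation of \p{L}
# for Python's built-in re module, which does not support \p{...}.
UNICODE_LETTERS = re.compile(r"[^\W\d_]+")


def is_unicode_letters(text: str) -> bool:
    """True if text is non-empty and every character is matched by [^\\W\\d_]."""
    return UNICODE_LETTERS.fullmatch(text) is not None


print(is_unicode_letters("Zażółć"))  # True
print(is_unicode_letters("Привет"))  # True
print(is_unicode_letters("abc_1"))   # False: underscore and digit are excluded
```

If an exact \p{L} is required in Python, the third-party `regex` module supports that syntax.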

Solution 3

Depending on your meaning of "character":

[A-Za-z] - all letters (uppercase and lowercase)

[^0-9] - all non-digit characters
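The two classes above behave quite differently on mixed input. A small Python sketch (the sample string is arbitrary) shows that [^0-9] also lets whitespace and symbols through:

```python
import re

sample = "a1 b-2"

# Only ASCII letters.
letters = re.findall(r"[A-Za-z]", sample)
# Everything that is not a digit, including the space and the hyphen.
non_digits = re.findall(r"[^0-9]", sample)

print(letters)     # ['a', 'b']
print(non_digits)  # ['a', ' ', 'b', '-']
```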

Solution 4

The closest option available is

[\u\l]+

which matches a sequence of uppercase and lowercase letters. However, it is not supported by all editors/languages, so it is probably safer to use

[a-zA-Z]+

as other users suggest.

Solution 5

You would use

/[a-z]/gi

[] - matches any single character from the set given inside the brackets

a-z - covers the entire English alphabet

g - global: searches the whole string instead of stopping at the first match

i - case-insensitive: matches both uppercase and lowercase
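For comparison, a rough Python counterpart of /[a-z]/gi might look like the sketch below. `re.findall` plays the role of the g flag and `re.IGNORECASE` the role of i. The `re.ASCII` flag is an addition made for this example: without it, Python's case-insensitive [a-z] also matches a handful of non-ASCII letters such as ſ.

```python
import re

text = "Ab1 cD!"

# findall returns every match in the string (like the g flag);
# IGNORECASE matches both cases (like the i flag);
# ASCII keeps the case folding restricted to a-z and A-Z.
found = re.findall(r"[a-z]", text, flags=re.IGNORECASE | re.ASCII)

print(found)  # ['A', 'b', 'c', 'D']
```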

Author by

Nike

Updated on July 08, 2022

Comments

  • Nike
    Nike 6 months

    How can I write a regex that matches only letters?

  • Philip Potter
    Philip Potter over 12 years
    not in all regex flavours. For example, vim regexes treat \p as "Printable character".
  • Joachim Sauer
    Joachim Sauer over 12 years
    That's a very ASCII-centric solution. This will break on pretty much any non-English text.
  • Philip Potter
    Philip Potter over 12 years
    this page suggests only java, .net, perl, jgsoft, XML and XPath regexes support \p{L}. But major omissions: python and ruby (though python has the regex module).
  • Gumbo
    Gumbo over 12 years
    @Joachim Sauer: It will rather break on languages using non-latin characters.
  • Nike
    Nike over 12 years
    I meant letters. It doesn't appear to be working though. preg_match('/[a-zA-Z]+/', $name);
  • Ivo Wetzel
    Ivo Wetzel over 12 years
    Already breaks on 90% of German text, don't even mention French or Spanish. Italian might still do pretty well though.
  • Joachim Sauer
    Joachim Sauer over 12 years
    that depends on what definition of "latin character" you choose. J, U, Ö, Ä can all be argued to be latin characters or not, based on your definition. But they are all used in languages that use the "latin alphabet" for writing.
  • KristofMols
    KristofMols over 12 years
    [A-Za-z] is just the declaration of the characters you can use. You still need to declare how many times this declaration has to be used: [A-Za-z]{1,2} (to match 1 or 2 letters) or [A-Za-z]{1,} (to match 1 or more letters)
  • Jörg W Mittag
    Jörg W Mittag over 12 years
    @Philip Potter: Ruby supports Unicode character properties using that exact same syntax.
  • Amal Murali
    Amal Murali over 8 years
    \w may not be a good solution in all cases. At least in PCRE, \w can match other characters as well. Quoting the PHP manual: "A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.".
  • OGHaza
    OGHaza over 8 years
    That is not what [^\W|\d] means
  • OGHaza
    OGHaza over 8 years
    [^\W|\d] means not \W and not | and not \d. It has the same net effect since | is part of \W but the | does not work as you think it does. Even then that means it accepts the _ character. You are probably looking for [^\W\d_]
  • Motlab
    Motlab over 8 years
    I agree with you, it accepts the _. But a negated | is equivalent to "AND", so [^\W|\d] means: NOT \W AND NOT \d
  • OGHaza
    OGHaza over 8 years
    [^ab] means not a and not b. [^a|b] means not a and not | and not b. To give a second example [a|b|c|d] is exactly the same as [abcd|||] which is exactly the same as [abcd|] - all of which equate to ([a]|[b]|[c]|[d]|[|]) the | is a literal character, not an OR operator. The OR operator is implied between each character in a character class, putting an actual | means you want the class to accept the | (pipe) character.
  • V-SHY
    V-SHY over 7 years
    words include characters other than letters
  • Nyerguds
    Nyerguds over 6 years
    Won't match any special characters though.
  • Eugen Konkov
    Eugen Konkov over 6 years
    \w means match letters and numbers
  • ZoFreX
    ZoFreX over 6 years
    I think this should be \p{L}\p{M}*+ to cover letters made up of multiple codepoints, e.g. a letter followed by accent marks. As per regular-expressions.info/unicode.html
  • phuclv
    phuclv over 6 years
    well à, á, ã, Ö, Ä... are letters too, so are অ, আ, ই, ঈ, Є, Ж, З, ﺡ, ﺥ, ﺩא, ב, ג, ש, ת, ... en.wikipedia.org/wiki/Letter_%28alphabet%29
  • Radu Simionescu
    Radu Simionescu about 6 years
    \p{L} matches all the umlauts, cedillas, accents, etc., so you should go with that.
  • user1329482
    user1329482 about 5 years
    Works well in a selector engine for determining if the selector is just a tag name.
  • DaveMongoose
    DaveMongoose almost 5 years
    This will also match whitespace, symbols, etc. which does not seem to be what the question is asking for.
  • AER
    AER almost 5 years
    What do you do if you can't use [] because Python is too thick to understand nestings?
  • The Witness
    The Witness over 4 years
    And what about for instance, “Zażółć gęslą jaźń”?
  • karoluS
    karoluS over 4 years
    it doesn't include diacritic signs such as ŹŻŚĄ
  • matanster
    matanster over 3 years
    with python 3 this yields an error bad escape \p at position 0
  • Pablo
    Pablo over 3 years
    Instead of adding individual characters such as äöüßÄÖÜ one by one, you can use ^[a-zA-Z]\p{L}+$ to include most Western alphabets.
  • Catalina Chircu
    Catalina Chircu about 3 years
    @phuclv: Indeed, but that depends on the encoding, and the encoding is part of the settings of the program (either the default config or the one declared in a config file of the program). When I worked on different languages, I used to store that in a constant, in a config file.
  • phuclv
    phuclv about 3 years
    @CatalinaChircu encoding is absolutely irrelevant here. An encoding is a way to represent the code points of a character set in binary; for example, UTF-8 is an encoding for Unicode. Letters, OTOH, depend on the language, and if one says [A-Za-z] are letters then the language being used must be specified
  • Catalina Chircu
    Catalina Chircu about 3 years
    @phuclv: Indeed, I should have mentioned the language, not the encoding. The language is important and finding the letters in English is not the same as finding the letters in Spanish or French. If you do not take into account the diacritics in these languages you can cut words in two.
  • Stefan Haustein
    Stefan Haustein about 3 years
  • Toto
    Toto almost 3 years
    You should have a look at an ASCII table. A-z matches more than just letters, and so does À-ú
  • ndrwnaguib
    ndrwnaguib over 2 years
    Hello @jarraga. Welcome to SO, did you read how to answer a question? It should help you make your answer clearer, and hence avoid downvotes.
  • Toto
    Toto over 2 years
    What about non Latin letter? For example çéàñ. Your regex is less readable than \p{L}
  • Frederic
    Frederic about 2 years
    Clever answer. Works perfectly for accented letters as well.
  • jave.web
    jave.web almost 2 years
    For letters beyond english: /\p{Letter}/gu ref: developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/…
  • jave.web
    jave.web almost 2 years
    JavaScript needs u after regex to detect the unicode group: /\p{Letter}/gu
  • dimitar.bogdanov
    dimitar.bogdanov over 1 year
    ^ or any Cyrillic letters
  • Eric Soyke
    Eric Soyke about 1 year
    For a long time I had been using [A-z]+ but just noticed this allows a few special characters like ` and [ to slip in. [a-zA-Z]+ is indeed the way to go.
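
Two of the pitfalls raised in the comments, the [A-z] range and a | placed inside a character class, can be checked with a short Python sketch. The variable names are only for illustration.

```python
import re

# ASCII characters that sit between 'Z' and 'a': [ \ ] ^ _ `
extras = [chr(code) for code in range(ord("Z") + 1, ord("a"))]

# [A-z] spans that gap, so it accepts all six of them.
wide_range_hits = [ch for ch in extras if re.fullmatch(r"[A-z]", ch)]
# [a-zA-Z] uses two separate ranges and accepts none of them.
strict_hits = [ch for ch in extras if re.fullmatch(r"[a-zA-Z]", ch)]

# Inside a character class, | is a literal pipe, not an OR operator.
pipe_is_literal = re.fullmatch(r"[a|b]", "|") is not None

print(extras)           # ['[', '\\', ']', '^', '_', '`']
print(wide_range_hits)  # same six characters
print(strict_hits)      # []
print(pipe_is_literal)  # True
```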