Regex to match only letters

regex

991,645

Solution 1

Use a character set: [a-zA-Z] matches a single letter from A–Z, in either lowercase or uppercase. [a-zA-Z]+ matches one or more letters, and ^[a-zA-Z]+$ matches only strings that consist entirely of one or more letters (^ and $ mark the beginning and end of the string, respectively).

If you want to match letters other than A–Z, you can either add them to the character set, for example [a-zA-ZäöüßÄÖÜ], or use a predefined character class such as the Unicode character property class \p{L}, which describes the Unicode characters that are letters.
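
As a minimal sketch of the two character sets above, here they are in Python's standard re module. The function names are made up for illustration, and re.fullmatch stands in for the ^...$ anchors (in Python, $ also matches just before a trailing newline, so fullmatch is slightly stricter).

```python
import re

# ASCII letters only, one or more.
ASCII_LETTERS = re.compile(r"[a-zA-Z]+")
# The extended set from the answer, with the German letters added explicitly.
GERMAN_LETTERS = re.compile(r"[a-zA-ZäöüßÄÖÜ]+")


def is_ascii_letters(text: str) -> bool:
    """True if the whole string is one or more ASCII letters (hypothetical helper)."""
    return ASCII_LETTERS.fullmatch(text) is not None


def is_german_letters(text: str) -> bool:
    """True if the whole string uses only A-Z plus äöüßÄÖÜ (hypothetical helper)."""
    return GERMAN_LETTERS.fullmatch(text) is not None
```

The empty string is rejected by both helpers because + requires at least one letter.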

Solution 2

\p{L} matches anything that is a Unicode letter, which is useful if you're interested in alphabets beyond the Latin one.
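
Python's built-in re module does not accept \p{L}, so the sketch below uses [^\W\d_] as an approximation: word characters minus digits and the underscore. It is not an exact equivalent of \p{L}. It can accept some numeric characters that are not decimal digits, and it may reject letters written as a base letter plus a separate combining mark. The helper name is hypothetical.

```python
import re

# Word characters (\w) with digits and underscore excluded.
# For str patterns in Python 3, \w and \d are Unicode-aware by default.
UNICODE_LETTERS = re.compile(r"[^\W\d_]+")


def is_letters(text: str) -> bool:
    """True if the whole string consists of letter-like characters (hypothetical helper)."""
    return UNICODE_LETTERS.fullmatch(text) is not None
```

Flavours that do support \p{L} can use it directly in place of this workaround.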

Solution 3

Depending on your meaning of "character":

[A-Za-z] - all letters (uppercase and lowercase)

[^0-9] - all non-digit characters
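
The difference between the two classes is easy to check. The following is a small illustrative sketch in Python (the sample string is arbitrary): [^0-9] accepts anything that is not a digit, including spaces and punctuation, while [A-Za-z] accepts letters only.

```python
import re

NON_DIGITS = re.compile(r"[^0-9]+")
LETTERS = re.compile(r"[A-Za-z]+")

sample = "a b-c!"  # letters mixed with a space, a hyphen and an exclamation mark

# Every character is a non-digit, so the whole string matches.
non_digits_ok = NON_DIGITS.fullmatch(sample) is not None
# The space and punctuation are not letters, so the whole string does not match.
letters_ok = LETTERS.fullmatch(sample) is not None
```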

Solution 4

The closest option available is

[\u\l]+

which matches a sequence of uppercase and lowercase letters. However, it is not supported by all editors/languages, so it is probably safer to use

[a-zA-Z]+

as other users suggest

Solution 5

You would use

/[a-z]/gi

[] - matches any single character from the set listed inside

a-z - covers the entire lowercase alphabet

g - searches globally, throughout the whole string

i - ignores case, so both uppercase and lowercase letters match
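
A rough Python counterpart of that pattern, for illustration only: re.findall plays the role of the g flag and re.IGNORECASE the role of i. Note that the pattern is unanchored, so it finds letters anywhere in the input rather than checking that the input contains nothing but letters.

```python
import re

text = "Ab1 cD!"

# Collect every single letter, regardless of case (like /[a-z]/gi).
letters = re.findall(r"[a-z]", text, flags=re.IGNORECASE)

# An unanchored search succeeds as soon as any letter is present...
found_anywhere = re.search(r"[a-z]+", "123abc", flags=re.IGNORECASE) is not None
# ...whereas a full match requires the whole string to be letters.
whole_string = re.fullmatch(r"[a-z]+", "123abc", flags=re.IGNORECASE) is not None
```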

Author by

Nike

Updated on July 08, 2022

Comments

Nike 6 months

How can I write a regex that matches only letters?
Philip Potter over 12 years

not in all regex flavours. For example, vim regexes treat \p as "Printable character".
Joachim Sauer over 12 years

That's a very ASCII-centric solution. This will break on pretty much any non-english text.
Philip Potter over 12 years

this page suggests only java, .net, perl, jgsoft, XML and XPath regexes support \p{L}. But major omissions: python and ruby (though python has the regex module).
Gumbo over 12 years

@Joachim Sauer: It will rather break on languages using non-latin characters.
Nike over 12 years

I meant letters. It doesn't appear to be working though. preg_match('/[a-zA-Z]+/', $name);
Ivo Wetzel over 12 years

Already breaks on 90% of German text, don't even mention French or Spanish. Italian might still do pretty well though.
Joachim Sauer over 12 years

that depends on what definition of "latin character" you choose. J, U, Ö, Ä can all be argued to be latin characters or not, based on your definition. But they are all used in languages that use the "latin alphabet" for writing.
KristofMols over 12 years

[A-Za-z] is just the declaration of characters you can use. You still need to declare how many times this declaration has to be used: [A-Za-z]{1,2} (to match 1 or 2 letters) or [A-Za-z]{1,} (to match 1 or more letters)
Jörg W Mittag over 12 years

@Philip Potter: Ruby supports Unicode character properties using that exact same syntax.
Amal Murali over 8 years

\w may not be a good solution in all cases. At least in PCRE, \w can match other characters as well. Quoting the PHP manual: "A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.".
OGHaza over 8 years

That is not what [^\W|\d] means
OGHaza over 8 years

[^\W|\d] means not \W and not | and not \d. It has the same net effect since | is part of \W but the | does not work as you think it does. Even then that means it accepts the _ character. You are probably looking for [^\W\d_]
Motlab over 8 years

I agree with you, it accepts the _. But "NOT" | is equivalent to "AND", so [^\W|\d] means: NOT \W AND NOT \d
OGHaza over 8 years

[^ab] means not a and not b. [^a|b] means not a and not | and not b. To give a second example [a|b|c|d] is exactly the same as [abcd|||] which is exactly the same as [abcd|] - all of which equate to ([a]|[b]|[c]|[d]|[|]) the | is a literal character, not an OR operator. The OR operator is implied between each character in a character class, putting an actual | means you want the class to accept the | (pipe) character.
V-SHY over 7 years

words include characters other than letters
Nyerguds over 6 years

Won't match any special characters though.
Eugen Konkov over 6 years

\w means match letters and numbers
ZoFreX over 6 years

I think this should be \p{L}\p{M}*+ to cover letters made up of multiple codepoints, e.g. a letter followed by accent marks. As per regular-expressions.info/unicode.html
phuclv over 6 years

well à, á, ã, Ö, Ä... are letters too, so are অ, আ, ই, ঈ, Є, Ж, З, ﺡ, ﺥ, ﺩא, ב, ג, ש, ת, ... en.wikipedia.org/wiki/Letter_%28alphabet%29
Radu Simionescu about 6 years

\p{L} matches all the umlauts, cedillas, accents, etc., so you should go with that.
user1329482 about 5 years

Works well in a selector engine for determining if the selector is just a tag name.
DaveMongoose almost 5 years

This will also match whitespace, symbols, etc. which does not seem to be what the question is asking for.
AER almost 5 years

What do you do if you can't use [] because Python is too thick to understand nestings?
The Witness over 4 years

And what about, for instance, “Zażółć gęślą jaźń”?
karoluS over 4 years

it doesn't include diacritic signs such as ŹŻŚĄ
matanster over 3 years

with python 3 this yields an error bad escape \p at position 0
Pablo over 3 years

Instead of adding characters one by one, like äöüßÄÖÜ, you can use ^[a-zA-Z]\p{L}+$ to include most of the Western alphabets.
Catalina Chircu about 3 years

@phuclv: Indeed, but that depends on the encoding, and the encoding is part of the settings of the program (either the default config or the one declared in a config file of the program). When I worked on different languages, I used to store that in a constant, in a config file.
phuclv about 3 years

@CatalinaChircu encoding is absolutely irrelevant here. An encoding is a way to represent the code points of a character set in binary; for example, UTF-8 is an encoding for Unicode. Letters, OTOH, depend on the language, and if one says [A-Za-z] are letters then the language that's being used must be specified
Catalina Chircu about 3 years

@phuclv: Indeed, I should have mentioned the language, not the encoding. The language is important and finding the letters in English is not the same as finding the letters in Spanish or French. If you do not take into account the diacritics in these languages you can cut words in two.
Stefan Haustein about 3 years

Doesn't work in firefox: bugzilla.mozilla.org/show_bug.cgi?id=1361876
Toto almost 3 years

You should have a look at an ASCII table. A-z matches more than just letters, and so does À-ú
ndrwnaguib over 2 years

Hello @jarraga. Welcome to SO. Did you read how to answer a question? It should help make your answer clearer and so avoid downvotes.
Toto over 2 years

What about non Latin letter? For example çéàñ. Your regex is less readable than \p{L}
Frederic about 2 years

Clever answer. Works perfectly for accented letters as well.
jave.web almost 2 years

For letters beyond english: /\p{Letter}/gu ref: developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/…
jave.web almost 2 years

JavaScript needs u after regex to detect the unicode group: /\p{Letter}/gu
dimitar.bogdanov over 1 year

^ or any Cyrillic letters
Eric Soyke about 1 year

For a long time I had been using [A-z]+ but just noticed this allows a few special characters like ` and [ to slip in. [a-zA-Z]+ is indeed the way to go.
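
The [A-z] pitfall is easy to verify. As a small illustrative sketch in Python, the snippet below lists the ASCII characters that sit between Z and a, and shows that [A-z] accepts one of them while [a-zA-Z] does not.

```python
import re

# ASCII characters in the range 'A'..'z' that are not letters.
extras = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]

# The underscore falls inside A-z, so the loose range accepts it.
loose_accepts = re.fullmatch(r"[A-z]+", "a_b") is not None
# The explicit two-range class rejects it.
strict_accepts = re.fullmatch(r"[a-zA-Z]+", "a_b") is not None
```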