Detecting if a character is a letter
Solution 1
You could use a regular expression. Unfortunately, JavaScript does not consider international characters to be "word characters". But you can do it with the regular expression below:
var firstLetter = name.charAt(0);
firstLetter = firstLetter.toUpperCase();
// The brackets make this a character class: one word character (\w also matches digits and _) or one listed letter.
if (!firstLetter.match(/^[\wÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöüçÇßØøÅåÆæÞþÐð]$/)) {
    firstLetter = "0";
}
if (words[firstLetter] === undefined) {
    words[firstLetter] = [];
}
words[firstLetter].push(name);
Solution 2
You can use this to test if a character is likely to be a letter:
var firstLetter = name.charAt(0).toUpperCase();
if (firstLetter.toLowerCase() != firstLetter) {
    // it's a letter
} else {
    // it's a symbol
}
This works because JavaScript already has a mapping for lowercase to uppercase letters (and vice versa), so if a character is unchanged by toLowerCase() then it's not in the letter table.
Solution 3
Try converting the character to uppercase and to lowercase and check whether the two results differ. As a rule, only letters change when converted between upper and lower case (numbers, punctuation marks, etc. don't). Below is a sample function built on this idea:
function isALetter(charVal) {
    // true when the character has distinct upper and lower case forms
    return charVal.toUpperCase() != charVal.toLowerCase();
}
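To make the limits of the case-comparison trick concrete, here is a small sketch. The name isCasedLetter is hypothetical and not from the answers above; the function applies the same comparison to a few sample characters, including letters from scripts that have no upper or lower case.

```javascript
// Hypothetical helper: same comparison as above, with strict inequality.
function isCasedLetter(ch) {
    return ch.toUpperCase() !== ch.toLowerCase();
}

console.log(isCasedLetter("a")); // true
console.log(isCasedLetter("Ä")); // true
console.log(isCasedLetter("7")); // false
console.log(isCasedLetter("א")); // false, although it is a Hebrew letter
console.log(isCasedLetter("中")); // false, although it is a Chinese character
```

So the test is only a reasonable shortcut when the input is known to come from a script with case distinctions.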
Solution 4
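You can use .charCodeAt(0) to get the character's code and then do some range checks.
The ranges 65-90 and 97-122 (inclusive) cover A-Z and a-z. The ranges 128-154 and 160-165 hold accented letters only in an extended ASCII chart such as code page 437; charCodeAt returns UTF-16 code units, where the Latin-1 letters sit at 192-214, 216-246 and 248-255 instead, so double-check against a Unicode chart.
Something like this:
if ((x > 64 && x < 91) || (x > 96 && x < 123) || (x > 191 && x < 215) || (x > 215 && x < 247) || (x > 247 && x < 256))
where x is the char code.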
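As a runnable sketch of this kind of range check, the function below wraps the comparison around charCodeAt. The name isLatin1Letter is hypothetical, and the ranges are an assumption: they are the Unicode Latin-1 letter blocks (192-214, 216-246, 248-255) rather than values from a code page chart, and they skip 215 (×) and 247 (÷).

```javascript
// Hypothetical helper: range check on the UTF-16 code unit of the first character.
function isLatin1Letter(ch) {
    const x = ch.charCodeAt(0);
    return (x >= 65 && x <= 90) ||   // A-Z
        (x >= 97 && x <= 122) ||     // a-z
        (x >= 192 && x <= 214) ||    // À-Ö
        (x >= 216 && x <= 246) ||    // Ø-ö
        (x >= 248 && x <= 255);      // ø-ÿ
}

console.log(isLatin1Letter("Ä")); // true
console.log(isLatin1Letter("×")); // false
console.log(isLatin1Letter("1")); // false
```

This covers only Basic Latin and Latin-1; it ignores ª, µ and º as well as every letter beyond code 255.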
Solution 5
This is fortunately now possible without external libraries. Straight from the docs:
let story = "It’s the Cheshire Cat: now I shall have somebody to talk to.";
// Most explicit form
story.match(/\p{General_Category=Letter}/gu);
// It is not mandatory to use the property name for General categories
story.match(/\p{Letter}/gu);
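Applied to the problem of grouping words by first letter, a sketch could look like the following. The names bucketKey and groupByFirstLetter are hypothetical, and the use of normalize("NFD") to strip a diacritic from the first character is an assumption of this sketch, not something the answers above propose.

```javascript
// Hypothetical helper: returns the key for a word, or "0" for non-letters.
function bucketKey(name) {
    // NFD splits "Ä" into "A" plus a combining mark; Array.from iterates by code point.
    const first = Array.from(name.normalize("NFD"))[0];
    if (first === undefined || !/^\p{Letter}$/u.test(first)) {
        return "0";
    }
    return first.toUpperCase();
}

// Hypothetical helper: builds the words object keyed by first letter.
function groupByFirstLetter(names) {
    const words = {};
    for (const name of names) {
        const key = bucketKey(name);
        if (words[key] === undefined) {
            words[key] = [];
        }
        words[key].push(name);
    }
    return words;
}

const words = groupByFirstLetter(["Ärzteversorgung", "apple", "42", "Zebra"]);
console.log(words);
```

Letters without a canonical decomposition, such as Ø, keep their own key under this approach, so whether that is acceptable depends on the keys you want.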
cdarwin
Updated on July 08, 2022
Comments
-
cdarwin over 1 year
Given a set of words, I need to put them in a hash keyed on the first letter of the word. I have words = {}, with keys A..Z and 0 for numbers and symbols. I was doing something like
var firstLetter = name.charAt(0);
firstLetter = firstLetter.toUpperCase();
if (firstLetter < "A" || firstLetter > "Z") {
    firstLetter = "0";
}
if (words[firstLetter] === undefined) {
    words[firstLetter] = [];
}
words[firstLetter].push(name);
but this fails with diaereses and other accented characters, as in the word Ärzteversorgung. That word is put in the "0" array; how could I put it in the "A" array?
-
ajax333221 almost 12 years
I don't know, charCodeAt MDN says "Unicode code points range from 0 to 1,114,111"
-
Jukka K. Korpela almost 12 years
An interesting trick, but as you emphasize, it works just “likely”. But it might be used if you add ad hoc checks for all the characters that may appear and that cause a wrong result in the simple test. In the Latin 1 range, the following characters get misclassified: º and ª (masculine and feminine ordinal indicators), sharp ß (probably the most relevant character here), and debatably the micro sign µ (formally a letter in Unicode, compatibility equivalent to Greek letter mu, but widely understood as a special character rather than a letter).
-
Jukka K. Korpela over 11 years
It only works for characters in bicameral scripts, i.e. writing systems that make uppercase/lowercase distinction; most scripts don’t (e.g., Hebrew, Devanagari, Chinese).
-
Jérôme Verstrynge almost 11 years
@jnrbsn No it won't... ASCII does not cover everything that is considered a letter in every language
-
Jérôme Verstrynge almost 11 years
@ajax333221 There are some valid code points which are not printable characters
-
vsync about 10 years
@JukkaK.Korpela - Yes, there are cons, and the pro is the speed of the check: this can be processed much faster than anything else, and English will likely (should) be the only language most people will need.
-
Jukka K. Korpela about 10 years
@vsync, the question mentions the sample word “Ärzteversorgung”. It isn’t English. Typically if people only think of English, they don’t even ask this question; they just assume that [A-Za-z] covers all letters.
-
vsync about 10 years
Questions on Stack Overflow sometimes mean nothing, because I came here from Google and what the OP asked for is not what the title suggests; it's therefore important to cover the answers for the people who do come here from Google.
-
Onno van der Zee almost 4 years
As of 2020: In a Latin script you can use the uppercase/lowercase comparison to mark a character as a letter. Try this: 'ß'.toUpperCase() // ==> 'SS', 'ſ'.toUpperCase() // ==> 'S'. Even in Greek: 'µ'.toUpperCase() // ==> 'Μ' (for \u00B5 as well as \u03BC). Only for the ordinal indicators is lowercase === uppercase.
-
Javi Marzán over 3 years
This does not work. The expression "Á" == "á" will always return false!