Detecting if a character is a letter


Solution 1

You could use a regular expression. Unfortunately, JavaScript's \w does not treat international characters as "word characters", but you can list them explicitly, as in the regular expression below:

var firstLetter = name.charAt(0);
firstLetter = firstLetter.toUpperCase();
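// The listed characters must sit inside a character class [...] to act as alternatives.
// Note: \w also matches digits and "_"; use A-Z in its place to send those to "0".
if (!firstLetter.match(/^[\wÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöüçÇßØøÅåÆæÞþÐð]$/)) {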
    firstLetter = "0";
}
if (words[firstLetter] === undefined) {
    words[firstLetter] = [];
} 
words[firstLetter].push(name);

Solution 2

You can use this to test if a character is likely to be a letter:

var firstLetter = name.charAt(0).toUpperCase();
if( firstLetter.toLowerCase() != firstLetter) {
    // it's a letter
}
else {
    // it's a symbol
}

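This works because JavaScript's case conversion follows the Unicode case mappings, so a character that toLowerCase() leaves unchanged after it has been uppercased has no case mapping and is treated as a symbol. Letters from scripts without case (e.g., Hebrew or Chinese) are also reported as symbols, which is why the test is only "likely" to be right.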

Solution 3

Try converting the character to uppercase and to lowercase and check whether the results differ. Only cased letters change under these conversions (numbers, punctuation marks, etc. don't). Below is a sample function based on this idea:

function isALetter(charVal)
{
    // Cased letters differ between their uppercase and lowercase forms.
    return charVal.toUpperCase() !== charVal.toLowerCase();
}
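
To see where the case-comparison test succeeds and where it falls short, here is a minimal sketch. The helper name changesWithCase and the sample characters are illustrative choices, not part of the original answer; the function applies the same comparison as the answer above.

```javascript
// Hypothetical helper: same upper/lower comparison as in the answer above.
function changesWithCase(ch) {
    return ch.toUpperCase() !== ch.toLowerCase();
}

// Sample characters chosen for illustration.
const samples = ["a", "Ä", "ß", "7", "#", "א", "中"];
const results = samples.map(ch => [ch, changesWithCase(ch)]);

// "a", "Ä" and "ß" are reported as letters ("ß" uppercases to "SS").
// "7" and "#" are reported as non-letters, which is the intended result.
// "א" (Hebrew) and "中" (Chinese) are also reported as non-letters,
// because those scripts have no uppercase/lowercase distinction.
console.log(results);
```

So the test is adequate for Latin, Greek, or Cyrillic input, but it cannot be used as a general "is this a letter" check.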

Solution 4

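You can use .charCodeAt(0) to get the character's code (its UTF-16 code unit, which matches ASCII for values below 128) and then do some range checks.

The ranges you are looking for are probably 65-90 and 97-122 for A-Z and a-z, plus 192-214, 216-246 and 248-255 (inclusive) for the accented Latin-1 letters, but double-check this against a Unicode chart. Note that "extended ASCII" code-page tables do not apply here, since JavaScript strings are Unicode.

Something like this:

if ((x > 64 && x < 91) || (x > 96 && x < 123) || (x > 191 && x < 215) || (x > 215 && x < 247) || (x > 247 && x < 256))

Where x is the char code.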
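
A range check of this kind can be wrapped in a small function. The sketch below is an illustration, not the answerer's code: the name isLatin1Letter is hypothetical, and it assumes the Unicode values that charCodeAt actually returns, where the accented Latin-1 letters occupy 192-214, 216-246 and 248-255 (215 is the multiplication sign and 247 the division sign).

```javascript
// Hypothetical helper: range check on the UTF-16 code unit of the first character.
// Assumes only Basic Latin and Latin-1 Supplement letters need to be recognised.
function isLatin1Letter(ch) {
    const x = ch.charCodeAt(0); // NaN for an empty string, so every comparison is false
    return (x >= 65 && x <= 90) ||    // A-Z
           (x >= 97 && x <= 122) ||   // a-z
           (x >= 192 && x <= 214) ||  // À-Ö
           (x >= 216 && x <= 246) ||  // Ø-ö
           (x >= 248 && x <= 255);    // ø-ÿ
}

console.log(isLatin1Letter("Ä")); // true  (code 196)
console.log(isLatin1Letter("×")); // false (code 215)
console.log(isLatin1Letter("7")); // false (code 55)
```

Anything beyond Latin-1 (for example Polish, Greek, or Cyrillic letters) would need further ranges, which is the main drawback of this approach.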

Solution 5

This is fortunately now possible without external libraries, using Unicode property escapes in regular expressions (these require the u flag). Straight from the docs:

let story = "It’s the Cheshire Cat: now I shall have somebody to talk to.";

// Most explicit form
story.match(/\p{General_Category=Letter}/gu);

// It is not mandatory to use the property name for General categories
story.match(/\p{Letter}/gu);
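
Applied to the original problem of grouping words by first letter, the property escape can be combined with Unicode normalization so that "Ärzteversorgung" lands under "A". The following is a sketch under stated assumptions: the function names bucketKey and groupByFirstLetter are hypothetical, and it assumes that letters outside A-Z (such as "Ø" or "Ω") should get their own key rather than go to "0".

```javascript
// Hypothetical helper: returns the bucket key for one word.
function bucketKey(word) {
    // NFD splits "Ä" into "A" plus a combining diaeresis (U+0308);
    // Array.from iterates by code point, so [0] is the base character.
    const first = Array.from(word.normalize("NFD"))[0] ?? "";
    if (!/^\p{Letter}$/u.test(first)) {
        return "0"; // digits, symbols and empty strings
    }
    // toUpperCase can expand one character into several ("ß" becomes "SS"),
    // so keep only the first code point of the result.
    return Array.from(first.toUpperCase())[0];
}

// Hypothetical helper: builds the hash described in the question.
function groupByFirstLetter(names) {
    const words = {};
    for (const name of names) {
        const key = bucketKey(name);
        if (words[key] === undefined) {
            words[key] = [];
        }
        words[key].push(name);
    }
    return words;
}

const grouped = groupByFirstLetter(["Ärzteversorgung", "apple", "42nd", "#tag", "Émile"]);
console.log(grouped.A);    // [ 'Ärzteversorgung', 'apple' ]
console.log(grouped.E);    // [ 'Émile' ]
console.log(grouped["0"]); // [ '42nd', '#tag' ]
```

Letters that have no canonical decomposition, such as "Ø" or "Æ", keep their own key with this approach; mapping them to "O" or "A" would need an explicit lookup table.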
Author by

cdarwin

Updated on July 08, 2022

Comments

  • cdarwin
    cdarwin over 1 year

    Given a set of words, I need to put them in a hash keyed on the first letter of each word. I have words = {}, with keys A..Z and 0 for numbers and symbols. I was doing something like:

    var firstLetter = name.charAt(0);
    firstLetter = firstLetter.toUpperCase();
    
    if (firstLetter < "A" || firstLetter > "Z") {
        firstLetter = "0";
    }
    if (words[firstLetter] === undefined) {
        words[firstLetter] = [];
    } 
    words[firstLetter].push(name);
    

    but this fails with the diaeresis and other diacritics, as in the word Ärzteversorgung. That word is put in the "0" array; how could I put it in the "A" array?

  • ajax333221
    ajax333221 almost 12 years
    I don't know, charCodeAt MDN says "Unicode code points range from 0 to 1,114,111"
  • Jukka K. Korpela
    Jukka K. Korpela almost 12 years
    An interesting trick, but as you emphasize, it works just “likely”. But it might be used if you add ad hoc checks for all the characters that may appear and that cause a wrong result in the simple test. In the Latin 1 range, the following characters get misclassified: º and ª (masculine and feminine ordinal indicators), sharp ß (probably the most relevant character here), and debatably the micro sign µ (formally a letter in Unicode, compatibility equivalent to Greek letter mu, but widely understood as a special character rather than a letter).
  • Jukka K. Korpela
    Jukka K. Korpela over 11 years
    It only works for characters in bicameral scripts, i.e. writing systems that make uppercase/lowercase distinction; most scripts don’t (e.g., Hebrew, Devanagari, Chinese).
  • Jérôme Verstrynge
    Jérôme Verstrynge almost 11 years
    @jnrbsn No, it won't... ASCII does not cover everything that is considered a letter in every language.
  • Jérôme Verstrynge
    Jérôme Verstrynge almost 11 years
    @ajax333221 There are some valid code points which are not printable characters
  • vsync
    vsync about 10 years
    @JukkaK.Korpela - Yes, there are cons, and the pro is the speed of the check. This can be processed much faster than anything else, and English will likely (should) be the only language most people will need.
  • Jukka K. Korpela
    Jukka K. Korpela about 10 years
    @vsync, the question mentions the sample word “Ärzteversorgung”. It isn’t English. Typically if people only think of English, they don’t even ask this question—they just assume that [A-Za-z] covers all letters.
  • vsync
    vsync about 10 years
    Questions on Stack Overflow sometimes mean nothing, because I came here from Google and what the OP asked for is not what the title suggests; therefore it's important to cover the answers for the people who do come here from Google.
  • Onno van der Zee
    Onno van der Zee almost 4 years
    As of 2020: In a Latin script you can use the uppercase/lowercase comparison to mark a character as a letter. Try this: 'ß'.toUpperCase() // ==> 'SS', 'ſ'.toUpperCase() // ==> 'S'. Even in Greek: 'µ'.toUpperCase() // ==> 'Μ' (for \u00B5 as well as \u03BC). Only for the ordinal indicators does lowercase === uppercase.
  • Javi Marzán
    Javi Marzán over 3 years
    This does not work. The expression "Á" == "á" will always return false!