How do I get the unicode/hex representation of a symbol out of the HTML using JavaScript/jQuery?


Solution 1

Using mostly plain JavaScript, you should be able to do:

function entityForSymbolInContainer(selector) {
    var code = $(selector).text().charCodeAt(0);
    var codeHex = code.toString(16).toUpperCase();
    while (codeHex.length < 4) {
        codeHex = "0" + codeHex;
    }

    return "&#x" + codeHex + ";";
}

Here's an example: http://jsfiddle.net/btWur/
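
The same padding logic can be sketched without jQuery, assuming the symbol is already available as a string (for example from an element's textContent). The name entityForChar is hypothetical, and like the function above it only looks at the first UTF-16 code unit:

```javascript
// Hypothetical helper: build a hex entity from the first UTF-16 code unit of a string.
function entityForChar(str) {
    var hex = str.charCodeAt(0).toString(16).toUpperCase();
    // padStart (ES2017) replaces the manual zero-padding loop.
    return "&#x" + hex.padStart(4, "0") + ";";
}

console.log(entityForChar("α")); // "&#x03B1;"
```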

Solution 2

charCodeAt returns the UTF-16 code unit at the given index as a number:

"α".charCodeAt(0); //returns 945
0x03b1 === 945; //returns true

toString(16) then converts that number to a hex string:

(945).toString(16); // returns "3b1"

(Confirmed to work in IE9 and Chrome)
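
As a quick sanity check, the conversion can be reversed with parseInt and String.fromCharCode. This is only a sketch of the round trip for a single BMP character; the variable names are illustrative:

```javascript
var code = "α".charCodeAt(0);        // 945
var hex = code.toString(16);         // "3b1"
var back = parseInt(hex, 16);        // 945 again
var ch = String.fromCharCode(back);  // "α"

console.log(hex, back, ch);
```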

Solution 3

If you try to convert a Unicode character outside the BMP (Basic Multilingual Plane) with the approaches above, you are in for a nasty surprise. Characters outside the BMP are encoded as two UTF-16 code units (a surrogate pair), for example:

"🔒".length === 2 (one part for the shackle, one part for the lock base :) )

So "🔒".charCodeAt(0) gives you 55357 (0xD83D), which is only the first half of the pair, while "🔒".charCodeAt(1) gives you 56594 (0xDD12), which is the other half.

To get the full code point for such characters, you might want to use the following String extension function:

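String.prototype.charCodeUTF32 = function () {
    var hi = this.charCodeAt(0);
    // Not the start of a surrogate pair (e.g. a BMP character): return the unit as-is.
    if (hi < 0xD800 || hi > 0xDBFF || this.length < 2) return hi;
    return (hi - 0xD800) * 0x400 + (this.charCodeAt(1) - 0xDC00) + 0x10000;
};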

You can also use it like this:

"&#x"+("🔒".charCodeUTF32()).toString(16)+";"

to get HTML hex codes.
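
In environments that support ES2015, the built-in String.prototype.codePointAt performs this surrogate-pair arithmetic itself, and for...of iterates a string by code point. A minimal sketch under that assumption follows; toHexEntities is a hypothetical helper name, not part of any library:

```javascript
// Hypothetical helper: one hex entity per code point in the string.
function toHexEntities(str) {
    var out = "";
    // for...of iterates by code point, so a surrogate pair arrives as one item.
    for (var ch of str) {
        out += "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";";
    }
    return out;
}

console.log("🔒".codePointAt(0));   // 128274 (0x1F512)
console.log(toHexEntities("α🔒")); // "&#x3B1;&#x1F512;"
```

Unlike the prototype extension above, this does not modify String.prototype and handles BMP and non-BMP characters through the same path.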

Hope this saves you some time.


Author: Hristo

Updated on July 09, 2022

Comments

  • Hristo
    Hristo almost 2 years

    Say I have an element like this...

    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mo class="symbol">α</mo>
    </math>
    

    Is there a way to get the unicode/hex value of alpha α, &#x03B1, using JavaScript/jQuery? Something like...

    $('.symbol').text().unicode(); // I know unicode() doesn't exist
    $('.symbol').text().hex(); // I know hex() doesn't exist
    

    I need &#x03B1 instead of α and it seems like anytime I insert &#x03B1 into the DOM and try to retrieve it right away, it gets rendered and I can't get &#x03B1 back; I just get α.

  • Hristo
    Hristo almost 13 years
    @aroth... this looks awesome! i'm testing now
  • L0j1k
    L0j1k almost 8 years
    +1 Thanks for saving us from this landmine! Checking the length of the character was the key for me.
  • kontur
    kontur about 3 years
    Good insight, and note that not just emojis are beyond the BMP :) Your prototype enhancement should probably check the length first; for "UTF-8" strings the this.charCodeAt(1) will return NaN, and so will the entire function as a consequence; for "length === 2" chars it should just return charCodeAt(0) as such.