java string.getBytes("UTF-8") javascript equivalent

41,430

Solution 1

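JavaScript has no concept of character encoding for String; every string is a sequence of UTF-16 code units. For characters in the ASCII range (code points below 128) the UTF-16 code unit value is the same as the single UTF-8 byte, so for ASCII-only text you can forget it's any different.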

There are more optimal ways to do this, but for ASCII-only input the following works:

function s(x) {return x.charCodeAt(0);}
"test.message".split('').map(s);
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
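That shortcut only holds for ASCII. As a minimal sketch of where code units and UTF-8 bytes diverge (the helper name codeUnits and the sample characters are chosen here for illustration, they are not from the question):

```javascript
// Hypothetical helper: returns the UTF-16 code units of a string.
function codeUnits(str) {
  return str.split('').map(function (ch) { return ch.charCodeAt(0); });
}

var ascii = codeUnits("test.message");
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101], same as the UTF-8 bytes

var accented = codeUnits("\u00e9"); // "é"
// [233], but the UTF-8 encoding is two bytes: 195, 169

var chinese = codeUnits("\u4e2d");  // "中"
// [20013], which does not fit in a byte; the UTF-8 encoding is 228, 184, 173
```

So for anything beyond ASCII, charCodeAt alone does not give what Java's getBytes("UTF-8") returns.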

So what is unescape(encodeURIComponent(str)) doing? Let's look at each function individually:

  1. encodeURIComponent converts every character in str that is illegal or has a meaning in URI syntax into a URI-escaped version, so that there is no problem using it as a key or value in the search component of a URI. For example, encodeURIComponent('&='); // "%26%3D". Notice how this is now a 6-character String. Non-ASCII characters are escaped as their UTF-8 bytes, one %XX per byte.
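  2. unescape is actually deprecated, but it does a similar job to decodeURI or decodeURIComponent (the reverse of encodeURIComponent). If we look in the ES5 spec, we can see how it handles the %uXXXX form: 11. Let c be the character whose code unit value is the integer represented by the four hexadecimal digits at positions k+2, k+3, k+4, and k+5 within Result(1).
    The %XX form is handled the same way with two hexadecimal digits, so each escaped byte becomes one character whose code unit value is that byte (0 to 255). Unlike decodeURIComponent, unescape does not reassemble multi-byte UTF-8 sequences. As I mentioned, all Strings are UTF-16, so the result is really a UTF-16 string in which each code unit holds one UTF-8 byte.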
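The two steps above can be traced on a single non-ASCII character. This is only an illustrative sketch; the sample character and variable names are assumptions, not part of the question's code:

```javascript
var str = "\u00e9";                     // "é", one UTF-16 code unit (233)

// Step 1: non-ASCII characters become percent-encoded UTF-8 bytes.
var escaped = encodeURIComponent(str);  // "%C3%A9"

// Step 2: unescape turns each %XX into one character with that code.
var byteString = unescape(escaped);     // two characters, codes 195 and 169

var bytes = [];
for (var i = 0; i < byteString.length; i++) {
  bytes.push(byteString.charCodeAt(i));
}
// bytes is [195, 169], the UTF-8 encoding of "é"
```

If the input string already reaches JavaScript malformed, for example because the page was served in the wrong encoding, this trick faithfully encodes the wrong characters.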

Solution 2

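You can use TextEncoder, which is part of the Encoding Living Standard. When this answer was written, the Encoding API entry on the Chromium Dashboard reported that it had shipped in Firefox and would ship in Chrome 38. As noted in the comments, it is now supported in all current browsers, so the text-encoding polyfill is only needed for older environments.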

The JavaScript code sample below returns a Uint8Array filled with the values you expect.

var s = "test.message";
var encoder = new TextEncoder();
encoder.encode(s);
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
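For non-ASCII input the same call still yields UTF-8 bytes. The sketch below assumes a runtime with a global TextEncoder and uses a sample character chosen for illustration. It also shows two things that matter when comparing against Java: the byte count is the array's length, and Java's byte is signed, so values above 127 print as negative numbers there.

```javascript
var encoder = new TextEncoder();          // TextEncoder always produces UTF-8
var utf8 = encoder.encode("\u4e2d");      // "中"

var unsigned = Array.from(utf8);          // [228, 184, 173]
var byteCount = utf8.length;              // 3, like getBytes("UTF-8").length in Java

// Reinterpret as signed 8-bit values to match how Java prints a byte[].
var signed = Array.from(new Int8Array(utf8)); // [-28, -72, -83]
```

Array.from is used only to get plain arrays that are easy to print or compare; the Uint8Array itself can be passed directly to most binary APIs.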
Author by

Admin

Updated on September 01, 2020

Comments

  • Admin
    Admin over 3 years

    I have this string in java:

    "test.message"
    
    byte[] bytes = plaintext.getBytes("UTF-8");
    //result: [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
    

    If I do the same thing in javascript:

        stringToByteArray: function (str) {         
            str = unescape(encodeURIComponent(str));
    
            var bytes = new Array(str.length);
            for (var i = 0; i < str.length; ++i)
                bytes[i] = str.charCodeAt(i);
    
            return bytes;
        },
    

    I get:

     [7,163,140,72,178,72,244,241,149,43,67,124]
    

    I was under the impression that the unescape(encodeURIComponent()) would correctly translate the string to UTF-8. Is this not the case?

    Reference:

    http://ecmanaut.blogspot.be/2006/07/encoding-decoding-utf8-in-javascript.html

  • Admin
    Admin about 10 years
    I cannot forget it's any different, as I need support for Chinese.
  • Admin
    Admin about 10 years
    By the way, if you read this, they suggest unescape(encodeURIComponent()) to get the UTF-8 value from UTF-16: ecmanaut.blogspot.be/2006/07/…
  • Admin
    Admin about 10 years
    So, is there a solution?
  • Paul S.
    Paul S. about 10 years
    @Wesley I should have actually tested your code; I can't reproduce the "wrong" result you got. I get the same as you expected, and when I try to reverse your weird output I get "£H²Hôñ+C|"
  • Paul S.
    Paul S. about 10 years
    Are you serving the page as UTF-8? I'm starting to think maybe you're serving the page in a different character encoding which doesn't support all your characters, and then want to convert the malformed strings in that into UTF-8. (This will be exceedingly difficult, as the browser does a Stream -> String (in Stream's encoding) -> UTF-16 conversion before JavaScript sees it.)
  • Admin
    Admin about 10 years
    Thanks, that was it. Headers were being overwritten.
  • PixnBits
    PixnBits over 9 years
    Incorrect: the JavaScript spec uses UCS-2, which is similar to UTF-16 but does not behave the same all the time. See mathiasbynens.be/notes/javascript-encoding and mathiasbynens.be/notes/javascript-unicode for excellent discourses on the matter.
  • Neil Gaetano Lindberg
    Neil Gaetano Lindberg almost 3 years
    And, then to get the total bytes, like Java's .getBytes()? Add values in array? i.e. Array.from(new TextEncoder().encode('some delicious cookie')).reduce((acc, current) => acc + current, 0)
  • dcow
    dcow over 2 years
    This answer is from 2014 and should be updated to note that a polyfill is no longer needed and the api is supported on all current browsers: developer.mozilla.org/en-US/docs/Web/API/TextEncoder