java string.getBytes("UTF-8") javascript equivalent
Solution 1
JavaScript has no concept of character encoding for String; everything is in UTF-16. For ASCII characters (code points below 128), the value of a char in UTF-16 matches its single UTF-8 byte, so for such input you can forget it's any different.
There are more optimal ways to do this, but for example:
function s(x) {return x.charCodeAt(0);}
"test.message".split('').map(s);
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
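The match between UTF-16 code units and UTF-8 bytes only holds for ASCII. A minimal sketch of where the two diverge, assuming an environment that provides TextEncoder as a global (current browsers, Node.js); the helper names codeUnits and utf8Bytes are made up for illustration:

```javascript
// Hypothetical helpers, for illustration only.
function codeUnits(str) {
  // UTF-16 code units: one number per JavaScript "char".
  return str.split('').map(function (ch) { return ch.charCodeAt(0); });
}

function utf8Bytes(str) {
  // UTF-8 bytes: what Java's getBytes("UTF-8") would produce.
  return Array.from(new TextEncoder().encode(str));
}

codeUnits("abc");    // [97, 98, 99]
utf8Bytes("abc");    // [97, 98, 99]  (same as the code units)

codeUnits("\u00e9"); // "é" -> [233]
utf8Bytes("\u00e9"); // "é" -> [195, 169]

codeUnits("\u4e2d"); // "中" -> [20013]
utf8Bytes("\u4e2d"); // "中" -> [228, 184, 173]
```

As soon as the input leaves ASCII, charCodeAt returns values that are not bytes at all, which is why the plain charCodeAt approach is not enough for text such as Chinese.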
So what is unescape(encodeURIComponent(str)) doing? Let's look at each individually,
- encodeURIComponent is converting every character in str which is illegal or has a meaning in URI syntax into a URI escaped version, so that there is no problem using it as a key or value in the search component of a URI, for example
encodeURIComponent('&='); // "%26%3D"
Notice how this is now a 6 character long String. Characters outside ASCII are escaped as the bytes of their UTF-8 encoding, one %XX per byte.
- unescape is actually deprecated, but it does a similar job to decodeURI or decodeURIComponent (the reverse of encodeURIComponent). If we look in the ES5 spec we can see
11. Let c be the character whose code unit value is the integer represented by the four hexadecimal digits at positions k+2, k+3, k+4, and k+5 within Result(1).
So, 4 digits is 2 bytes, which is one UTF-16 code unit rather than "UTF-8"; that step covers the %uXXXX form. For the %XX form that encodeURIComponent produces, unescape makes one character per escaped byte. As I mentioned, all Strings are UTF-16, so the result is really a UTF-16 string whose code units are limited to UTF-8 byte values.
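To make the two steps concrete, here is a hedged sketch that traces a single non-ASCII character through the pipeline. It assumes the legacy global unescape is available (deprecated, but still present in browsers and Node.js); percentEncodedToBytes is a hypothetical helper, not a standard function, that gets the same bytes without unescape.

```javascript
var str = "\u00e9"; // "é", a single UTF-16 code unit (233)

// Step 1: encodeURIComponent writes each UTF-8 byte as %XX.
var escaped = encodeURIComponent(str); // "%C3%A9"

// Step 2: unescape turns every %XX into one char with that value.
var binary = unescape(escaped); // "\u00c3\u00a9", displayed as "Ã©"

// Step 3: read the byte values back out of the code units.
var bytes = [];
for (var i = 0; i < binary.length; ++i) {
  bytes.push(binary.charCodeAt(i));
}
// bytes is now [195, 169]

// Hypothetical alternative that avoids the deprecated unescape.
function percentEncodedToBytes(escapedStr) {
  var out = [];
  for (var j = 0; j < escapedStr.length; ++j) {
    if (escapedStr.charAt(j) === '%') {
      // Two hex digits after '%' are one byte.
      out.push(parseInt(escapedStr.slice(j + 1, j + 3), 16));
      j += 2;
    } else {
      // Characters left unescaped by encodeURIComponent are ASCII.
      out.push(escapedStr.charCodeAt(j));
    }
  }
  return out;
}

percentEncodedToBytes(encodeURIComponent("test.message"));
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
```

Note that encodeURIComponent throws a URIError if the string contains a lone surrogate, so either variant only works on well-formed input.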
Solution 2
You can use TextEncoder, which is part of the Encoding Living Standard. According to the Encoding API entry from the Chromium Dashboard, it shipped in Firefox and will ship in Chrome 38. There is also a text-encoding polyfill available.
The JavaScript code sample below returns a Uint8Array filled with the values you expect.
var s = "test.message";
var encoder = new TextEncoder();
encoder.encode(s);
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
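The same call handles non-ASCII input. A small sketch, assuming TextEncoder and TextDecoder are available as globals; the sample string and variable names are only for illustration:

```javascript
var encoder = new TextEncoder(); // TextEncoder always produces UTF-8
var bytes = encoder.encode("\u4e2d\u6587"); // "中文"
// Uint8Array [228, 184, 173, 230, 150, 135]

// Number of bytes, comparable to getBytes("UTF-8").length in Java.
var byteCount = bytes.length; // 6

// Convert to a plain Array if a Uint8Array is inconvenient.
var plain = Array.from(bytes);

// Round trip back to a String to check nothing was lost.
var decoded = new TextDecoder("utf-8").decode(bytes); // "中文"
```

Each of the two characters takes three bytes in UTF-8, so the byte count differs from the string's length of 2.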
Admin
Updated on September 01, 2020
Comments
-
Admin over 3 years
I have this string in Java:
"test.message"
byte[] bytes = plaintext.getBytes("UTF-8");
//result: [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
If I do the same thing in JavaScript:
stringToByteArray: function (str) {
    str = unescape(encodeURIComponent(str));
    var bytes = new Array(str.length);
    for (var i = 0; i < str.length; ++i)
        bytes[i] = str.charCodeAt(i);
    return bytes;
},
I get:
[7,163,140,72,178,72,244,241,149,43,67,124]
I was under the impression that the unescape(encodeURIComponent()) would correctly translate the string to UTF-8. Is this not the case?
Reference:
http://ecmanaut.blogspot.be/2006/07/encoding-decoding-utf8-in-javascript.html
-
Admin about 10 years
I cannot forget it's any different, as I need support for Chinese.
-
Admin about 10 years
Btw, if you read this, they suggest unescape(encodeURIComponent()) to get the UTF-8 value from UTF-16: ecmanaut.blogspot.be/2006/07/…
-
Admin about 10 years
So, is there a solution?
-
Paul S. about 10 years
@Wesley I should have actually tested your code; I can't reproduce the "wrong" result you got. I get the same as you expected, and when I try to reverse your weird output I get
"£H²Hôñ+C|"
-
Paul S. about 10 years
Are you serving the page as UTF-8? I'm starting to think maybe you're serving the page in a different character encoding which doesn't support all your characters, and then want to convert the malformed strings in that into UTF-8. (This will be exceedingly difficult, as the browser does a Stream -> String (in Stream's encoding) -> UTF-16 conversion before JavaScript sees it.)
-
Admin about 10 years
Thanks, that was it. Headers were being overwritten.
-
PixnBits over 9 years
Incorrect, the JavaScript spec uses UCS-2, which is similar to UTF-16 but does not behave the same all the time. See mathiasbynens.be/notes/javascript-encoding and mathiasbynens.be/notes/javascript-unicode for excellent discourses on the matter.
-
Neil Gaetano Lindberg almost 3 years
And then, to get the total bytes, like Java's .getBytes()? Add values in array? i.e.
Array.from(new TextEncoder().encode('some delicious cookie')).reduce((acc, current) => acc + current, 0)
-
dcow over 2 years
This answer is from 2014 and should be updated to note that a polyfill is no longer needed and the API is supported on all current browsers: developer.mozilla.org/en-US/docs/Web/API/TextEncoder