How to convert a string with HTML encoding to Unicode in Java
Solution 1
In Java, you write a Unicode character in a string literal as \u followed by its four-digit hexadecimal code.
For example:
System.out.println("\u0042");
System.out.println("\u00AF\\_(\u30C4)_/\u00AF");
Prints:
B
¯\_(ツ)_/¯
What you want is:
System.out.println("\u00D0\u1ED9t nhi\u00EAn, \u1EDF g\u1ED1c T\u00E2y B\u1EAFc v\u0103ng v\u1EB3ng c\u00F3 ti\u1EBFng v\u00F3 ng\u1EF1a d\u1ED3n d\u1EADp.\n");
Prints:
Ðột nhiên, ở gốc Tây Bắc văng vẳng có tiếng vó ngựa dồn dập.
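To connect the \u escapes above with the decimal entities in the question, here is a minimal sketch that turns text into its escaped form. The class name UnicodeEscapes and the helper toUnicodeEscapes are hypothetical names chosen for illustration; they are not part of the original answer.

```java
class UnicodeEscapes {

    // Hypothetical helper: keeps ASCII characters as they are and renders every
    // other UTF-16 code unit as a backslash-u escape with four hex digits.
    static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escaped form of the first two words of the sample sentence.
        System.out.println(toUnicodeEscapes("\u00D0\u1ED9t nhi\u00EAn"));
        // A decimal entity number such as 7897 is the same code written in base 10.
        System.out.println(String.format("\\u%04X", 7897));
    }
}
```

The second call shows that the decimal number 7897 corresponds to the hexadecimal escape 1ED9 used in the literal above.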
EDIT: Apache Commons Lang is the best way to go:
StringEscapeUtils.unescapeHtml4(string);
Solution 2
Use Apache Commons Lang's StringEscapeUtils.unescapeHtml(string) for this (in Commons Lang 3 the method is named unescapeHtml4).
See: Java: How to unescape HTML character entities in Java?
ThaiPD
Updated on June 04, 2022
Comments
-
ThaiPD almost 2 years
I have a problem with HTML encoding. I have a string with HTML encoding like the one below: Ðột nhiên, ở gốc Tây Bắc văng vẳng có tiếng vó ngựa dồn dập.
I want to convert this String to Unicode. Its output (actual value) should be
Ðột nhiên, ở gốc Tây Bắc văng vẳng có tiếng vó ngựa dồn dập.
I tried the solution from this suggestion, but it only helps when every character in the string is encoded in the form &#. It does not handle characters that begin with &xxxx. According to this page, this is HTML encoding, but my input string combines named HTML entities and decimal HTML entities. Can anyone please give me a suggestion? Ideally the solution would not need any additional Java library.
Thanks in advance!
[UPDATE] I solved my problem by using the Apache Commons Lang library:
String encodeString = "Ðột nhiên, ở gốc Tây Bắc văng vẳng có tiếng vó ngựa dồn dập.";
String unEncodeString = StringEscapeUtils.unescapeHtml4(encodeString);
System.out.println("OUTPUT : " + unEncodeString);
=====>
OUTPUT : Ðột nhiên, ở gốc Tây Bắc văng vẳng có tiếng vó ngựa dồn dập.
-
ThaiPD over 9 years
Thank you for your answer, but what I mean is: how can I convert the string "Ðột" to the string "Đột"? I have an existing input and I want to get the output shown above. Could you please help further?
-
ThaiPD over 9 years
Is there any way to do this without the Apache library? I want to fix it without an add-on library.
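For the library-free route asked about here, one possible approach is to match entities with a regular expression and decode them by hand. The following is a minimal sketch, not a complete implementation: the class name HtmlEntityDecoder, the method decode, and the small NAMED table are all hypothetical, and the entity-encoded sample input is an assumed reconstruction of the question's string, since the page shows it already decoded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlEntityDecoder {

    // Hypothetical, deliberately tiny table of named entities.
    // Only the names needed for the sample are listed; extend it as required.
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", 0x26);
        NAMED.put("lt", 0x3C);
        NAMED.put("gt", 0x3E);
        NAMED.put("quot", 0x22);
        NAMED.put("ETH", 0xD0);
        NAMED.put("acirc", 0xE2);
        NAMED.put("ecirc", 0xEA);
        NAMED.put("oacute", 0xF3);
    }

    // Matches hexadecimal (&#x1ED9;), decimal (&#7897;) and named (&ecirc;) entities.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());   // copy text before the entity
            Integer codePoint = codePointOf(m.group(1));
            if (codePoint == null) {
                out.append(m.group());            // unknown entity: leave untouched
            } else {
                out.appendCodePoint(codePoint);
            }
            last = m.end();
        }
        out.append(input, last, input.length()); // copy the remaining tail
        return out.toString();
    }

    // Returns the code point for an entity body, or null if it cannot be resolved.
    private static Integer codePointOf(String body) {
        try {
            int cp;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                cp = Integer.parseInt(body.substring(2), 16);
            } else if (body.startsWith("#")) {
                cp = Integer.parseInt(body.substring(1), 10);
            } else {
                return NAMED.get(body);
            }
            return Character.isValidCodePoint(cp) ? cp : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }

    public static void main(String[] args) {
        // Assumed entity-encoded form of the start of the question's sentence.
        String encoded = "&ETH;&#7897;t nhi&ecirc;n, &#7903; g&#7889;c T&acirc;y B&#7855;c";
        System.out.println(decode(encoded));
    }
}
```

On a console that can display UTF-8, this should print "Ðột nhiên, ở gốc Tây Bắc". The trade-off is the named-entity table: HTML defines a very large set of names, so a hand-written map only covers the ones you add, whereas numeric entities need no table at all. Entities the sketch does not recognise are left in the output unchanged.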