How to convert UTF8 string to UTF16

The bytes from the server are not UTF-8 if they look like S\0a\0m\0p\0l\0e; they are UTF-16. You can convert UTF-16 bytes to a Java String with:

byte[] bytes = ...
String string = new String(bytes, "UTF-16");

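Or you can use UTF-16LE or UTF-16BE as the character set name if you know the endianness of the byte stream coming from the server. With plain UTF-16, Java looks for a byte-order mark and assumes big-endian when there is none, so bytes like S\0a\0m\0p\0l\0e, where the low byte comes first, need UTF-16LE.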
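To make the endianness point concrete, here is a minimal sketch that decodes the sample bytes from the question. The class name `Utf16DecodeDemo` and the helper `decodeLittleEndian` are made up for illustration, and the byte array is an assumption about what the server sends (no byte-order mark, low byte first).

```java
import java.nio.charset.StandardCharsets;

public class Utf16DecodeDemo {
    // Hypothetical helper: decodes bytes assumed to be little-endian UTF-16 without a BOM.
    static String decodeLittleEndian(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Assumed wire format: 'S' 0x00 'a' 0x00 ... as in the question's sample.
        byte[] bytes = {'S', 0, 'a', 0, 'm', 0, 'p', 0, 'l', 0, 'e', 0};

        // Explicit little-endian decoding recovers the text.
        System.out.println(decodeLittleEndian(bytes));

        // Plain UTF-16 finds no BOM and falls back to big-endian,
        // so each byte pair is read the wrong way round.
        String bigEndianGuess = new String(bytes, StandardCharsets.UTF_16);
        System.out.println(bigEndianGuess.equals("Sample"));
    }
}
```

Using the `StandardCharsets` constants instead of charset names as strings also avoids having to handle `UnsupportedEncodingException`.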

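If you've already (mistakenly) constructed a String from the bytes as if they were UTF-8, you can convert it to UTF-16 with:

// The sample S\0a\0... is little-endian; plain "UTF-16" would assume big-endian without a BOM
string = new String(string.getBytes("UTF-8"), "UTF-16LE");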
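As a rough end-to-end illustration of that repair, the sketch below simulates the mistake and then undoes it. `MisdecodedRepairDemo` and `repair` are hypothetical names, and the example assumes little-endian UTF-16 input that happens to contain only ASCII characters.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodedRepairDemo {
    // Hypothetical helper: undoes a mistaken UTF-8 decode of little-endian UTF-16 bytes.
    static String repair(String misdecoded) {
        // Re-encoding as UTF-8 gives back the original bytes, provided they were valid UTF-8.
        byte[] original = misdecoded.getBytes(StandardCharsets.UTF_8);
        return new String(original, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Simulate the server sending "Sample" as UTF-16LE (Java adds no BOM for UTF_16LE).
        byte[] wire = "Sample".getBytes(StandardCharsets.UTF_16LE);

        // The mistake: decoding those bytes as UTF-8 yields a NUL after every letter.
        String misdecoded = new String(wire, StandardCharsets.UTF_8);
        System.out.println(misdecoded.length());

        // The repair.
        System.out.println(repair(misdecoded));
    }
}
```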

However, as JB Nizet points out, this round trip (bytes -> UTF-8 string -> bytes) is potentially lossy if the bytes weren't valid UTF-8 to start with.
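A small sketch of where the round trip breaks down, assuming little-endian UTF-16 input. The class and method names are invented for the example. ASCII-only text survives because every byte, including 0x00, is valid UTF-8 on its own; a character such as é (bytes E9 00 in UTF-16LE) does not, because E9 is an incomplete UTF-8 sequence and gets replaced during decoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTripDemo {
    // Hypothetical helper: true if the bytes come back unchanged after being
    // decoded as UTF-8 and re-encoded as UTF-8.
    static boolean survivesUtf8RoundTrip(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] ascii = "Sample".getBytes(StandardCharsets.UTF_16LE);
        byte[] accented = "\u00e9".getBytes(StandardCharsets.UTF_16LE); // E9 00

        System.out.println(survivesUtf8RoundTrip(ascii));    // true
        System.out.println(survivesUtf8RoundTrip(accented)); // false
    }
}
```

When the check fails, the original bytes are gone and no later conversion of the String can recover them, which is why decoding with the right charset in the first place is the safer fix.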

Author: dinesh707

Updated on July 09, 2022

Comments

  • dinesh707
    dinesh707 almost 2 years

    I'm getting a UTF-8 string by processing a request sent by a client application, but the string is really UTF-16. What ends up in my local string is each letter followed by a \0 character. I need to convert that String into UTF-16.

    Sample received string: S\0a\0m\0p\0l\0e (UTF8).
    What I want is : Sample (UTF16)

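    FileItem item = (FileItem) iter.next();
    String field = "";
    String value = "";
    if (item.isFormField()) {
      try {
        // no charset argument is passed, so the item's default encoding is used
        value = item.getString();
        System.out.println("====" + value);
      } catch (Exception e) {
        e.printStackTrace();
      }
    }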
    
  • JB Nizet
    JB Nizet over 11 years
    I would say that if he has already constructed a String from the bytes as if they were UTF-8, then there is a bug, and this shouldn't have been done. Not every sequence of bytes is valid UTF-8, and trying to transform random bytes (or UTF-16 bytes) into a UTF-8 String is a potentially lossy process.
  • skomisa
    skomisa over 4 years
    The question was about how to do the conversion in Java.