UTF-8 characters with JAXB in Java 8


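The problem is that in your source file the µ is stored as the single byte \265 (octal, 0xB5 in hex), which is its ISO-8859-1 encoding. On its own that byte is not valid UTF-8. In UTF-8, µ (U+00B5) is the two-byte sequence 0xC2 0xB5.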
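To see the two encodings side by side, here is a minimal sketch that prints the bytes µ turns into under each charset. The class name `MicroBytes` and the helper `hex` are made up for illustration. The character is written as a Unicode escape so that the example itself does not depend on the encoding of the file it is saved in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MicroBytes {
    // Hypothetical helper: encodes s with the given charset and
    // returns the resulting bytes as space-separated hex.
    public static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String micro = "\u00B5"; // the micro sign, written as an escape
        System.out.println("ISO-8859-1: " + hex(micro, StandardCharsets.ISO_8859_1)); // B5
        System.out.println("UTF-8:      " + hex(micro, StandardCharsets.UTF_8));      // C2 B5
    }
}
```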

In this source the character encoding of the file is ISO-8859-1.

class Latin1 {
    public static void main(String[] args) {
        String s = "µ"; // \265
        System.out.println(s);
    }
}

Which can be compiled with ...

javac -encoding iso8859-1 Latin1.java

... but it fails with UTF-8 encoding

javac -encoding UTF-8 Latin1.java
Latin1.java:3: error: unmappable character for encoding UTF-8
        String s = "?";
                    ^
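The same failure can be reproduced outside the compiler. The sketch below is an illustration rather than what javac does internally, and the class and method names are invented. It runs the lone byte 0xB5 through a strict UTF-8 decoder, which rejects it, while the two-byte sequence 0xC2 0xB5 decodes cleanly. javac words its message differently ("unmappable character"), but the underlying cause is the same invalid byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Returns true if the bytes decode as UTF-8 without errors.
    // A decoder from newDecoder() reports malformed input by default,
    // so invalid bytes raise a CharacterCodingException.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1Micro = { (byte) 0xB5 };            // µ as stored in Latin1.java
        byte[] utf8Micro = { (byte) 0xC2, (byte) 0xB5 }; // µ as stored in Utf8.java
        System.out.println(isValidUtf8(latin1Micro)); // false
        System.out.println(isValidUtf8(utf8Micro));   // true
    }
}
```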

In this source the character encoding for the file is UTF-8.

class Utf8 {
    public static void main(String[] args) {
        String s = "µ"; // bytes 0xC2 0xB5
        System.out.println(s);
    }
}

This compiles with ISO-8859-1 as well as with UTF-8, because every byte sequence is valid ISO-8859-1.

javac -encoding UTF-8 Utf8.java
javac -encoding iso8859-1 Utf8.java
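Compiling without an error does not mean the string comes out right. As a rough sketch (the class and method names are hypothetical), decoding the UTF-8 bytes of µ as ISO-8859-1 turns each byte into its own character, so the literal would hold two characters, Â followed by µ, instead of one.

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Encodes s as UTF-8, then decodes those bytes as if they were
    // ISO-8859-1, mimicking a UTF-8 file read with the wrong -encoding.
    public static String misread(String s) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String seen = misread("\u00B5"); // the micro sign
        System.out.println(seen.length());               // 2
        System.out.println(seen.equals("\u00C2\u00B5")); // true
    }
}
```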

Edit: In case copying and pasting from the web page alters the encoding, both source files can be created with the code below, which should make the difference visible.

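import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// The wrapper class name is arbitrary. The µ is written as \u00B5 so that
// this generator compiles the same way under any -encoding.
class CreateSources {
    public static void main(String[] args) throws IOException {
        // Written as ISO-8859-1: µ becomes the single byte 0xB5.
        String latin1 = "class Latin1 {\n"
                + " public static void main(String[] args) {\n"
                + "        String s = \"\u00B5\";\n"
                + "        System.out.println(s);\n"
                + " }\n"
                + "}";
        Files.write(Paths.get("Latin1.java"),
                latin1.getBytes(StandardCharsets.ISO_8859_1));

        // Written as UTF-8: µ becomes the two bytes 0xC2 0xB5.
        String utf8 = "class Utf8 {\n"
                + " public static void main(String[] args) {\n"
                + "        String s = \"\u00B5\";\n"
                + "        System.out.println(s);\n"
                + " }\n"
                + "}";
        Files.write(Paths.get("Utf8.java"),
                utf8.getBytes(StandardCharsets.UTF_8));
    }
}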
Author by

kirsty

Updated on June 13, 2022

Comments

  • kirsty
    kirsty almost 2 years

    I recently migrated an application from JBoss AS 5 to WildFly 8, and as such had to move from Java 6 to Java 8.

    I'm now encountering a problem when running one of my unit tests through Ant:

    [javac] C:\Users\test\JAXBClassTest.java:123: error: unmappable character for encoding UTF8
    

    Line 123 of the test class is:

    Assert.assertEquals("Jµhn", JAXBClass.getValue()); 
    

    This test is in place specifically to ensure that the JAXB marshaller can handle UTF-8 characters, which I believe µ is. I have added a property onto the JAXB marshaller to ensure that these characters are allowed:

    marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");
    

    I've seen multiple questions (1, 2, 3) on Stack Overflow which seem to be similar, but their answers either explain why invalid characters which were previously decoded one way are now decoded in another, or don't appear to actually have the same issue as me.

    If all the characters are valid should this cause an issue? I know I must be missing something but I can't see what.