Explanation of JAXB error: Invalid byte 1 of 1-byte UTF-8 sequence

Solution 1

So, your problem is that JAXB (through its underlying XML parser) treats XML files without an <?xml ...?> header as UTF-8, while your file uses some other encoding (probably ISO-8859-1 or Windows-1252, if the 0xBF byte is actually intended to mean ¿).
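To see why the parser complains, here is a minimal sketch that decodes the byte 0xBF strictly, first as UTF-8 and then as ISO-8859-1. The class and method names (StrictDecodeDemo, decodeStrict) are made up for illustration; only standard java.nio.charset calls are used. In UTF-8, 0xBF is a continuation byte and cannot start a character, which is what "Invalid byte 1 of 1-byte UTF-8 sequence" is reporting.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of JAXB.
final class StrictDecodeDemo {

    /** Decodes strictly; returns null if the bytes are not valid in the given charset. */
    static String decodeStrict(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] raw = {'M', 't', '.', ' ', (byte) 0xBF};
        // As UTF-8 the trailing 0xBF is malformed, so strict decoding fails.
        System.out.println("valid UTF-8: " + (decodeStrict(raw, StandardCharsets.UTF_8) != null));
        // As ISO-8859-1 every byte maps to a character; 0xBF becomes U+00BF.
        String latin1 = decodeStrict(raw, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 gives U+00BF: " + latin1.endsWith("\u00BF"));
    }
}
```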

If you can change the producer of the file, you can add an <?xml ...?> header that declares the actual encoding, or simply write the file as UTF-8.
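On the producer side, the point is that the declared encoding and the bytes actually written must agree. A rough sketch, assuming the producer writes the file by hand with java.io (the class XmlProducerSketch and its methods are hypothetical; a real producer may use a marshaller or an XML writer instead). The readBack helper uses the JDK's built-in DOM parser only to check the result:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical producer, for illustration only.
final class XmlProducerSketch {

    /** Writes a one-element document whose header names the charset used for the bytes. */
    static byte[] writeDescription(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            // The declaration and the writer use the same charset.
            w.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n");
            // Assumes text does not contain "]]>", which would end the CDATA section early.
            w.write("<Description><![CDATA[" + text + "]]></Description>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Parses the bytes with the JDK's DOM parser and returns the root element's text. */
    static String readBack(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Mt. Belvieu \u00BF Texas";
        System.out.println(text.equals(readBack(writeDescription(text, StandardCharsets.UTF_8))));
        System.out.println(text.equals(readBack(writeDescription(text, StandardCharsets.ISO_8859_1))));
    }
}
```

With the header in place, the parser reads the file correctly whichever of the two encodings the producer picks.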

If you can't change the producer, you have to use an InputStreamReader with an explicit encoding, because (unfortunately) JAXB doesn't let you change its default encoding:

results = (Results) unmarshaller.unmarshal(
   new InputStreamReader(new FileInputStream(inputFile), "ISO-8859-1")); 

However, this solution is fragile: it fails on input files whose <?xml ...?> header declares a different encoding.
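One way to soften that fragility is to force the fallback encoding only when the file has no header at all. The following is a sketch under stated assumptions, not a complete encoding detector: it only recognises an ASCII-compatible "<?xml" prefix, so UTF-16 files or files starting with a BOM are treated as headerless. DeclarationAwareSource, open and rootText are hypothetical names, and the demo parses with the JDK's DOM parser so that it runs without JAXB on the classpath. The resulting InputSource could be passed to Unmarshaller.unmarshal(InputSource) instead. Requires Java 11+ for readNBytes(int).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class DeclarationAwareSource {

    /** Uses the raw bytes if an XML declaration is present, otherwise decodes with the fallback. */
    static InputSource open(InputStream raw, Charset fallback) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(5);
        byte[] head = in.readNBytes(5);   // peek at the first five bytes
        in.reset();                       // rewind so the parser sees the whole file
        boolean declared = "<?xml".equals(new String(head, StandardCharsets.US_ASCII));
        return declared
                ? new InputSource(in)                                   // parser honours the declaration
                : new InputSource(new InputStreamReader(in, fallback)); // we pick the encoding
    }

    /** Demo helper: parses the bytes and returns the root element's text. */
    static String rootText(byte[] xml, Charset fallback) {
        try {
            InputSource source = open(new ByteArrayInputStream(xml), fallback);
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] headerless = "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>"
                .getBytes(StandardCharsets.ISO_8859_1);   // contains the raw byte 0xBF
        System.out.println("Mt. Belvieu \u00BF Texas"
                .equals(rootText(headerless, StandardCharsets.ISO_8859_1)));
    }
}
```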

Solution 2

That's probably a Byte Order Mark (BOM), a special byte sequence at the start of a Unicode-encoded file. They are, frankly, a pain in the arse, and seem particularly common when interacting with .NET systems.

Try changing your code to use a Reader rather than an InputStream:

results = (Results) unmarshaller.unmarshal(new FileReader(inputFile));

A Reader does the byte-to-character decoding itself (FileReader uses the platform default charset), and might make a better stab at it. More simply, pass the File directly to the Unmarshaller, and let JAXB worry about it:

results = (Results) unmarshaller.unmarshal(inputFile);
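If a BOM really is the culprit, it can also be stripped before the parser sees the stream. A minimal sketch, assuming a UTF-8 BOM (the bytes EF BB BF; note that the last one is 0xBF); BomSkipper and its methods are hypothetical names, and only java.io classes are used. Requires Java 9+ for readNBytes and readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
final class BomSkipper {

    /** Returns a stream positioned after a leading UTF-8 BOM, if there is one. */
    static InputStream skipUtf8Bom(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = in.readNBytes(head, 0, 3);   // may be fewer than 3 for very short input
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            in.unread(head, 0, n);           // not a BOM: put the bytes back
        }
        return in;
    }

    /** Demo helper: returns the data with any leading UTF-8 BOM removed. */
    static byte[] withoutBom(byte[] data) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        System.out.println(new String(withoutBom(withBom), java.nio.charset.StandardCharsets.US_ASCII));
    }
}
```

The stream returned by skipUtf8Bom could then be passed to unmarshaller.unmarshal(...) in place of the plain FileInputStream. In the question, though, the 0xBF byte sits in the middle of the file, which a BOM would not explain.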
Author by

Marcus Leon

Director Clearing Technology, Intercontinental Exchange. Develop the clearing systems that power ICE/NYSE's derivatives markets.

Updated on June 23, 2022

Comments

  • Marcus Leon over 1 year

    We're parsing an XML document using JAXB and get this error:

    [org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.]
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)
    

    What exactly does this mean and how can we resolve this??

    We are executing the code as:

    jaxbContext = JAXBContext.newInstance(Results.class);
    Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
    unmarshaller.setSchema(getSchema());
    results = (Results) unmarshaller.unmarshal(new FileInputStream(inputFile));
    

    Update

    Issue appears to be due to this "funny" character in the XML file: ¿

    Why would this cause such a problem??

    Update 2

    There are two of those weird characters in the file. They are around the middle of the file. Note that the file is created based on data in a database and those weird characters somehow got into the database.

    Update 3

    Here is the full XML snippet:

    <Description><![CDATA[Mt. Belvieu ¿ Texas]]></Description>
    

    Update 4

    Note that there is no <?xml ...?> header.

    The hex value of the special character is BF.