Java XML Unmarshalling fails on ampersand (&) using JAXB


Solution 1

It turns out the problem is caused by the framework I'm using (the Mentawai framework). The XML in question comes from the POST body of an HTTP request.

Apparently, the framework decodes the character entities in the XML body before my code sees it, so &amp; becomes a bare & and the unmarshaller fails to parse the XML.
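To make the failure concrete, here is a minimal sketch that checks well-formedness with the JDK's built-in JAXP parser instead of JAXB, so it runs without extra dependencies. The class and method names (AmpersandParseDemo, isWellFormed) are made up for illustration, and the replace() call is only a stand-in for whatever decoding the framework performs.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class AmpersandParseDemo {

    // Returns true if the parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // SAXParseException (a subclass) is thrown for malformed input.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String asSent = "<details><address1>Test&amp;Address</address1></details>";
        // Assumption: stand-in for the framework decoding entities in the POST body.
        String asReceived = asSent.replace("&amp;", "&");

        System.out.println(isWellFormed(asSent));     // true
        System.out.println(isWellFormed(asReceived)); // false
    }
}
```

If the second check fails in your own setup, log the raw request body before it reaches the unmarshaller; that shows whether something upstream has already decoded the entities.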

Solution 2

By the time Xerces scans the element content, &amp; has already become &, so it tries to resolve &Address as an entity reference, which fails because it does not end with ;. Putting a space between & and Address will not work either: Xerces will then try to resolve the bare & and throw the second error given in the OP. You can wrap the text in a CDATA section, and Xerces will not try to resolve the entities.
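A rough sketch of the CDATA suggestion, again using the JDK's JAXP parser instead of JAXB so it is self-contained. CdataDemo and textOf are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CdataDemo {

    // Parses the XML and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA the parser treats '&' as literal text, not as the start of an entity reference.
        String xml = "<details><address1><![CDATA[Test&Address]]></address1></details>";
        System.out.println(textOf(xml)); // Test&Address
    }
}
```

Keep in mind that a CDATA section cannot itself contain the sequence ]]>, so this only helps when you control how the XML is produced.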

Solution 3

I've run into this too. On the first pass I simply replaced each &amp; with a token string (AMPERSAND_TOKEN), sent the XML through JAXB, then swapped the token back to an ampersand. Not ideal, but it was a quick fix.
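The token workaround might look roughly like the sketch below. It uses the JDK's JAXP parser in place of JAXB to stay self-contained, and simulateFrameworkDecoding is a hypothetical stand-in for whatever layer decodes entities before parsing; only the AMPERSAND_TOKEN name comes from the description above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AmpersandTokenDemo {

    static final String TOKEN = "AMPERSAND_TOKEN";

    // Assumption: stand-in for a layer that decodes &amp; before the parser runs.
    static String simulateFrameworkDecoding(String body) {
        return body.replace("&amp;", "&");
    }

    // Parses the XML and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hide the entity behind a token, let the XML travel and be parsed, then restore the ampersand.
    static String roundTrip(String xml) {
        String tokenized = xml.replace("&amp;", TOKEN);
        String received = simulateFrameworkDecoding(tokenized); // nothing left to decode
        return rootText(received).replace(TOKEN, "&");
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<address1>Test&amp;Address</address1>")); // Test&Address
    }
}
```

The obvious weakness is that the token must never occur in real data, which is part of why this is only a quick fix.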

On the second pass I made a lot of significant changes, so I'm not sure exactly what solved the problem. I suspect that giving JAXB access to the HTML DTDs made it much happier, but that's only a guess and could be specific to my project.

HTH

Author: ryanprayogo

A software developer in Toronto, Canada

Updated on June 04, 2022

Comments

  • ryanprayogo almost 2 years

    I have the following XML:

    <?xml version="1.0" encoding="UTF-8"?>
    <details>
      ...
      <address1>Test&amp;Address</address1>
      ...
    </details>
    

    When I try to unmarshal it using JAXB, it throws the following exception:

    Caused by: org.xml.sax.SAXParseException: The reference to entity "Address" must end with the ';' delimiter.
            at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
            at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
            at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
            at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
            at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
            at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
            at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
            at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
            at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
            at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
            at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
            at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
            at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
            at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:194)
    

    But when I change the &amp; in the XML to &apos;, it works. It looks like the problem occurs only with the ampersand entity (&amp;), and I cannot understand why.

    The code to unmarshal is:

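    // Needs java.io.StringReader, javax.xml.bind.JAXBContext and javax.xml.bind.Unmarshaller
    JAXBContext context = JAXBContext.newInstance("some.package.name", this.getClass().getClassLoader());
    Unmarshaller unmarshaller = context.createUnmarshaller();
    // xml holds the request body as a String; the exception is thrown from this call
    obj = unmarshaller.unmarshal(new StringReader(xml));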
    

    Anyone have some insight?

    EDIT: I tried the solution suggested by @abhin4v below (i.e., adding a space after &amp;), but it doesn't seem to work either. Here's the stack trace:

    Caused by: org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.
            at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
            at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
            at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
            at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
            at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
            at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
            at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
            at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
            at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
            at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
            at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
            at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
            at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
            at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:194)