How do I turn off validation when parsing well-formed XML using DocumentBuilder.parse?

20,127

Solution 1

Probably related to an EntityResolver issue. You can take a look here:

How to read well formed XML in Java, but skip the schema?

Solution 2

Here is how you create a DocumentBuilder that will ignore ALL external referenced entities, including DTDs:

final DocumentBuilder builder = factory.newDocumentBuilder();
builder.setEntityResolver(new EntityResolver() {
    @Override
        public InputSource resolveEntity(String publicId, String systemId) {
                // it might be a good idea to insert a trace logging here that you are ignoring publicId/systemId
                return new InputSource(new StringReader("")); // Returns a valid dummy source
        }
    });
Share:
20,127
Dave
Author by

Dave

Updated on June 11, 2020

Comments

  • Dave
    Dave almost 4 years

    I'm using Java 6. I want to parse XHTML that I know is well-formed. As such, I don't want to do any validation against DTD's or other schemas referenced in the doc. However, I'm having trouble figuring out how to turn that validation off. I have

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        final DocumentBuilder b = factory.newDocumentBuilder();
        final InputSource s = new InputSource(new StringReader(str));
        org.w3c.dom.Document result = b.parse(s);
    

    But I still get an exception on the last line ...

    java.net.SocketException: Unexpected end of file from server
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:777)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:774)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:677)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1315)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1282)
        at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:283)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1194)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1090)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1003)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
        at com.myco.myproj.util.XmlUtilities.getStringAsDocument(XmlUtilities.java:130)
        at com.myco.myproj.util.NetUtilities.getUrlAsDocument(NetUtilities.java:30)
        at com.myco.myproj.parsers.impl.AbstractChicagoReaderParser.parsePage(AbstractChicagoReaderParser.java:144)
        at com.myco.myproj.parsers.impl.AbstractChicagoReaderParser.getEvents(AbstractChicagoReaderParser.java:112)
        at com.myco.myproj.parsers.impl.ChicagoReaderParserTest.testParser(ChicagoReaderParserTest.java:29)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
        at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
    

    I don't want my parser going to the Internet. How do I disable that? Thanks, - Dave

    Edit: Per Traroth's suggestion, I tried the below code, but get the same exception

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        final DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                        return null;
                }
            });
        final InputSource s = new InputSource(new StringReader(str));
        org.w3c.dom.Document result = builder.parse(s);
    
  • Dave
    Dave about 12 years
    I assume you're talking about the accepted answer in that thread. I tried the suggestion (my code is included in the edit), but I still get the same exception. Am I missing something?
  • Alohci
    Alohci about 12 years
    @Dave - Yes. Returning null is the same as not applying an entity resolver. You need to follow the other path of returning an Input source. Just change "foo.dtd" to the system id of the DTD for your XML file.
  • Alexis Dufrenoy
    Alexis Dufrenoy about 12 years
    Yes, if you look at the sample code in the answer, you will see the resolveEntity() method returns an actual non-null InputSource if the XML refers to a DTD, and null else.
  • Dave
    Dave about 12 years
    I understand what you're saying with the example. My question, though, is that I don't want any validation/DTD parsing. I just want a bunch of elements. Can I get the parser to ignore any DTD references?