How do I turn off validation when parsing well-formed XML using DocumentBuilder.parse?
20,127
Solution 1
Probably related to an EntityResolver issue. You can take a look here:
How to read well formed XML in Java, but skip the schema?
Solution 2
Here is how you create a DocumentBuilder that will ignore ALL external referenced entities, including DTDs:
final DocumentBuilder builder = factory.newDocumentBuilder();
builder.setEntityResolver(new EntityResolver() {
@Override
public InputSource resolveEntity(String publicId, String systemId) {
// it might be a good idea to insert a trace logging here that you are ignoring publicId/systemId
return new InputSource(new StringReader("")); // Returns a valid dummy source
}
});
Author by
Dave
Updated on June 11, 2020Comments
-
Dave almost 4 years
I'm using Java 6. I want to parse XHTML that I know is well-formed. As such, I don't want to do any validation against DTD's or other schemas referenced in the doc. However, I'm having trouble figuring out how to turn that validation off. I have
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); factory.setValidating(false); final DocumentBuilder b = factory.newDocumentBuilder(); final InputSource s = new InputSource(new StringReader(str)); org.w3c.dom.Document result = b.parse(s);
But I still get an exception on the last line ...
java.net.SocketException: Unexpected end of file from server at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:777) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:774) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195) at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:677) at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1315) at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1282) at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:283) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1194) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1090) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1003) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119) at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284) at com.myco.myproj.util.XmlUtilities.getStringAsDocument(XmlUtilities.java:130) at com.myco.myproj.util.NetUtilities.getUrlAsDocument(NetUtilities.java:30) at com.myco.myproj.parsers.impl.AbstractChicagoReaderParser.parsePage(AbstractChicagoReaderParser.java:144) at com.myco.myproj.parsers.impl.AbstractChicagoReaderParser.getEvents(AbstractChicagoReaderParser.java:112) at com.myco.myproj.parsers.impl.ChicagoReaderParserTest.testParser(ChicagoReaderParserTest.java:29) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
I don't want my parser going to the Internet. How do I disable that? Thanks, - Dave
Edit: Per Traroth's suggestion, I tried the below code, but get the same exception
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); factory.setValidating(false); final DocumentBuilder builder = factory.newDocumentBuilder(); builder.setEntityResolver(new EntityResolver() { @Override public InputSource resolveEntity(String publicId, String systemId) { return null; } }); final InputSource s = new InputSource(new StringReader(str)); org.w3c.dom.Document result = builder.parse(s);
-
Dave about 12 yearsI assume you're talking about the accepted answer in that thread. I tried the suggestion (my code is included in the edit), but I still get the same exception. Am I missing something?
-
Alohci about 12 years@Dave - Yes. Returning null is the same as not applying an entity resolver. You need to follow the other path of returning an Input source. Just change "foo.dtd" to the system id of the DTD for your XML file.
-
Alexis Dufrenoy about 12 yearsYes, if you look at the sample code in the answer, you will see the resolveEntity() method returns an actual non-null InputSource if the XML refers to a DTD, and null else.
-
Dave about 12 yearsI understand what you're saying with the example. My question, though, is that I don't want any validation/DTD parsing. I just want a bunch of elements. Can I get the parser to ignore any DTD references?