Validating a HUGE XML file

Solution 1

Instead of using a DOMParser, use a SAXParser. This reads from an input stream or reader so you can keep the XML on disk instead of loading it all into memory.

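import java.io.FileInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

SAXParserFactory factory = SAXParserFactory.newInstance();
// Note: setValidating(true) enables DTD validation; for an XSD, attach it with factory.setSchema(...) instead.
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
// SimpleErrorHandler is your own org.xml.sax.ErrorHandler implementation (not shown).
reader.setErrorHandler(new SimpleErrorHandler());
// A byte stream lets the parser honor the encoding declared in the XML itself.
reader.parse(new InputSource(new FileInputStream("document.xml")));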
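
Since the question is about validating against an XSD, here is a minimal sketch of one way to do that without building a DOM, using the standard javax.xml.validation API with a StreamSource. The class name StreamingXsdValidator, the validate helper, and the file paths are made up for illustration. As far as I know, the JDK's built-in implementation reads a StreamSource incrementally, but treat that as an assumption and test it against your own file.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of any library.
public class StreamingXsdValidator {

    /** Validates xmlFile against xsdFile and returns the problems found (empty if valid). */
    public static List<String> validate(File xsdFile, File xmlFile)
            throws SAXException, IOException {
        SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(xsdFile);
        Validator validator = schema.newValidator();

        final List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add(describe("warning", e));
            }

            @Override
            public void error(SAXParseException e) {
                // Collect validation errors and keep going, so one run reports them all.
                problems.add(describe("error", e));
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // The document is not well-formed; parsing cannot continue.
                throw e;
            }
        });

        // The validator pulls the document from the file rather than from a DOM tree.
        validator.validate(new StreamSource(xmlFile));
        return problems;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: java StreamingXsdValidator schema.xsd document.xml");
            return;
        }
        List<String> problems = validate(new File(args[0]), new File(args[1]));
        for (String problem : problems) {
            System.out.println(problem);
        }
        System.out.println(problems.isEmpty() ? "valid" : problems.size() + " problem(s) found");
    }
}
```

Collecting errors in a list instead of throwing on the first one is just a design choice for this sketch. With a very large, badly broken file you may prefer to stop after the first few errors.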

Solution 2

Use libxml2, which can validate against an XSD and has a streaming mode; its xmllint command-line tool exposes these as the --schema and --stream options.

Solution 3

Personally, I like to use XMLStarlet, which has a command-line interface and works on streams. It is a set of tools built on libxml2.

Solution 4

SAX and libxml will help, as already mentioned. You could also try increasing the maximum heap size for the JVM using the -Xmx option. For example, to set the maximum heap size to 512 MB: java -Xmx512m com.foo.MyClass
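
If you go the heap route, it can be worth confirming that the setting actually reached the JVM. A rough sketch follows. HeapCheck is a made-up class name, while Runtime.maxMemory() is the standard call. The figure it reports can be somewhat lower than the -Xmx value, depending on the JVM and garbage collector.

```java
// Hypothetical helper for checking the effective heap limit of the running JVM.
public class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it with the same option you plan to use for the real program, e.g. java -Xmx512m HeapCheck.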

Author: Dan Cramer

Updated on August 01, 2022

Comments

  • Dan Cramer (almost 2 years ago)

    I'm trying to find a way to validate a large XML file against an XSD. I saw the question ...best way to validate an XML... but the answers all pointed to using the Xerces library for validation. The only problem is that when I use that library to validate a 180 MB file, I get an OutOfMemoryError.

    Are there any other tools, libraries, or strategies for validating a larger-than-normal XML file?

    EDIT: The SAX solution worked for Java validation, but the other two suggestions for the libxml tool were also very helpful for validation outside of Java.