Reading a big XML file using StAX and DOM
Solution 1
You could use a StAX (javax.xml.stream) parser and transform (javax.xml.transform) each section to a DOM node (org.w3c.dom):
import java.io.*;
import javax.xml.stream.*;
import javax.xml.transform.*;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.dom.DOMResult;
import org.w3c.dom.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to statements element

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        // Each iteration copies one child of the root into its own DOM tree
        while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            DOMResult result = new DOMResult();
            t.transform(new StAXSource(xsr), result);
            Node domNode = result.getNode();
        }
    }
}
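To show how the domNode produced in the loop might be handed to the separate 'parser' module described in the question, here is a minimal sketch. The class name ItemParserSketch, its parse method, and the name/price child elements (taken from the question's sample) are assumptions for illustration only. The node is treated as possibly being a Document, in which case its document element is used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical 'parser' module: receives one entry as a DOM node.
class ItemParserSketch {

    // Returns { name, price } for a single Item entry (element names are assumed).
    static String[] parse(Node node) throws Exception {
        // Accept either the Document wrapper or the element itself
        Element item = (node instanceof Document)
                ? ((Document) node).getDocumentElement()
                : (Element) node;
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Relative XPath expressions are evaluated against the entry element
        String name = xpath.evaluate("name", item).trim();
        String price = xpath.evaluate("price", item).trim();
        return new String[] { name, price };
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a domNode coming out of the streaming loop
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<Item><name>Widget</name><price>9.99</price></Item>")));
        String[] fields = parse(doc);
        System.out.println(fields[0] + " costs " + fields[1]);
    }
}
```

Inside the streaming loop, the call would be something like ItemParserSketch.parse(domNode), so only one entry is held in memory at a time.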
Solution 2
Blaise Doughan's answer fails on a clean Java 7 or Java 8 installation because of https://bugs.openjdk.java.net/browse/JDK-8016914:
java.lang.NullPointerException
at com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.setXmlVersion(CoreDocumentImpl.java:860)
at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.setDocumentInfo(SAX2DOM.java:144)
Funny thing: if you use a JAXB unmarshaller instead, you don't get the NPE:
package com.common.config;

import java.io.*;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.*;
import org.w3c.dom.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        // Advance to root element
        xsr.nextTag(); // TODO: nextTag() can't skip DTD
        xsr.next(); // Advance to first item or EOD

        final JAXBContext jaxbContext = JAXBContext.newInstance();
        final Unmarshaller unm = jaxbContext.createUnmarshaller();
        while (true) {
            // previous unmarshal() already did advance to next element or whitespace
            if (xsr.getEventType() == XMLStreamReader.START_ELEMENT) {
                JAXBElement<Object> jel = unm.unmarshal(xsr, Object.class);
                Node domNode = (Node) jel.getValue();
                System.err.println(domNode.getNodeName());
            } else if (!xsr.hasNext()) {
                break;
            } else {
                xsr.next();
            }
        }
    }
}
The reason is: com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXConnector$1 does not implement Locator2, therefore it has no getXMLVersion().
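A third option avoids both the Transformer (and with it the NPE above) and JAXB, whose javax.xml.bind packages were removed from the JDK in Java 11: copy the StAX events for each entry into a DOM tree by hand. The following is only a sketch under stated assumptions. The class name StaxToDomSketch and its methods are made up for illustration, comments and processing instructions are dropped, and namespace declarations are not copied as xmlns attributes (elements and attributes still carry their namespace URI).

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: streams the root's children, one DOM tree per child.
class StaxToDomSketch {

    static void forEachEntry(Reader in, Consumer<Element> handler) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            // Skip the prolog (including a DTD event, if any) up to the root element
            while (xsr.next() != XMLStreamConstants.START_ELEMENT) { /* keep reading */ }
            while (xsr.hasNext()) {
                int event = xsr.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    Document doc = db.newDocument(); // fresh document per entry
                    Element entry = copyElement(xsr, doc);
                    doc.appendChild(entry);
                    handler.accept(entry);
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    break; // end tag of the root element
                }
                // whitespace and comments between entries are ignored
            }
        } finally {
            xsr.close();
        }
    }

    // Reader is on START_ELEMENT on entry and on the matching END_ELEMENT on return.
    private static Element copyElement(XMLStreamReader xsr, Document doc) throws XMLStreamException {
        Element el = doc.createElementNS(emptyToNull(xsr.getNamespaceURI()),
                qName(xsr.getPrefix(), xsr.getLocalName()));
        for (int i = 0; i < xsr.getAttributeCount(); i++) {
            el.setAttributeNS(emptyToNull(xsr.getAttributeNamespace(i)),
                    qName(xsr.getAttributePrefix(i), xsr.getAttributeLocalName(i)),
                    xsr.getAttributeValue(i));
        }
        while (xsr.hasNext()) {
            switch (xsr.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    el.appendChild(copyElement(xsr, doc)); // recurse into child
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    el.appendChild(doc.createTextNode(xsr.getText()));
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    return el;
                default:
                    break; // comments and processing instructions are skipped
            }
        }
        throw new XMLStreamException("Unexpected end of input inside <" + el.getNodeName() + ">");
    }

    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    private static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Items><Item><name>a</name><price>1</price></Item>"
                + "<Item><name>b</name><price>2</price></Item></Items>";
        forEachEntry(new StringReader(xml),
                item -> System.out.println(item.getNodeName() + ": " + item.getTextContent()));
    }
}
```

Because copyElement is recursive, very deeply nested entries could exhaust the stack. For flat records like the Item entries in the question this should not matter.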
Noam
Updated on June 07, 2022
Comments
Noam almost 2 years
I need to read several big (200 MB to 500 MB) XML files, so I want to use StAX. My system has two modules: one reads the file (with StAX); the other (the 'parser' module) is supposed to receive a single entry of that XML and parse it using DOM. My XML files don't have a fixed structure, so I cannot use JAXB. How can I pass the 'parser' module a specific entry that I want it to parse? For example:
<Items>
    <Item>
        <name> .... </name>
        <price> ... </price>
    </Item>
    <Item>
        <name> .... </name>
        <price> ... </price>
    </Item>
</Items>
I want to use StAX to read that file, but each 'Item' entry should be passed to the 'parser' module.
Edit:
After a little more reading, I think I need a library that reads an XML file as a stream but parses each entry using DOM. Is there such a thing?