Can SAX Parsers use XPath in Java?


Solution 1

Simply using a SAX parser will not build a representation of your XML tree in memory (this is why SAX is more memory-efficient). It will only trigger "events" whenever a new XML element is encountered. You will have to keep the context (often a stack of parent elements) in memory to "know" where you are in the tree.

Since you will not have a tree in memory, you will not be able to use XPath. You can only test the current "context" (your manually managed stack) to query your document. Remember that the SAX parser makes only one pass over your file, so the order of elements in the file matters.
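As a rough illustration of that manual context tracking, here is a minimal SAX handler sketch that keeps a stack of open elements and reacts when the current path matches a hard-coded one. The file name and the element path are placeholders (the path is borrowed from the question in the comments below), not part of any particular API.

    import java.util.ArrayDeque;
    import java.util.Deque;

    import javax.xml.parsers.SAXParserFactory;

    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    // Minimal sketch of the "manual context stack" idea: keep the path of
    // open elements yourself and react when it matches the path you care about.
    public class ContextTrackingHandler extends DefaultHandler {

        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            path.push(qName);       // descend into the element
            text.setLength(0);      // reset the character buffer for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Emulates the XPath /Invoice/InvoiceHeader/InvoiceLanguage (example path)
            if (currentPath().equals("/Invoice/InvoiceHeader/InvoiceLanguage")) {
                System.out.println("language = " + text.toString().trim());
            }
            path.pop();             // ascend back to the parent
        }

        private String currentPath() {
            StringBuilder sb = new StringBuilder();
            // The deque iterates newest-first, so walk it in reverse to build root-first
            for (java.util.Iterator<String> it = path.descendingIterator(); it.hasNext(); ) {
                sb.append('/').append(it.next());
            }
            return sb.toString();
        }

        public static void main(String[] args) throws Exception {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse("invoice.xml", new ContextTrackingHandler());
        }
    }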

Fortunately, there are other approaches, like VTD-XML, a library that builds the XML tree in memory but keeps only the structural part; it does not copy the actual content out of the file, and content is extracted on demand. It is much more memory-efficient than a DOM parser while still allowing XPath. I personally use this library at work to parse ~700MB XML files with XPath (yes, that's insane, but it works and it is very fast).
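For orientation, XPath over VTD-XML looks roughly like the sketch below. The class and method names (VTDGen, VTDNav, AutoPilot from the com.ximpleware package) are recalled from the library's API, and the file name and path are placeholders, so treat this as an approximation rather than copy-paste code.

    import com.ximpleware.AutoPilot;
    import com.ximpleware.VTDGen;
    import com.ximpleware.VTDNav;

    // Rough sketch of evaluating an XPath expression with VTD-XML.
    public class VtdXPathExample {
        public static void main(String[] args) throws Exception {
            VTDGen gen = new VTDGen();
            if (!gen.parseFile("invoice.xml", true)) {      // true = namespace aware
                throw new IllegalStateException("parse failed");
            }
            VTDNav nav = gen.getNav();
            AutoPilot ap = new AutoPilot(nav);
            ap.selectXPath("/Invoice/InvoiceHeader/InvoiceLanguage");

            // evalXPath() walks the matches one by one; -1 means no more results
            int i;
            while ((i = ap.evalXPath()) != -1) {
                int textIndex = nav.getText();              // token index of the element's text
                if (textIndex != -1) {
                    System.out.println(nav.toString(textIndex));
                }
            }
        }
    }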

Solution 2

IMHO the easiest way to process XML is to use StAX, the Streaming API for XML. It combines the advantages of DOM and SAX (and offers you an easier migration). You still have a cursor pointing to an XML element (as in SAX), but your own code moves the cursor forward. The great advantage is that the XML processing code becomes much more readable. It also solves the memory issue, since only the current XML element has to be held in memory. There are also nice StAX tutorials around.
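To make the cursor idea concrete, here is a small StAX sketch: the code pulls events forward with reader.next(), so only the current element is in memory. The file name and element name are placeholders chosen to echo the question below.

    import java.io.FileInputStream;

    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    // Pull-parsing with the StAX cursor API: your code drives the cursor forward.
    public class StaxCursorExample {
        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            try (FileInputStream in = new FileInputStream("invoice.xml")) {
                XMLStreamReader reader = factory.createXMLStreamReader(in);
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "InvoiceLanguage".equals(reader.getLocalName())) {
                        // getElementText() reads the text content and advances to END_ELEMENT
                        System.out.println("language = " + reader.getElementText());
                    }
                }
                reader.close();
            }
        }
    }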

To also answer your original question: a short search on Google showed me that there is no easy, widely accepted way to do this, which probably means that the existing custom solutions are not robust, not maintained, and not well tested.

Solution 3

Switching to SAX parsing (or StAX) will require a complete change in your approach. It looks as if you haven't fully appreciated how much work it will be. For any advice to make sense, we need to know how big the file is, and what kind of processing you want to do with the data. If you are filtering the data, for example, then an XQuery implementation that uses document projection might be a good answer (this will automatically use SAX behind the scenes to build a tree containing only the subset of the data that you are actually interested in).
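As a purely illustrative sketch of the filtering idea, the snippet below runs an XQuery over a file with Saxon's s9api, using one of the paths from the question in the comments. Document projection itself is a Saxon-EE optimization that is enabled separately; this basic code only shows the query side, and the file name is a placeholder.

    import java.io.File;

    import javax.xml.transform.stream.StreamSource;

    import net.sf.saxon.s9api.Processor;
    import net.sf.saxon.s9api.XQueryCompiler;
    import net.sf.saxon.s9api.XQueryEvaluator;
    import net.sf.saxon.s9api.XdmItem;
    import net.sf.saxon.s9api.XdmValue;

    // Filtering a document with XQuery via Saxon's s9api.
    public class XQueryFilterExample {
        public static void main(String[] args) throws Exception {
            Processor processor = new Processor(false);     // false = open-source HE edition
            XQueryCompiler compiler = processor.newXQueryCompiler();
            XQueryEvaluator query = compiler
                    .compile("/Invoice/InvoiceHeader/InvoiceLanguage/@stdValue/string()")
                    .load();
            query.setSource(new StreamSource(new File("invoice.xml")));

            XdmValue result = query.evaluate();
            for (XdmItem item : result) {
                System.out.println(item.getStringValue());
            }
        }
    }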

Comments

  • Admin
    Admin almost 2 years

    I'm trying to migrate one of my classes, which uses DOM parsing with lots of XPath expressions, to SAX parsing. DOM parsing was good for me, but some of the files I try to parse are too big and they cause server timeouts. I want to reuse the XPath expressions with SAX parsing, but I'm not sure whether that is possible, and if it isn't, could you please help me? I have no idea what the following code should look like when I use only SAX:

    Document doc = bpsXml.getDocument();
    String supplierName = BPSXMLUtils.getXpathString(doc, "/Invoice/InvoiceHeader/Party[@stdValue='SU']/Name/Name1");
    String language = BPSXMLUtils.getXpathString(doc, "/Invoice/InvoiceHeader/InvoiceLanguage/@stdValue");
    
  • OGrandeDiEnne
    OGrandeDiEnne about 8 years
    It's a very nice idea, but it does not work that well when you have big files (2+ GB) made of lots of XML elements with short data. In practice you'd reduce the required memory by 50-60%, which is great, but not enough when you have very big files. And nowadays data is growing faster and faster...
  • Vincent Robert
    Vincent Robert about 8 years
    After all these years, I did parse 2GB+ files with VTD-XML. It does a very good job and memory is not an issue at all. Did you try it and had a bad experience? Can you share more information?
  • OGrandeDiEnne
    OGrandeDiEnne about 8 years
    How much memory (-Xmx) did you allocate to the parsing program?
  • Vincent Robert
    Vincent Robert about 8 years
    Most of my parsing worked with the default Xmx (64 MB) but I sometimes had to push it up to 1GB just for safety (from memory, I no longer work on these topics).