stax - get xml node as string

java xml extract stax

14,968

Solution 1

Why not just use xpath for this?

You could have a fairly simple xpath to get all 'statement' nodes.

Like so:

//statement

EDIT #1: If possible, take a look at dom4j. You could read the String and get all 'statement' nodes fairly simply.

EDIT #2: Using dom4j, this is how you would do it: (from their cookbook)

String text = "your xml here";
Document document = DocumentHelper.parseText(text);

public void bar(Document document) {
   List list = document.selectNodes( "//statement" );
   // loop through node data
}

Solution 2

I had a similar task and although the original question is older than a year, I couldn't find a satisfying answer. The most interesting answer up to now was Blaise Doughan's answer, but I couldn't get it running on the XML I am expecting (maybe some parameters for the underlying parser could change that?). Here the XML, very simplyfied:

<many-many-tags>
    <description>
        ...
        <p>Lorem ipsum...</p>
        Devils inside...
        ...
    </description>
</many-many-tags>

My solution:

public static String readElementBody(XMLEventReader eventReader)
    throws XMLStreamException {
    StringWriter buf = new StringWriter(1024);

    int depth = 0;
    while (eventReader.hasNext()) {
        // peek event
        XMLEvent xmlEvent = eventReader.peek();

        if (xmlEvent.isStartElement()) {
            ++depth;
        }
        else if (xmlEvent.isEndElement()) {
            --depth;

            // reached END_ELEMENT tag?
            // break loop, leave event in stream
            if (depth < 0)
                break;
        }

        // consume event
        xmlEvent = eventReader.nextEvent();

        // print out event
        xmlEvent.writeAsEncodedUnicode(buf);
    }

    return buf.getBuffer().toString();
}

Usage example:

XMLEventReader eventReader = ...;
while (eventReader.hasNext()) {
    XMLEvent xmlEvent = eventReader.nextEvent();
    if (xmlEvent.isStartElement()) {
        StartElement elem = xmlEvent.asStartElement();
        String name = elem.getName().getLocalPart();

        if ("DESCRIPTION".equals(name)) {
            String xmlFragment = readElementBody(eventReader);
            // do something with it...
            System.out.println("'" + fragment + "'");
        }
    }
    else if (xmlEvent.isEndElement()) {
        // ...
    }
}

Note that the extracted XML fragment will contain the complete extracted body content, including white space and comments. Filtering those on demand, or making the buffer size parametrizable have been left out for code brevity:

'
    <description>
        ...
        <p>Lorem ipsum...</p>
        Devils inside...
        ...
    </description>
    '

Solution 3

You can use StAX for this. You just need to advance the XMLStreamReader to the start element for statement. Check the account attribute to get the file name. Then use the javax.xml.transform APIs to transform the StAXSource to a StreamResult wrapping a File. This will advance the XMLStreamReader and then just repeat this process.

import java.io.File;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class Demo {

    public static void main(String[] args) throws Exception  {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to statements element

        while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            File file = new File("out" + xsr.getAttributeValue(null, "account") + ".xml");
            t.transform(new StAXSource(xsr), new StreamResult(file));
        }
    }

}

Solution 4

Stax is a low-level access API, and it does not have either lookups or methods that access content recursively. But what you actually trying to do? And why are you considering Stax?

Beyond using a tree model (DOM, XOM, JDOM, Dom4j), which would work well with XPath, best choice when dealing with data is usually data binding library like JAXB. With it you can pass Stax or SAX reader and ask it to bind xml data into Java beans and instead of messing with xml process Java objects. This is often more convenient, and it is usually quite performance. Only trick with larger files is that you do not want to bind the whole thing at once, but rather bind each sub-tree (in your case, one 'statement' at a time). This is easiest done by iterating Stax XmlStreamReader, then using JAXB to bind.

Solution 5

I've been googling and this seems painfully difficult.

given my xml I think it might just be simpler to:

StringBuilder buffer = new StringBuilder();
for each line in file {
   buffer.append(line)
   if(line.equals(STMT_END_TAG)){
      parse(buffer.toString())
      buffer.delete(0,buffer.length)
   }
 }

 private void parse(String statement){
    //saxParser.parse( new InputSource( new StringReader( xmlText ) );
    // do stuff
    // save string
 }

View more solutions

14,968

Author by

Jason

I make it do what it do baby

Updated on June 09, 2022

Comments

Jason almost 2 years
xml looks like so:
```
<statements>
   <statement account="123">
      ...stuff...
   </statement>
   <statement account="456">
      ...stuff...
   </statement>
</statements>
```
I'm using stax to process one "<statement>" at a time and I got that working. I need to get that entire statement node as a string so I can create "123.xml" and "456.xml" or maybe even load it into a database table indexed by account.

using this approach: http://www.devx.com/Java/Article/30298/1954

I'm looking to do something like this:
```
String statementXml = staxXmlReader.getNodeByName("statement");

//load statementXml into database
```
- javamonkey79 over 13 years
  
  What is your question exactly?
bdoughan over 13 years

There area also standard XPath libraries in the JDK/JRE: stackoverflow.com/questions/3939636/…
Paul Whitehurst almost 13 years

Using while (xsr.nextTag...) will fail. The stax documentation for xsr.nextTag() states that an exception will be thrown if xsr.hasNext() is false and next tag is called. Also, when using xsr.nextTag(), if other than white space characters, COMMENT, PROCESSING_INSTRUCTION, START_ELEMENT, END_ELEMENT are encountered, an exception is thrown.
t0r0X almost 12 years

The poster explicitly mentioned StAX, so I don't think pointers to dom4j or other library did help him much...
james.garriss almost 12 years

Given that the OP never asked a question, the suggestion to use xPath is as good as anything. Maybe better.
Srinivas about 9 years

When I use the above code, I am getting the following error Exception in thread "main" net.sf.saxon.trans.XPathException: org.w3c.dom.DOMException: HIERARCHY_REQUEST_ERR: An attempt was made to insert a node where it is not permitted. Any Idea?
pup784 over 7 years

Is there way to print the string without the namespace?
t0r0X over 7 years

I'm not sure I understand your question, what namespace? Can you give an example?
Ermal about 3 years

Conceptually calling xsr.nextTag() is wrong since XMLStreamReader maybe starts already from the right tag if "input.xml" does not contain headers. Trying all possible cases I always receive the error: java.lang.IllegalStateException: Attempt to output end tag with no matching start tag. @t0r0X solution is the only one valid for me