Extracting data from XML using Java


Solution 1

XStream will not help in your case; it is meant for converting an object to XML and back again. If your XML was generated from an instance of a CampaignFrameResponse class, you can use XStream.

Otherwise you can simply check the node names like this:

String nodeName = currentNode.getNodeName();
// getNodeValue() is null for element nodes; getTextContent() returns the enclosed text
String nodeValue = currentNode.getTextContent();
if (nodeName.equals("Message")) {
     message = nodeValue;
} else if (nodeName.equals("FrameHeight")) {
     frameHeight = nodeValue;
}

If you need an int value, you have to parse the string yourself, for example with Integer.parseInt(nodeValue.trim()).
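A minimal sketch of the check described above, assuming the document has the shape shown in the question. For an element node, getNodeValue() returns null because the text lives in a child text node, so this sketch reads it with getTextContent(). The class name FrameResponseReader and its fields are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical holder class; the names are illustrative, not from the question.
class FrameResponseReader {
    String message;
    int frameHeight;

    // Walks the direct children of the root element and copies the two fields of interest.
    static FrameResponseReader read(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        FrameResponseReader result = new FrameResponseReader();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node current = children.item(i);
            if (current.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes between elements
            }
            String text = current.getTextContent().trim();
            if (current.getNodeName().equals("Message")) {
                result.message = text;
            } else if (current.getNodeName().equals("FrameHeight")) {
                result.frameHeight = Integer.parseInt(text); // String -> int
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<CampaignFrameResponse xmlns=\"http://Qsurv/api\">"
                + "<Message>Success</Message><FrameHeight>308</FrameHeight>"
                + "</CampaignFrameResponse>";
        FrameResponseReader r = read(xml);
        System.out.println(r.message + " / " + r.frameHeight);
    }
}
```

Integer.parseInt throws NumberFormatException on non-numeric text, so real code would probably want to handle that case.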

Solution 2

You can use DOM, SAX, or a pull parser, but it is better to go with one of the following APIs:

- JAXP & JAXB

- Castor

E.g. DOM parsing:

// Needs javax.xml.parsers.*, org.w3c.dom.*, org.xml.sax.InputSource, java.io.StringReader
DocumentBuilderFactory odbf = DocumentBuilderFactory.newInstance();
DocumentBuilder odb = odbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
Document odoc = odb.parse(is);
odoc.getDocumentElement().normalize();    // normalize text representation
System.out.println("Root element of the doc is " + odoc.getDocumentElement().getNodeName());
NodeList LOP = odoc.getElementsByTagName("response");

Node FPN = LOP.item(0);
try {
    if (FPN.getNodeType() == Node.ELEMENT_NODE) {
        Element token = (Element) FPN;

        NodeList oNameList1 = token.getElementsByTagName("user_id");
        Element firstNameElement = (Element) oNameList1.item(0);
        NodeList textNList1 = firstNameElement.getChildNodes();
        // the first child of <user_id> is the text node holding the value
        String userId = textNList1.item(0).getNodeValue().trim();
        this.setUser_follower_id(Integer.parseInt(userId));
        System.out.println("#####The Parsed data#####");
        System.out.println("user_id : " + userId);
        System.out.println("#####The Parsed data#####");
    }
} catch (Exception e) {
    // e.g. a missing <response> or <user_id> element, or a non-numeric id
    e.printStackTrace();
}
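For comparison with the DOM approach, here is a rough sketch of the pull-parser style using StAX (javax.xml.stream), which ships with the JDK. It scans forward and returns the text of the first element with a given local name. The class and method names are hypothetical, and it assumes the target element contains only text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; names are illustrative only.
class StaxFirstText {
    // Returns the trimmed text of the first element named localName, or null if none is found.
    static String firstText(String xml, String localName) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // next() advances the cursor and reports the kind of event it landed on
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    // getElementText() only works for text-only elements
                    return reader.getElementText().trim();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<response><user_id> 42 </user_id></response>";
        System.out.println(firstText(xml, "user_id"));
    }
}
```

Unlike DOM, this never builds the whole tree in memory, which matters mostly for large documents.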

Solution 3

I have been working with XML in Java for a while (over ten years) and have tried many alternatives (custom text parsing, proprietary APIs, SAX, DOM, XMLBeans, JAXB, etc.). I have learnt a few things:

  • Stick to the standards. Never use a proprietary API; use a standard Java API instead (JAXP, which includes SAX, DOM, StAX, etc.). Your code will be more portable and maintainable, and it will not have to change whenever a new version of an XML library breaks compatibility (which happens very often).
  • Take your time and do learn XML technologies. I would recommend comprehensive knowledge of at least XSD, XSLT and XPath (needed for XSLT). If you do not have time, then concentrate on XSD.
  • Take advantage of automatic XML code generation/parsing whenever possible. This implies knowing XSD. The initial effort pays off in the long run: the code is much more maintainable over time, parsing/marshalling is greatly optimized (usually more than if you use the "manual" JAXP APIs), and XML validation can be carried out because you already have the XSD (less checking code, protection against malformed XML crashing your app, less integration effort). Best of all, you only write the XSD; almost all the Java code you need to handle the data (Java Beans) is generated for you.
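As a small illustration of the standard APIs mentioned in the list above, this sketch uses the XPath support in JAXP (javax.xml.xpath) to pull a single value out of the document from the question. The class name is hypothetical. Because the document declares a default namespace, the expression matches on local-name() rather than registering a NamespaceContext; that is a shortcut chosen for brevity, not the only way to do it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; names are illustrative only.
class XPathFrameHeight {
    static int frameHeight(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // so that local-name() is populated
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // evaluate(String, Object) returns the string value of the first matching node
        String text = xpath.evaluate(
                "/*[local-name()='CampaignFrameResponse']/*[local-name()='FrameHeight']",
                doc);
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) throws Exception {
        String xml = "<CampaignFrameResponse xmlns=\"http://Qsurv/api\">"
                + "<FrameHeight>308</FrameHeight></CampaignFrameResponse>";
        System.out.println(frameHeight(xml));
    }
}
```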

Nowadays I tend to use code generation whenever I have to parse XML like that. The standard for that is JAXB (xmlbeans is dead, and other alternatives may not be as mature or as widely used). In your case I would define an XSD that describes your document in as fine detail as possible (e.g. if a string can only take a few values, do not use the "xs:string" type but an enumerated one). It could look like this:

<xs:schema attributeFormDefault="unqualified"
    elementFormDefault="qualified" targetNamespace="http://Qsurv/api"
    xmlns="http://Qsurv/api"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:jaxb="http://java.sun.com/xml/ns/jaxb" jaxb:version="2.0">
    <xs:element name="CampaignFrameResponse">
        <xs:complexType>
            <xs:sequence>
                <xs:element type="xs:string" name="Message" />
                <xs:element type="Status" name="Status" />
                <xs:element type="xs:short" name="FrameHeight" />
                <xs:element type="xs:anyURI" name="FrameUrl" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:simpleType name="Status">
        <xs:annotation>
            <xs:appinfo>
                <jaxb:typesafeEnumClass>
                    <jaxb:typesafeEnumMember name="SUCCESS"
                        value="Success" />
                    <jaxb:typesafeEnumMember name="FAILURE"
                        value="Failure" />
                </jaxb:typesafeEnumClass>
            </xs:appinfo>
        </xs:annotation>
        <xs:restriction base="xs:string">
            <xs:enumeration value="Success" />
            <xs:enumeration value="Failure" />
        </xs:restriction>
    </xs:simpleType>
</xs:schema>

Now it is a matter of using the JAXB tools (see the xjc compiler options) to generate the code, and of looking at a couple of examples of how to marshal/unmarshal the generated Java Beans to/from XML.
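Independently of code generation, an XSD can also be used to validate incoming documents with the javax.xml.validation API that ships with the JDK. The following is only a sketch: it inlines a simplified copy of the schema (without the JAXB annotations, and with a default namespace so that type="Status" resolves), and the class name is made up.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper; names are illustrative only.
class FrameResponseValidator {
    // Simplified copy of the schema, inlined so the example is self-contained.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " xmlns=\"http://Qsurv/api\" targetNamespace=\"http://Qsurv/api\""
        + " elementFormDefault=\"qualified\">"
        + "<xs:element name=\"CampaignFrameResponse\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"Message\" type=\"xs:string\"/>"
        + "<xs:element name=\"Status\" type=\"Status\"/>"
        + "<xs:element name=\"FrameHeight\" type=\"xs:short\"/>"
        + "<xs:element name=\"FrameUrl\" type=\"xs:anyURI\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "<xs:simpleType name=\"Status\"><xs:restriction base=\"xs:string\">"
        + "<xs:enumeration value=\"Success\"/><xs:enumeration value=\"Failure\"/>"
        + "</xs:restriction></xs:simpleType>"
        + "</xs:schema>";

    // Returns true if the document conforms to the schema, false if validation fails.
    static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        }
    }
}
```

With this in place, a Status other than Success or Failure, or a non-numeric FrameHeight, is rejected before any application code sees it.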

Solution 4

You could of course create a name-value map and update the map as you traverse the XML. At the end of the parsing you could look for the particular key in the map. Java doesn't let you create variables programmatically so you won't be able to generate a variable with its name based on the XML data.
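A minimal sketch of the name-value map idea, assuming a flat document like the one in the question where every child of the root holds plain text. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; names are illustrative only.
class XmlToMap {
    // Maps each child element name of the root to its trimmed text content.
    static Map<String, String> toMap(String xml) throws Exception {
        NodeList children = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getChildNodes();
        Map<String, String> values = new LinkedHashMap<>(); // keeps document order
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                values.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<CampaignFrameResponse xmlns=\"http://Qsurv/api\">"
                + "<Message>Success</Message><FrameHeight>308</FrameHeight>"
                + "</CampaignFrameResponse>";
        Map<String, String> values = toMap(xml);
        int frameHeight = Integer.parseInt(values.get("FrameHeight"));
        System.out.println(values.get("Message") + " / " + frameHeight);
    }
}
```

Repeated element names would overwrite each other in this sketch, so it only suits documents where each name appears once.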

Style and readability aside, how you populate data structures from XML depends on how well-defined the XML is and how much its schema is likely to change in the future. You could ask yourself questions like: Can the node names change in the future? Can XML subsections be introduced that would enclose this section? The answers might help you choose a particular parser (SAX/DOM or a higher-level object-mapping API).

Of course, if you have no control over the XML definition, there is little you can do other than parse what you've got.

Author by

Victoria

Junior Java Developer

Updated on June 05, 2022

Comments

  • Victoria
    Victoria almost 2 years

    I have the following XML code:

    <CampaignFrameResponse
      xmlns="http://Qsurv/api"
      xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
      <Message>Success</Message>
      <Status>Success</Status>
      <FrameHeight>308</FrameHeight>   
      <FrameUrl>http://delivery.usurv.com?Key=a5018c85-222a-4444-a0ca-b85c42f3757d&amp;ReturnUrl=http%3a%2f%2flocalhost%3a8080%2feveningstar%2fhome</FrameUrl> 
    </CampaignFrameResponse>
    

    What I'm trying to do is extract the nodes and assign them to a variable. So for example, I'd have a variable called FrameHeight containing the value 308.

    This is the Java code I have so far:

    private void processNode(Node node) {
        NodeList nodeList = node.getChildNodes();
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node currentNode = nodeList.item(i);
           if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
                //calls this method for all the children which is Element
                LOG.warning("current node name: " + currentNode.getNodeName());
                LOG.warning("current node type: " + currentNode.getNodeType());
                LOG.warning("current node value: " + currentNode.getNodeValue());
                processNode(currentNode);
           }
        }
    
    }
    

    This prints out the node names, types and values, but what is the best way of assigning each of the values to an appropriately-named variable? eg int FrameHeight = 308?

    This is my updated code where the nodeValue variable keeps returning null:

    processNode(Node node) {
    NodeList nodeList = node.getChildNodes();
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node currentNode = nodeList.item(i);
        if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
            //calls this method for all the children which is Element
            String nodeName = currentNode.getNodeName();
            String nodeValue = currentNode.getNodeValue();
            if(nodeName.equals("Message")) {
                LOG.warning("nodeName: " + nodeName); 
                message = nodeValue;
                LOG.warning("Message: " + message); 
            } 
            else if(nodeName.equals("FrameHeight")) {
                LOG.warning("nodeName: " + nodeName); 
                frameHeight = nodeValue;
                LOG.warning("frameHeight: " + frameHeight);
            }
            processNode(currentNode);
        }
    }
    

    }

  • Victoria
    Victoria over 11 years
    Thanks for this. For some reason, the nodeValue variable keeps returning null. I've added my updated code to the end of the original question.
  • marcolopes
    marcolopes over 9 years
    Why do you say "xmlbeans is dead"?