Get XML only immediate children elements by name

java xml parsing dom xml-parsing


Solution 1

Well, the DOM solution to this question is actually pretty simple, even if it's not too elegant.

When I iterate through the filesNodeList, which is returned when I call notificationElement.getElementsByTagName("file"), I just check whether the parent node's name is "notification". If it isn't then I ignore it because that will be handled by the <group> element. Here's my code solution:

for (int j = 0; j < filesNodeList.getLength(); j++) {
  Element fileElement = (Element) filesNodeList.item(j);
  if (!fileElement.getParentNode().getNodeName().equals("notification")) {
    continue;
  }
  ...
}
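Here is a self-contained version of the parent-name check above. It is a sketch under assumptions: the class name, the `parse` helper and the trimmed sample XML are illustrative and not from the original post. Following a later comment, it uses `"notification".equals(...)` and does the check on `Node` without a cast.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ParentNameFilter {
    // Hypothetical helper: parses an XML string and wraps the parser's checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the "location" of each <file> whose parent element is named "notification".
    static List<String> directFileLocations(String xml) {
        Element notification = (Element) parse(xml).getElementsByTagName("notification").item(0);
        // getElementsByTagName searches the whole subtree, so <file>s inside <group> are included here.
        NodeList files = notification.getElementsByTagName("file");
        List<String> result = new ArrayList<>();
        for (int j = 0; j < files.getLength(); j++) {
            Node file = files.item(j);
            // getParentNode() and getNodeName() are defined on Node, so no cast is needed for the check.
            if (!"notification".equals(file.getParentNode().getNodeName())) {
                continue; // nested under <group>; handled separately
            }
            result.add(((Element) file).getAttribute("location"));
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<notifications><notification>"
                + "<groups><group name='zip-group.zip' zip='true'>"
                + "<file location='C:\\valid\\file\\here.txt'/>"
                + "</group></groups>"
                + "<file location='C:\\valid\\file.txt'/>"
                + "<file location='C:\\valid\\file.xml'/>"
                + "</notification></notifications>";
        System.out.println(directFileLocations(xml)); // [C:\valid\file.txt, C:\valid\file.xml]
    }
}
```

Note that this filters by the parent's *name*, so it cannot tell two different elements called "notification" apart (see Solution 5).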

Solution 2

I realise you found a solution of sorts to this in May, @kentcdodds, but I just had a fairly similar problem, and I think I've now found a solution to it (perhaps only for my use case rather than yours).

A very simplified example of my XML format is shown below:

<?xml version="1.0" encoding="utf-8"?>
<rels>
    <relationship num="1">
        <relationship num="2">
            <relationship num="2.1"/>
            <relationship num="2.2"/>
        </relationship>
    </relationship>
    <relationship num="1.1"/>
    <relationship num="1.2"/>

</rels>

As you can hopefully see from this snippet, the format I want can have N levels of nesting for [relationship] nodes, so I believed the problem with Node.getChildNodes() was that it returned nodes from all levels of the hierarchy, without any hint as to node depth.

Looking at the API for a while, I noticed there are actually two other methods that might be of some use: Node.getFirstChild() and Node.getNextSibling().

Together, these two methods seemed to offer everything required to get all of the immediate child elements of a Node. The following JSP code should give a fairly basic idea of how to implement this. Sorry for the JSP; I'm rolling this into a bean now but didn't have time to create a fully working version from picked-apart code.

<%@page import="javax.xml.parsers.DocumentBuilderFactory,
                javax.xml.parsers.DocumentBuilder,
                org.w3c.dom.Document,
                org.w3c.dom.Node,
                org.w3c.dom.Element,
                java.io.File" %><%
try {

    File fXmlFile = new File(application.getRealPath("/") + "/utils/forms-testbench/dom-test/test.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);
    doc.getDocumentElement().normalize();

    Element docEl = doc.getDocumentElement();
    // Visit only the root's direct children: start at the first child and follow
    // getNextSibling() until it returns null. Unlike a while loop that advances before
    // reading, this neither skips the first child nor fails when the root is empty.
    for (Node childNode = docEl.getFirstChild(); childNode != null; childNode = childNode.getNextSibling()) {
        // Skip whitespace text nodes, comments, etc.
        if (childNode.getNodeType() == Node.ELEMENT_NODE) {
            Element childElement = (Element) childNode;
            out.println("NODE num:-" + childElement.getAttribute("num") + "<br/>\n");
        }
    }

} catch (Exception e) {
    out.println("ERROR:- " + e.toString() + "<br/>\n");
}

%>

This code would give the following output, showing only direct child elements of the initial root node.

NODE num:-1
NODE num:-1.1
NODE num:-1.2
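Outside a JSP, the same sibling walk can be written as plain Java. This is a sketch: the class name, the `parse` helper and the inline XML string are assumptions for illustration. The sample deliberately has no whitespace between tags, so the first `<relationship>` is the root's very first child; a loop that calls getNextSibling() before reading the current node would skip it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SiblingWalk {
    // Hypothetical helper: parses an XML string and wraps the parser's checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Collects the "num" attribute of each element that is a direct child of the root.
    static List<String> directChildNums(String xml) {
        Element root = parse(xml).getDocumentElement();
        List<String> nums = new ArrayList<>();
        // Start at the first child itself; the loop ends when getNextSibling() returns null.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace text, comments, etc.
                nums.add(((Element) n).getAttribute("num"));
            }
        }
        return nums;
    }

    public static void main(String[] args) {
        String xml = "<rels>"
                + "<relationship num='1'><relationship num='2'>"
                + "<relationship num='2.1'/><relationship num='2.2'/>"
                + "</relationship></relationship>"
                + "<relationship num='1.1'/><relationship num='1.2'/>"
                + "</rels>";
        System.out.println(directChildNums(xml)); // [1, 1.1, 1.2]
    }
}
```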

Hope this helps someone anyway. Cheers for the initial post.

Solution 3

You can use XPath for this, using two paths to select the two kinds of <file> nodes and process them differently.

To get the <file> nodes that are direct children of <notification>, use //notification/file, and for the ones in <group>, use //groups/group/file.

This is a simple sample:

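import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SO10689900 {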
    public static void main(String[] args) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader("<notifications>\n" + 
                "  <notification>\n" + 
                "    <groups>\n" + 
                "      <group name=\"zip-group.zip\" zip=\"true\">\n" + 
                "        <file location=\"C:\\valid\\directory\\\" />\n" + 
                "        <file location=\"C:\\this\\file\\doesn't\\exist.grr\" />\n" + 
                "        <file location=\"C:\\valid\\file\\here.txt\" />\n" + 
                "      </group>\n" + 
                "    </groups>\n" + 
                "    <file location=\"C:\\valid\\file.txt\" />\n" + 
                "    <file location=\"C:\\valid\\file.xml\" />\n" + 
                "    <file location=\"C:\\valid\\file.doc\" />\n" + 
                "  </notification>\n" + 
                "</notifications>")));
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression expr1 = xpath.compile("//notification/file");
        NodeList nodes = (NodeList)expr1.evaluate(doc, XPathConstants.NODESET);
        System.out.println("Files in //notification");
        printFiles(nodes);

        XPathExpression expr2 = xpath.compile("//groups/group/file");
        NodeList nodes2 = (NodeList)expr2.evaluate(doc, XPathConstants.NODESET);
        System.out.println("Files in //groups/group");
        printFiles(nodes2);
    }

    public static void printFiles(NodeList nodes) {
        for (int i = 0; i < nodes.getLength(); ++i) {
            Node file = nodes.item(i);
            System.out.println(file.getAttributes().getNamedItem("location"));
        }
    }
}

It should output:

Files in //notification
location="C:\valid\file.txt"
location="C:\valid\file.xml"
location="C:\valid\file.doc"
Files in //groups/group
location="C:\valid\directory\"
location="C:\this\file\doesn't\exist.grr"
location="C:\valid\file\here.txt"
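If you already hold the <notification> element (for example from existing DOM code), a variant is to evaluate a *relative* XPath with that element as the context node: in XPath 1.0, the path file is short for child::file, so it matches only direct children, while .//file matches all descendants like getElementsByTagName("file"). This is a sketch; the class name, the `parse` helper and the sample XML are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPath {
    // Hypothetical helper: parses an XML string and wraps the parser's checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Evaluates `path` relative to `context` and returns each match's "location" attribute.
    static List<String> locations(Node context, String path) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(path, context, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("location"));
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + path, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<notifications><notification>"
                + "<groups><group><file location='in-group.txt'/></group></groups>"
                + "<file location='direct.txt'/>"
                + "</notification></notifications>");
        Node notification = doc.getElementsByTagName("notification").item(0);
        System.out.println(locations(notification, "file"));    // direct children only
        System.out.println(locations(notification, ".//file")); // every descendant <file>
    }
}
```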

Solution 4

If you stick with the DOM API:

NodeList nodeList = doc.getElementsByTagName("notification")
    .item(0).getChildNodes();

// get the immediate children (1st generation)
for (int i = 0; i < nodeList.getLength(); i++) {
    switch (nodeList.item(i).getNodeType()) {
        case Node.ELEMENT_NODE:
            Element element = (Element) nodeList.item(i);
            System.out.println("element name: " + element.getNodeName());
            // check the element name
            if (element.getNodeName().equalsIgnoreCase("file")) {
                // do something with your "file" element (first-generation child)
                System.out.println("element name:"
                    + element.getNodeName() + " attribute: "
                    + element.getAttribute("location"));
            }
            break;
    }
}

Our first task is to get the "notification" element (in this case the first one, item(0)) and all of its children:

NodeList nodeList = doc.getElementsByTagName("notification")
    .item(0).getChildNodes();

(Later you could process every "notification" element by iterating over the whole list returned by getElementsByTagName.)

For every child of "notification":

for (int i = 0; i < nodeList.getLength(); i++)

you first get its type in order to see whether it is an element:

switch (nodeList.item(i).getNodeType()) {
    case Node.ELEMENT_NODE:
        //.......
        break;  
}

If it is, then you have a direct child of "notification" (not a grandchild such as a <file> inside <group>), and you can check its name:

if (element.getNodeName().equalsIgnoreCase("file"))
{

    // do something with your "file" element (first-generation child)

    System.out.println("element name:"
        + element.getNodeName() + " attribute: "
        + element.getAttribute("location"));

}

and the output is:

element name: groups
element name: file
element name:file attribute: C:\valid\file.txt
element name: file
element name:file attribute: C:\valid\file.xml
element name: file
element name:file attribute: C:\valid\file.doc
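The same idea can be packaged as a small reusable method that filters getChildNodes() by node type and name. This is a sketch: childElementsByName, the class name and the `parse` helper are hypothetical names, and it compares names with equals() rather than equalsIgnoreCase(), since XML element names are case-sensitive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildNodesFilter {
    // Hypothetical helper: parses an XML string and wraps the parser's checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the direct child elements of `parent` whose node name equals `tagName`.
    static List<Element> childElementsByName(Element parent, String tagName) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes(); // direct children only, including text nodes
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && tagName.equals(child.getNodeName())) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse("<notifications><notification>\n"
                + "  <groups><group><file location='C:\\valid\\file\\here.txt'/></group></groups>\n"
                + "  <file location='C:\\valid\\file.txt'/>\n"
                + "</notification></notifications>");
        Element notification = (Element) doc.getElementsByTagName("notification").item(0);
        for (Element file : childElementsByName(notification, "file")) {
            System.out.println(file.getAttribute("location")); // only the direct <file>
        }
    }
}
```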

Solution 5

I had the same problem in one of my projects and wrote a little function that returns a List<Element> containing only the immediate children. Basically, for each node returned by getElementsByTagName, it checks whether its parentNode is actually the node whose children we are looking for:

public static List<Element> getDirectChildsByTag(Element el, String sTagName) {
    NodeList allChilds = el.getElementsByTagName(sTagName);
    List<Element> res = new ArrayList<>();

    for (int i = 0; i < allChilds.getLength(); i++) {
        // keep only matches whose parent is el itself; grandchildren have a different parent
        if (allChilds.item(i).getParentNode().equals(el))
            res.add((Element) allChilds.item(i));
    }

    return res;
}

The accepted answer by kentcdodds will return wrong results (e.g. grandchildren) if there is a child node called "notification", e.g. it returns grandchildren if the "group" element were named "notification". I was facing that setup in my project, which is why I came up with my function.
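To make that failure mode concrete, here is a sketch that runs both checks against a <notification> nested inside another <notification>. The class name, method names and sample XML are illustrative assumptions; the identity check uses == (DOM nodes are compared by identity, which is also what equals() amounts to for the JDK's DOM nodes in practice).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SameNameNesting {
    // Hypothetical helper: parses an XML string and wraps the parser's checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Parent-name check: keeps matches whose parent merely has the same name as `el`.
    static List<Element> byParentName(Element el, String tag) {
        NodeList all = el.getElementsByTagName(tag);
        List<Element> res = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            Node n = all.item(i);
            if (el.getNodeName().equals(n.getParentNode().getNodeName())) {
                res.add((Element) n);
            }
        }
        return res;
    }

    // Parent-identity check: keeps matches whose parent is exactly `el`.
    static List<Element> directChildrenByTag(Element el, String tag) {
        NodeList all = el.getElementsByTagName(tag);
        List<Element> res = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            Node n = all.item(i);
            if (n.getParentNode() == el) {
                res.add((Element) n);
            }
        }
        return res;
    }

    public static void main(String[] args) {
        String xml = "<root><notification>"
                + "<file location='direct'/>"
                + "<notification><file location='grandchild'/></notification>"
                + "</notification></root>";
        // getElementsByTagName returns document order, so item(0) is the outer <notification>.
        Element outer = (Element) parse(xml).getElementsByTagName("notification").item(0);
        System.out.println(byParentName(outer, "file").size());        // 2: the grandchild slips through
        System.out.println(directChildrenByTag(outer, "file").size()); // 1: only the direct child
    }
}
```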


kentcdodds


Updated on July 09, 2022

Comments

kentcdodds almost 2 years
My question is: How can I get elements directly under a specific parent element when there are other elements with the same name as a "grandchild" of the parent element.

I'm using the Java DOM library to parse XML elements and I'm running into trouble. Here's a small portion of the XML I'm using:
```
<notifications>
  <notification>
    <groups>
      <group name="zip-group.zip" zip="true">
        <file location="C:\valid\directory\" />
        <file location="C:\another\valid\file.doc" />
        <file location="C:\valid\file\here.txt" />
      </group>
    </groups>
    <file location="C:\valid\file.txt" />
    <file location="C:\valid\file.xml" />
    <file location="C:\valid\file.doc" />
  </notification>
</notifications>
```
As you can see, there are two places you can place the <file> element. Either in groups or outside groups. I really want it structured this way because it's more user-friendly.

Now, whenever I call notificationElement.getElementsByTagName("file"); it gives me all the <file> elements, including those under the <group> element. I handle each of these kinds of files differently, so this functionality is not desirable.

I've thought of two solutions:
1. Get the parent element of the file element and deal with it accordingly (depending on whether it's <notification> or <group>).
2. Rename the second <file> element to avoid confusion.
Neither of those solutions is as desirable as just leaving things the way they are and getting only the <file> elements which are direct children of <notification> elements.

I'm open to IMHO comments and answers about the "best" way to do this, but I'm really interested in DOM solutions because that's what the rest of this project is using. Thanks.
- Alex almost 12 years
  
  Why don't you use XPath to get both lists of nodes and treat them differently? //groups/group/file and //notification/file would suffice to get them. Or do you want a single XPath to get them all?
- Dmitry almost 12 years
  
  Why not build this collection yourself by looping through the direct children, like this: "NodeList nodes = element.getChildNodes(); for (int i = 0; i < nodes.getLength(); i++) { //if element path check - add it to the collection }"?
- Charles Duffy almost 12 years
  
  @Alex org.w3c.dom doesn't support XPath; he'd want to use a different library, such as org.jdom.xpath, for that... though I fully agree that it's the more elegant approach.
- Alex almost 12 years
  
  javax.xml.xpath is Java Standard, so I think he can pretty much use it, no need to get JDom just for this simple task.
- kentcdodds almost 12 years
  
  I should mention that this is only a small part of a much bigger xml file :) Wanted to make it readable.
kentcdodds almost 12 years

Looks like a good answer, and in the future I may move from DOM to XPath. But for this project this is the last thing I need to do and I want to stick with DOM. However, unless I get another answer for DOM, I'll accept yours because it's a good answer. Either way, you get a +1 for such a thorough answer.
Alex almost 12 years

If you need to stick with DOM, then you will need to iterate over the NodeList from ((Node)notificationElement).getChildNodes() and keep only the nodes whose name is file. Ideally you would find all notification tags to do that. The same needs to be done for group tags.
kentcdodds almost 12 years

I found a better solution. The reason that won't work is because there are a lot of childNodes in the notification element. I answered the question though. Thanks for your good answer. I really will look into XPath in the future.
kentcdodds almost 12 years

thanks for the solution. My solution is similar to this, but I don't iterate through all the children because there are a lot more children in that element which I didn't display in my question just to avoid information overload. Anyway, thanks again. +1 for a good answer.
arthur almost 12 years

@kentcdodds.I updated my answer.You see,working with XML without using "ID" leaves you basically with only "getElementsByTagName" and "getChildNodes" to play with. You don't have in my opinion other answers when working directly with the DOM.Sorry you have to stick with the DOM.Whatever the solution it will probably come down to how you access the children of a given Node(in this case "Notification").My solution checks the type Node in order to spare you unnecessary work.But you'll still have to iterate ALL the children.That's what happens when there is no "ID" : you end up with a collection.
kentcdodds almost 12 years

+1 for providing another totally acceptable answer to the question. :)
BizNuge almost 12 years

Cheers @kentcdodds. Quite an interesting problem to tackle and find another solution to, actually. Fairly glad I can continue to use org.w3c.dom without having to port existing code. Thanks for the question!
kentcdodds almost 11 years

@JanusTroelsen, if you're talking about the second line when I cast the item as an element, then it depends on the DOM you're parsing... If not, what do you mean?
Tomáš Zato over 10 years

I'm looking for a way to search for an element by path root/etc/foo and, if necessary, create it or its parent nodes if these don't exist. Can I use something better than a for loop over the child nodes? I only care about the first occurrence.
FINDarkside about 9 years

Why didn't you just iterate through element.getChildNodes()?
krispy about 9 years

+1 for a really simple, easy and clean solution. You can use a for loop with this technique, to keep it elegant and to preserve scope: for (Node n = docEl.getFirstChild(); n != null; n = n.getNextSibling()).
klaar over 8 years

@arthur (off-topic) For the love of all that is holy, please put some whitespace between a period and the first letter of the next sentence. This is pure madness!
Justin about 8 years

The 'getParentNode' function (and 'getNodeName') is available on the 'Node' interface. So for just checking the name, no cast is needed. (and just for safety switch the equals to be "notification".equals(...))
ceving over 3 years

What is the difference to getChildNodes?
BizNuge over 3 years

@ceving - I think the problem was getChildNodes was bringing back ALL child nodes from ALL levels of the hierarchy. This was 8 years ago, so the API may well have moved on since that time, but getChildNodes didn't work for either myself or kentcdodds at the time I guess.
ceving over 3 years

getChildNodes does not return all descendants.
ceving over 3 years

XPath is extremely slow. I had a program using XPath for every node selection and it took more than 5 hours to finish. After I had replaced every XPath usage by an equivalent function using getChildNodes, the program finishes in less than 10 minutes.
BizNuge over 3 years

@ceving - NodeList getChildNodes() - A NodeList that contains ALL children of this node. If there are no children, this is a NodeList containing no nodes. Not sure why we're arguing over this point, but this post was 8 years ago. Doesn't seem like a good use of either of our time.
BizNuge over 3 years

@ceving - I couldn't leave it. Just did a quick test and yes, as you suggested, Node.getChildNodes() DOES do exactly what its name suggests now. This definitely didn't work 8 years ago, which I guess would have been a JDK7 version. I'm on JDK8 now I think, so the test I just did might not be against the correct version.