Handling Empty Nodes Using Java DOM

Solution 1

if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {

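The node value of a text node is not what comes back null here. It is getFirstChild() that returns null when the element is empty, and calling getNodeValue() on that null reference is what throws. Check the first child instead: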

Node firstChild = xmlNodeList.item(j).getFirstChild();
arrInputStrings.add(firstChild == null ? "" : firstChild.getNodeValue());

Note, however, that this still assumes the content is exactly one text node. If the element contains another element, or some text followed by a CDATA section, the value of the first child alone does not give you the whole text.
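
To make that limitation concrete, here is a minimal sketch. The class name FirstChildDemo and the helper parseRoot are made up for illustration, and it assumes the JDK's default DocumentBuilderFactory settings (in particular, non-coalescing, so a CDATA section stays a separate node).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FirstChildDemo {

    // Hypothetical helper: parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // The null-safe first-child check: "" when the element has no children.
    static String firstChildValue(Element element) {
        Node firstChild = element.getFirstChild();
        return firstChild == null ? "" : firstChild.getNodeValue();
    }

    public static void main(String[] args) {
        Element empty = parseRoot("<InputString></InputString>");
        Element mixed = parseRoot("<InputString>01<![CDATA[10]]>11</InputString>");
        Element nested = parseRoot("<InputString><b>0</b>1</InputString>");

        // Brackets make the empty string visible in the output.
        System.out.println("[" + firstChildValue(empty) + "]");
        System.out.println(firstChildValue(mixed));
        System.out.println(firstChildValue(nested));
    }
}
```

Under those assumptions the empty element yields the empty string as intended, but the mixed element yields only the leading "01", and the nested one yields null because the first child is an element rather than a text node.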

What you really want is the textContent property from DOM Level 3 Core, which will give you all the text inside the element, however contained.

arrInputStrings.add(xmlNodeList.item(j).getTextContent());

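getTextContent() is available in Java 1.5 onwards. For an element with no children it returns the empty string, so the empty InputString case needs no special handling.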
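
Put together, the extraction loop from the question could look roughly like the sketch below. The class and method names are hypothetical, and the tag names "InputStringList" and "InputString" are assumed to be the values behind the question's XML_INPUT_STRING_LIST and XML_INPUT_STRING constants.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TextContentDemo {

    // Assumed values of the question's tag-name constants.
    static final String XML_INPUT_STRING_LIST = "InputStringList";
    static final String XML_INPUT_STRING = "InputString";

    // Collects the text of every InputString element, "" for empty ones.
    static List<String> readInputStrings(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        Element list = (Element) doc.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
        NodeList nodes = list.getElementsByTagName(XML_INPUT_STRING);

        List<String> result = new ArrayList<>();
        for (int j = 0; j < nodes.getLength(); j++) {
            // getTextContent() concatenates all text and CDATA inside the element.
            result.add(nodes.item(j).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "<InputString>01<![CDATA[10]]>11</InputString>"
                + "</InputStringList>";
        System.out.println(readInputStrings(xml));
    }
}
```

No null check is needed in the loop: the empty element contributes an empty string, and the element that mixes text with a CDATA section contributes its full text.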

Solution 2

You could use a library like jOOX to generally simplify standard DOM manipulation. With jOOX, you'd get the list of strings as such:

List<String> strings = $(xmlMachine).find(XML_INPUT_STRING_LIST)
                                    .find(XML_INPUT_STRING)
                                    .texts();
Author by

MysteryMoose

Updated on July 26, 2022

Comments

  • MysteryMoose
    MysteryMoose almost 2 years

    I have a question concerning XML, Java's use of DOM, and empty nodes. I am currently working on a project wherein I take an XML descriptor file of abstract machines (for text parsing) and parse a series of input strings with them. The actual building and interpretation of these abstract machines is all done and working fine, but I have come across a rather interesting XML requirement. Specifically, I need to be able to turn an empty InputString node into an empty string ("") and still execute my parsing routines. The problem, however, occurs when I attempt to extract this blank node from my XML tree. This causes a null pointer exception and then generally bad things start happening. Here is the offending snippet of XML (Note the first element is empty):

        <InputStringList>
            <InputString></InputString>
            <InputString>000</InputString>
            <InputString>111</InputString>
            <InputString>01001</InputString>
            <InputString>1011011</InputString>
            <InputString>1011000</InputString>
            <InputString>01010</InputString>
            <InputString>1010101110</InputString>
        </InputStringList>
    

    I extract my strings from the list using:

    //Get input strings to be validated
    xmlElement = (Element)xmlMachine.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
    xmlNodeList = xmlElement.getElementsByTagName(XML_INPUT_STRING);
    for (int j = 0; j < xmlNodeList.getLength(); j++) {
    
        //Add input string to list
        if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
            arrInputStrings.add(xmlNodeList.item(j).getFirstChild().getNodeValue());
    
        } else {
            arrInputStrings.add("");
    
        }
    }
    

    How should I handle this empty case? I have found a lot of information on removing blank text nodes, but I still actually have to parse the blank nodes as empty strings. Ideally, I would like to avoid using a special character to denote a blank string.

    Thank you in advance for your time.