Handling Empty Nodes Using Java DOM
Solution 1
if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
nodeValue shouldn't be null here; it is firstChild itself that may be null (as it is for an empty element), and that is what should be checked:
Node firstChild = xmlNodeList.item(j).getFirstChild();
arrInputStrings.add(firstChild == null ? "" : firstChild.getNodeValue());
Note, however, that this still assumes the content is exactly one text node. If the element contains another element, or some text followed by a CDATA section, the value of the first child alone isn't enough to read the whole text.
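To make that limitation concrete, here is a minimal sketch. It assumes the JDK's built-in parser with default DocumentBuilderFactory settings (coalescing off, so a CDATA section stays a separate node). The class name FirstChildDemo and the helpers parseRoot and firstChildValue are hypothetical names introduced only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FirstChildDemo {
    // Hypothetical helper: parse an XML string and return its root element.
    static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    // The null-safe first-child check from the answer above.
    static String firstChildValue(Element e) {
        Node firstChild = e.getFirstChild();
        return firstChild == null ? "" : firstChild.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        // Text followed by a CDATA section: two child nodes, not one.
        Element mixed = parseRoot("<InputString>01<![CDATA[10]]></InputString>");
        System.out.println(mixed.getChildNodes().getLength()); // 2 with default settings
        System.out.println(firstChildValue(mixed));            // only the leading "01"

        // Empty element: no children, so the helper falls back to "".
        Element empty = parseRoot("<InputString></InputString>");
        System.out.println(firstChildValue(empty).isEmpty());  // true
    }
}
```

The null check handles the empty element, but the mixed-content case silently loses the "10" held in the CDATA section.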
What you really want is the textContent property from DOM Level 3 Core, which will give you all the text inside the element, however contained.
arrInputStrings.add(xmlNodeList.item(j).getTextContent());
This is available in Java 1.5 onwards.
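As a rough end-to-end sketch of this approach, the following reads every InputString under an InputStringList with getTextContent(). The tag names are taken from the XML in the question; the class name TextContentDemo, the method readInputStrings, and the sample document built in main are assumptions made for illustration, not the asker's actual code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextContentDemo {
    // Hypothetical helper: collect the text of every <InputString> element.
    static List<String> readInputStrings(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element list = (Element) doc.getElementsByTagName("InputStringList").item(0);
        NodeList nodes = list.getElementsByTagName("InputString");

        List<String> result = new ArrayList<>();
        for (int j = 0; j < nodes.getLength(); j++) {
            // For an element, getTextContent() concatenates all descendant text
            // and yields "" when there are no children, so no null check is needed.
            result.add(nodes.item(j).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "<InputString>01<![CDATA[10]]></InputString>"
                + "</InputStringList>";
        System.out.println(readInputStrings(xml)); // prints [, 000, 0110]
    }
}
```

The empty element comes back as an empty string, and the text-plus-CDATA element comes back whole.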
Solution 2
You could use a library like jOOX to simplify standard DOM manipulation in general. With jOOX, you'd get the list of strings like this:
List<String> strings = $(xmlMachine).find(XML_INPUT_STRING_LIST)
.find(XML_INPUT_STRING)
.texts();
MysteryMoose
Updated on July 26, 2022
I have a question concerning XML, Java's use of DOM, and empty nodes. I am currently working on a project wherein I take an XML descriptor file of abstract machines (for text parsing) and parse a series of input strings with them. The actual building and interpretation of these abstract machines is all done and working fine, but I have come across a rather interesting XML requirement. Specifically, I need to be able to turn an empty InputString node into an empty string ("") and still execute my parsing routines. The problem, however, occurs when I attempt to extract this blank node from my XML tree. This causes a null pointer exception and then generally bad things start happening. Here is the offending snippet of XML (Note the first element is empty):
<InputStringList>
    <InputString></InputString>
    <InputString>000</InputString>
    <InputString>111</InputString>
    <InputString>01001</InputString>
    <InputString>1011011</InputString>
    <InputString>1011000</InputString>
    <InputString>01010</InputString>
    <InputString>1010101110</InputString>
</InputStringList>
I extract my strings from the list using:
//Get input strings to be validated
xmlElement = (Element)xmlMachine.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
xmlNodeList = xmlElement.getElementsByTagName(XML_INPUT_STRING);
for (int j = 0; j < xmlNodeList.getLength(); j++) {
    //Add input string to list
    if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
        arrInputStrings.add(xmlNodeList.item(j).getFirstChild().getNodeValue());
    } else {
        arrInputStrings.add("");
    }
}
How should I handle this empty case? I have found a lot of information on removing blank text nodes, but I still actually have to parse the blank nodes as empty strings. Ideally, I would like to avoid using a special character to denote a blank string.
Thank you in advance for your time.