getNodeName() operation on an XML node returns #text

10,526

setIgnoringElementContentWhitespace only works if you use setValidating(true), and then only if the XML file you are parsing references a DTD that the parser can use to work out which whitespace-only text nodes are actually ignorable. If your document doesn't have a DTD, the parser errs on the safe side and assumes that no text nodes can be ignored, so you'll have to write your own code to skip them as you traverse the child nodes.
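
As a rough sketch of the "write your own code" route, one option is to check each child's node type during traversal and keep only elements. The class name ChildElementNames and the helpers childElementNames and parse are made up for this illustration, and the inline XML uses empty elements as a stand-in for the real file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ChildElementNames {

    // Hypothetical helper: collects the names of the element children of
    // parent, skipping text nodes (including whitespace-only ones),
    // comments, and any other non-element nodes.
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    // Hypothetical helper: parses an XML string with a default,
    // non-validating builder (no DTD involved).
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the person document from the question.
        String xml = "<person>\n  <firstname/>\n  <lastname/>\n  <salary/>\n</person>";
        Document doc = parse(xml);
        // Prints [firstname, lastname, salary]
        System.out.println(childElementNames(doc.getDocumentElement()));
    }
}
```

This leaves the DOM untouched and only filters what the loop sees, so the whitespace text nodes are still there if other code walks the tree.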

Author by

coder

Updated on July 20, 2022

Comments

  • coder
    coder almost 2 years
    <person>
    <firstname></firstname>
    <lastname></lastname>
    <salary></salary>
    </person>
    

    This is the XML I am parsing. When I try printing the node names of child elements of person, I get

    #text

    firstname

    #text

    lastname

    #text

    salary

    How do I eliminate #text being generated?

    Update - Here is my code

    try {
        File fXmlFile = new File("file.xml");
    
        // Configure the factory before creating the builder: settings changed
        // after newDocumentBuilder() do not affect a builder that already exists.
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        dbFactory.setValidating(false);
        dbFactory.setIgnoringElementContentWhitespace(true);
        dbFactory.setNamespaceAware(true);
        dbFactory.setIgnoringComments(true);
        dbFactory.setCoalescing(true);
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    
        // Parse the file, closing the stream once parsing is done.
        Document doc;
        try (InputStream in = new FileInputStream(fXmlFile)) {
            doc = dBuilder.parse(in);
        }
        doc.getDocumentElement().normalize();
        Node n = doc.getDocumentElement();
    
        System.out.println(dbFactory.isIgnoringElementContentWhitespace());
        System.out.println(n);
    
        // Print the name of every child node of the root element.
        if (n != null && n.hasChildNodes()) {
            NodeList nl = n.getChildNodes();
    
            for (int i = 0; i < nl.getLength(); i++) {
                System.out.println(nl.item(i).getNodeName());
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    
    
  • coder
    coder over 11 years
    Thank you very much for your response. Which method is preferable? Writing a DTD or writing a method to ignore white spaces?
  • Ian Roberts
    Ian Roberts over 11 years
    It's not hard to strip out whitespace-only text nodes post-hoc (e.g. java.net/node/667186#comment-684625) and this avoids the need to modify the original XML file to add the DTD reference.
  • coder
    coder over 11 years
    This is awesome! Thanks a lot!
  • Ian Roberts
    Ian Roberts over 11 years
    If the answer worked for you, please consider accepting it by clicking the green tick mark to the left.
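
For the post-hoc stripping suggested in the comments, a minimal sketch might look like the following. It is not the code from the linked java.net comment; the class name WhitespaceStripper, the helpers stripWhitespaceNodes and parse, and the sample values in the XML are all invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class WhitespaceStripper {

    // Hypothetical helper: recursively removes text nodes that contain
    // only whitespace from the subtree rooted at node.
    static void stripWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes();
        // The NodeList is live, so walk it backwards: removing a child
        // then never shifts the indices that are still to be visited.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
        }
    }

    // Hypothetical helper: parses an XML string with a default,
    // non-validating builder (no DTD involved).
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Sample values are made up; the original file's contents are unknown.
        String xml = "<person>\n"
                + "  <firstname>Jane</firstname>\n"
                + "  <lastname>Doe</lastname>\n"
                + "  <salary>1000</salary>\n"
                + "</person>";
        Document doc = parse(xml);
        stripWhitespaceNodes(doc.getDocumentElement());

        // Only the three elements remain as children of person.
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println(children.item(i).getNodeName());
        }
    }
}
```

One caveat with this approach: it drops every whitespace-only text node, including ones inside mixed content where the spacing might matter, so it is best suited to data-style documents like the one in the question.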