Python minidom/xml: How to set node text with the minidom API


Actually, minidom is no more difficult to use than other DOM parsers. If you don't like it, you may want to consider complaining to the W3C.

from xml.dom.minidom import parseString

XML = """
<nodeA>
    <nodeB>Text hello</nodeB>
    <nodeC><noText></noText></nodeC>
</nodeA>
"""


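def replaceText(node, newText):
    # Only replace when the element's first child is an existing text node.
    # firstChild is None for an empty element, so check that case too.
    first = node.firstChild
    if first is None or first.nodeType != node.TEXT_NODE:
        raise Exception("node does not contain text")

    # replaceWholeText swaps this text node (and adjacent text nodes) for newText.
    first.replaceWholeText(newText)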

def main():
    doc = parseString(XML)

    node = doc.getElementsByTagName('nodeB')[0]
    replaceText(node, "Hello World")

    print(doc.toxml())

    try:
        node = doc.getElementsByTagName('nodeC')[0]
        replaceText(node, "Hello World")
    except Exception:
        # nodeC's first child is an element, not text, so replaceText raises.
        print("error")


if __name__ == '__main__':
    main()
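
The replaceText function above deliberately refuses to touch an element whose first child is not text. If a setText-style helper that always succeeds is preferred, one possible sketch is shown here. The name set_text is hypothetical and not part of minidom, and the choice to discard all existing children (including nested elements) is an assumption about what "set the text" should mean, not something the answer prescribes.

```python
from xml.dom.minidom import parseString


def set_text(node, new_text):
    """Replace all children of `node` with one text node (hypothetical helper)."""
    # Remove every existing child: text, comments and nested elements alike.
    while node.firstChild is not None:
        node.removeChild(node.firstChild)
    # A text node has to be created by the document that owns the element.
    node.appendChild(node.ownerDocument.createTextNode(new_text))


doc = parseString(
    "<nodeA><nodeB>Text hello</nodeB><nodeC><noText></noText></nodeC></nodeA>"
)
for tag in ("nodeB", "nodeC"):
    set_text(doc.getElementsByTagName(tag)[0], "Hello World")

result = doc.documentElement.toxml()
print(result)
```

Note that this version silently drops the noText element inside nodeC, whereas replaceText reports an error for that case; which behavior is right depends on the document being edited.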
Author by

Warren P


Updated on June 19, 2022

Comments

  • Warren P almost 2 years

    I am currently trying to load an xml file and modify the text inside a pair of xml tags, like this:

       <anode>sometext</anode>
    

    I currently have a helper function called getText that I use to read the text sometext above. Now I need to modify the node's child nodes, I guess, so that a node containing the XML snippet shown above changes from sometext to othertext. The commonly used getText helper is shown below in the footnote.

    So my question is: given that this is how we get the text, how do I write a companion helper function called setText(node, 'newtext')? I'd prefer that it operate at the node level, find its way down to the child nodes on its own, and work robustly.

    A previous question has an accepted answer that says "I'm not sure you can modify the DOM in place". Is that really true? Is minidom so broken that it's effectively read-only?


    By way of footnote: to read the text between <anode> and </anode>, I was surprised that no simple, direct minidom function exists, and that this small helper function is suggested in the Python XML tutorials:

    import xml.dom.minidom
    
    def getText(nodelist):
        rc = []
        for node in nodelist:
            if node.nodeType == node.TEXT_NODE:
                rc.append(node.data)
        return ''.join(rc)
    
    # I've added this bit to make usage of the above clearer
    def getTextFromNode(node):
        return getText(node.childNodes)
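
A short usage sketch may make the behavior of this helper clearer. It restates the same logic under the hypothetical name get_text and uses a made-up document with mixed content, to show that only the text nodes that are direct children of the element are collected.

```python
from xml.dom.minidom import parseString


def get_text(nodelist):
    # Same logic as getText above: join the data of the text nodes in the list.
    return "".join(n.data for n in nodelist if n.nodeType == n.TEXT_NODE)


# Simple case: the element holds a single text node.
simple = parseString("<anode>sometext</anode>").documentElement
simple_text = get_text(simple.childNodes)

# Mixed content: text nested inside <b> is not a direct child, so it is skipped.
mixed = parseString("<anode>one<b>two</b>three</anode>").documentElement
mixed_text = get_text(mixed.childNodes)

print(simple_text)
print(mixed_text)
```

Here simple_text is "sometext" and mixed_text is "onethree", since the helper does not recurse into child elements.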
    

    Elsewhere on Stack Overflow, I see this accepted answer from 2008:

       node[0].firstChild.nodeValue
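
As a rough illustration of that one-liner, the sketch below reads a text node through firstChild.nodeValue and then assigns to the same attribute to change the text in place. The sample document and variable names are made up for the example. It also shows the main caveat: an element with no children has firstChild set to None, so the one-liner only works when a text node is already there.

```python
from xml.dom.minidom import parseString

doc = parseString("<root><anode>sometext</anode><empty></empty></root>")

anode = doc.getElementsByTagName("anode")[0]
before = anode.firstChild.nodeValue        # read the text node's value
anode.firstChild.nodeValue = "othertext"   # assign to change it in place
after = anode.toxml()

# An empty element has no children at all, so firstChild is None and
# firstChild.nodeValue would raise AttributeError.
empty = doc.getElementsByTagName("empty")[0]
empty_has_child = empty.firstChild is not None

print(before)
print(after)
print(empty_has_child)
```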
    

    If that's how hard it is to read with minidom, I'm not surprised to see that people say "Just don't do it!" when you ask how to write things that might modify the node structure of your XML document.

    Update: The answer below shows it's not as hard as I thought.