Walk through all XML nodes in an element-nested structure

13,198

Solution 1

This is probably best achieved with a recursive function. Something like this should do it but I've not tested it so consider it pseudocode.

import xml.dom.minidom as  md

def print_node(root):
    if root.childNodes:
        for node in root.childNodes:
           if node.nodeType == node.ELEMENT_NODE:
               print node.tagName,"has value:",  node.nodeValue, "and is child of:", node.parentNode.tagName
               print_node(node)

dom = md.parse("ASL.xml")
root = dom.documentElement
print_node(root)

Solution 2

If it's not important to use xml.dom.minidom:

import xml.etree.ElementTree as ET
tree = ET.fromstring("""...""")
for  elt in tree.iter():
    print "%s: '%s'" % (elt.tag, elt.text.strip())

Output:

program: ''
type: 'Program'
body: ''
type: 'VariableDeclaration'
declarations: ''
type: 'VariableDeclarator'
id: ''
type: 'Identifier'
name: 'answer'
init: ''
type: 'BinaryExpression'
operator: '*'
left: ''
type: 'Literal'
value: '6'
right: ''
type: 'Literal'
value: '7'
kind: 'var'

Solution 3

For the 2.6+ equivalent of kalgasnik's Elementree code, just replace iter() with getiterator():

import xml.etree.ElementTree as ET
tree = ET.fromstring("""...""")
for  elt in tree.getiterator():
    print "%s: '%s'" % (elt.tag, elt.text.strip())
Share:
13,198
Eduard Florinescu
Author by

Eduard Florinescu

Coding my way out of boredom. “If the fool would persist in his folly he would become wise.” (William Blake)

Updated on June 11, 2022

Comments

  • Eduard Florinescu
    Eduard Florinescu almost 2 years

    I have this kind of XML structure (output from the Esprima ASL converted from JSON), it can get even more nested than this (ASL.xml):

    <?xml version="1.0" encoding="UTF-8" ?>
        <program>
        <type>Program</type>
        <body>
            <type>VariableDeclaration</type>
            <declarations>
                <type>VariableDeclarator</type>
                <id>
                    <type>Identifier</type>
                    <name>answer</name>
                </id>
                <init>
                    <type>BinaryExpression</type>
                    <operator>*</operator>
                    <left>
                        <type>Literal</type>
                        <value>6</value>
                    </left>
                    <right>
                        <type>Literal</type>
                        <value>7</value>
                    </right>
                </init>
            </declarations>
            <kind>var</kind>
        </body>
        </program> 
    

    Usualy for XML I use the for node inroot.childNodes` but this works only for the direct children:

    import xml.dom.minidom as  md
    dom = md.parse("ASL.xml")
    root = dom.documentElement
    for node in root.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            print node.tagName,"has value:",  node.nodeValue:, "and is child of:",node.parentNode.tagName 
    

    How can I walk all the elements of the XML regardless how many nested elements are?