ElementTree - findall to recursively select all child elements

25,752

Solution 1

Quoting findall,

Element.findall() finds only elements with a tag which are direct children of the current element.

Because it finds only direct children, we need to recurse into each match to find the nested ones, like this:

>>> import xml.etree.ElementTree as ET
>>> 
>>> def find_rec(node, element, result):
...     for item in node.findall(element):
...         result.append(item)
...         find_rec(item, element, result)
...     return result
... 
>>> find_rec(ET.parse("h.xml"), 'saybye', [])
[<Element 'saybye' at 0x7f4fce206710>, <Element 'saybye' at 0x7f4fce206750>, <Element 'saybye' at 0x7f4fce2067d0>]

Even better, make it a generator function, like this:

>>> def find_rec(node, element):
...     for item in node.findall(element):
...         yield item
...         for child in find_rec(item, element):
...             yield child
... 
>>> list(find_rec(ET.parse("h.xml"), 'saybye'))
[<Element 'saybye' at 0x7f4fce206a50>, <Element 'saybye' at 0x7f4fce206ad0>, <Element 'saybye' at 0x7f4fce206b10>]
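
On Python 3.3 and later, the inner loop can probably be collapsed with `yield from`. The following is only a sketch and not part of the original answer. It assumes the question's XML is inlined as a string (the name `XML` is made up for this example) so that it runs without h.xml on disk.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's h.xml (assumption: same structure as the file).
XML = """
<hello>
  <saybye>
    <saybye>
    </saybye>
  </saybye>
  <saybye>
  </saybye>
</hello>
"""

def find_rec(node, tag):
    """Yield matching direct children of node, then recurse into each match."""
    for item in node.findall(tag):
        yield item
        # Delegate to the recursive generator (Python 3.3+).
        yield from find_rec(item, tag)

root = ET.fromstring(XML)
matches = list(find_rec(root, "saybye"))
print(len(matches))  # 3
```

Like the versions above, this only descends through elements that themselves match, so a saybye nested inside a differently named element would be missed.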

Solution 2

From Python 2.7 on, you can use xml.etree.ElementTree.Element.iter:

import xml.etree.ElementTree as ET
root = ET.parse("h.xml")
# iter() returns a lazy iterator, so wrap it in list() to see the elements
print(list(root.iter('saybye')))

See 19.7. xml.etree.ElementTree — The ElementTree XML API
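
To see what iter returns, here is a small sketch. It assumes the question's XML is inlined as a compact string (the name `XML` is made up for this example) rather than read from h.xml. One detail worth knowing: iter walks in document order and also yields the starting element itself when its tag matches.

```python
import xml.etree.ElementTree as ET

# Compact inline copy of the question's h.xml (assumption: same structure).
XML = "<hello><saybye><saybye/></saybye><saybye/></hello>"

root = ET.fromstring(XML)

# All saybye elements at any depth, in document order.
found = list(root.iter("saybye"))
print(len(found))  # 3

# Starting from a saybye element, iter includes that element itself.
first = found[0]
print(len(list(first.iter("saybye"))))  # 2
```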

Solution 3

If you aren't afraid of a little XPath, you can use the // syntax, which selects matching elements at any depth below the current one:

import xml.etree.ElementTree as ET
root = ET.parse("h.xml")
print(root.findall('.//saybye'))

Full XPath isn't supported, but here's the list of what is: https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax
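
As a quick comparison, the sketch below runs both paths against the question's XML. The XML is inlined as a string here (an assumption for the sake of a runnable example; the names `XML`, `direct` and `everywhere` are made up).

```python
import xml.etree.ElementTree as ET

# Compact inline copy of the question's h.xml (assumption: same structure).
XML = "<hello><saybye><saybye/></saybye><saybye/></hello>"
root = ET.fromstring(XML)

# A bare tag name matches direct children only.
direct = root.findall("saybye")

# The .// prefix matches descendants at every level.
everywhere = root.findall(".//saybye")

print(len(direct), len(everywhere))  # 2 3
```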

Author by

Admin

Updated on April 17, 2021

Comments

  • Admin
    Admin about 3 years

    Python code:

    import xml.etree.ElementTree as ET
    root = ET.parse("h.xml")
    print(root.findall('saybye'))
    

    h.xml code:

    <hello>
      <saybye>
       <saybye>
       </saybye>
      </saybye>
      <saybye>
      </saybye>
    </hello>
    

    The code outputs:

    [<Element 'saybye' at 0x7fdbcbbec690>, <Element 'saybye' at 0x7fdbcbbec790>]
    

    The saybye element that is a child of another saybye is not selected here. How can I instruct findall to walk down the tree recursively and collect all three saybye elements?