Get Xpath dynamically using ElementTree getpath()

13,727

Solution 1

Rather than trying to construct a full path from the root, you can evaluate XPath expression on with the entry as the base node:

tree = etree.parse(xmlFileUrl)
nsmap = {'def':'http://www.w3.org/2005/Atom'}
entries_expr = etree.XPath('//def:entry', namespaces=nsmap)
category_expr = etree.XPath('category')
for entry in entries_expr(tree):
    category = category_expr(entry)

If performance is not critical, you can simplify the code by using the .xpath() method on elements rather than pre-compiled expressions:

tree = etree.parse(xmlFileUrl)
nsmap = {'def':'http://www.w3.org/2005/Atom'}
for entry in tree.xpath('//def:entry', namespaces=nsmap):
    category = entry.xpath('category')

Solution 2

Basically, using the standard Python's xml.etree library, a different visit function is needed. To achieve this scope you can build a modified version of iter method like this:

def etree_iter_path(node, tag=None, path='.'):
    if tag == "*":
        tag = None
    if tag is None or node.tag == tag:
        yield node, path
    for child in node:
        _child_path = '%s/%s' % (path, child.tag)
        for child, child_path in etree_iter_path(child, tag, path=_child_path):
            yield child, child_path

Then you can use this function for the iteration of the tree from the root node:

from xml.etree import ElementTree

xmldoc = ElementTree.parse(*path to xml file*)
for elem, path in etree_iter_path(xmldoc.getroot()):
    print(elem, path)

Solution 3

From the docs http://lxml.de/xpathxslt.html#the-xpath-class:

ElementTree objects have a method getpath(element), which returns a structural, absolute XPath expression to find that element:

So the answer to your question is that getpath() will not return a "fully qualified" path, since otherwise there would be an argument to the function for that, you are only guarenteed that the xpath expression returned will find you that element.

You may be able to combine getpath and xpath (and Xpath class) to do what you want though.

Share:
13,727
puntofisso
Author by

puntofisso

Senior Systems Analyst at St. George's University of London, working on Open Data & geomobile, ex-PhD student at Imperial College. Developer of LiveRugbyApp. Choir singer, cheese maker, and rugby player.

Updated on June 04, 2022

Comments

  • puntofisso
    puntofisso about 2 years

    I need to write a dynamic function that finds elements on a subtree of an ATOM xml by building dynamically the XPath to the element.

    To do so, I've written something like this:

        tree = etree.parse(xmlFileUrl)
        e = etree.XPathEvaluator(tree, namespaces={'def':'http://www.w3.org/2005/Atom'})
        entries = e('//def:entry')
        for entry in entries:
            mypath = tree.getpath(entry) + "/category"
            category = e(mypath)
    

    The code above fails to find "category" because getpath() returns an XPath without namespaces, whereas the XPathEvaluator e() requires namespaces.

    Although I know I can use the path and provide a namespace in the call to XPathEvaluator, I would like to know if it's possible to make getpath() return a "fully qualified" path, using all the namespaces, as this is convenient in certain cases.

    (This is a spin-off question of my earlier question: Python XpathEvaluator without namespace)

  • puntofisso
    puntofisso over 11 years
    Hi @jmh, thanks. So if that's true, I hope someone will be able to help with the combination of xpath and getpath. However, I'm surprised that such a function is not provided.
  • jmh
    jmh over 11 years
    There are examples in the documentation.
  • puntofisso
    puntofisso over 11 years
    Hi there, I managed to get this now. Probably building the full path is not really helpful. Btw, doesn't the "namespaces=nsmap" need a comma before it?