Get XPath dynamically using ElementTree getpath()
Solution 1
Rather than trying to construct a full path from the root, you can evaluate an XPath expression with the entry as the context node:
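from lxml import etree

tree = etree.parse(xmlFileUrl)
nsmap = {'def': 'http://www.w3.org/2005/Atom'}
# Compile both expressions once and reuse them for every entry
entries_expr = etree.XPath('//def:entry', namespaces=nsmap)
# Relative expression, evaluated with each entry as the context node.
# Atom's <category> is in the Atom namespace too, so it needs the prefix.
category_expr = etree.XPath('def:category', namespaces=nsmap)
for entry in entries_expr(tree):
    category = category_expr(entry)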
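To see the context-node behaviour in isolation, here is a self-contained sketch that runs the same pre-compiled expressions against a small inline Atom document instead of xmlFileUrl. It assumes lxml is installed; the sample feed and the names ATOM_SAMPLE, unprefixed_expr and categories_per_entry are hypothetical. It also shows why the relative step needs the def: prefix: an unprefixed category only matches elements in no namespace, so it finds nothing in an Atom feed.

```python
from lxml import etree

# Hypothetical sample feed: every element is in the Atom default namespace
ATOM_SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First</title><category term="news"/></entry>
  <entry><title>Second</title><category term="sport"/></entry>
</feed>"""

nsmap = {'def': 'http://www.w3.org/2005/Atom'}
entries_expr = etree.XPath('//def:entry', namespaces=nsmap)
category_expr = etree.XPath('def:category', namespaces=nsmap)
# Same step without a prefix: it only matches elements in no namespace
unprefixed_expr = etree.XPath('category')

tree = etree.ElementTree(etree.fromstring(ATOM_SAMPLE))

categories_per_entry = []
for entry in entries_expr(tree):
    # The relative expression only sees the children of the current entry
    terms = [c.get('term') for c in category_expr(entry)]
    categories_per_entry.append(terms)
    print(terms, unprefixed_expr(entry))
```

Each entry yields only its own category term, while the unprefixed expression returns an empty list for every entry.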
If performance is not critical, you can simplify the code by using the .xpath() method on elements rather than pre-compiled expressions:
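from lxml import etree

tree = etree.parse(xmlFileUrl)
nsmap = {'def': 'http://www.w3.org/2005/Atom'}
for entry in tree.xpath('//def:entry', namespaces=nsmap):
    # Evaluated relative to the current entry; the prefix is still required
    category = entry.xpath('def:category', namespaces=nsmap)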
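If lxml is not available, the standard library's xml.etree.ElementTree supports the same relative lookup through its limited XPath subset: findall() accepts a prefix-to-URI mapping and searches from whichever element you call it on. This is a swapped-in technique rather than the lxml code above, and the sample feed is hypothetical.

```python
from xml.etree import ElementTree

# Hypothetical sample feed in the Atom default namespace
ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First</title><category term="news"/></entry>
  <entry><title>Second</title><category term="sport"/><category term="rugby"/></entry>
</feed>"""

nsmap = {'def': 'http://www.w3.org/2005/Atom'}
root = ElementTree.fromstring(ATOM_SAMPLE)

terms_per_entry = []
for entry in root.findall('def:entry', nsmap):
    # The path is relative to the current entry, so only its own categories match
    categories = entry.findall('def:category', nsmap)
    terms_per_entry.append([c.get('term') for c in categories])

print(terms_per_entry)
```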
Solution 2
The standard library's xml.etree module has no getpath(), so a different traversal function is needed. You can write a modified version of the iter() method that also yields the path to each element:
def etree_iter_path(node, tag=None, path='.'):
    # Like Element.iter(tag), but yields (element, path) pairs
    if tag == "*":
        tag = None
    if tag is None or node.tag == tag:
        yield node, path
    for child in node:
        child_path = '%s/%s' % (path, child.tag)
        yield from etree_iter_path(child, tag, path=child_path)
You can then use this function to iterate over the tree starting from the root node:
from xml.etree import ElementTree

xmldoc = ElementTree.parse('path/to/file.xml')  # placeholder: path to your XML file
for elem, path in etree_iter_path(xmldoc.getroot()):
    print(elem, path)
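The paths yielded by etree_iter_path() use ElementTree's {namespace}tag notation, so they can be passed back to find() on the root element. They carry no position, though, so when siblings share a tag (as Atom entries do) find() returns only the first match. A possible variant, sketched below, appends a 1-based [n] predicate counted among same-tag siblings, which ElementTree's XPath subset understands. The function name etree_iter_indexed_path and the sample feed are hypothetical.

```python
from xml.etree import ElementTree


def etree_iter_indexed_path(node, tag=None, path='.'):
    """Hypothetical variant of etree_iter_path that adds a 1-based [n]
    position (counted among siblings with the same tag) to every step."""
    if tag == "*":
        tag = None
    if tag is None or node.tag == tag:
        yield node, path
    seen = {}  # tag -> number of children with that tag visited so far
    for child in node:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        child_path = '%s/%s[%d]' % (path, child.tag, seen[child.tag])
        yield from etree_iter_indexed_path(child, tag, path=child_path)


ATOM = '{http://www.w3.org/2005/Atom}'
# Hypothetical sample feed with two sibling <entry> elements
root = ElementTree.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example feed</title>'
    '<entry><category term="news"/></entry>'
    '<entry><category term="sport"/></entry></feed>'
)

for elem, path in etree_iter_indexed_path(root, tag=ATOM + 'category'):
    # Each path leads back to exactly the element it was generated for
    assert root.find(path) is elem
    print(path, elem.get('term'))
```

Counting positions per tag mirrors how ElementTree evaluates the [n] predicate, so the indexed paths stay unique even when sibling tags repeat.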
Solution 3
From the docs http://lxml.de/xpathxslt.html#the-xpath-class:
ElementTree objects have a method getpath(element), which returns a structural, absolute XPath expression to find that element.
So the answer to your question is that getpath() will not return a "fully qualified" path (if it could, the function would take an argument for that). You are only guaranteed that the returned XPath expression will find that element. For elements in a default namespace, such as Atom's, getpath() has no prefix to use and falls back to positional * steps like /*/*[2].
You may, however, be able to combine getpath() with the xpath() method (or the XPath class) to do what you want.
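A minimal sketch of that combination, assuming lxml and a small hypothetical Atom feed: getpath() supplies the structural, positional part of the path, a namespace-qualified step is appended, and the result is evaluated with an explicit prefix mapping. The def prefix is arbitrary; it only has to match the key in nsmap.

```python
from lxml import etree

# Hypothetical sample feed; all elements are in the Atom default namespace
ATOM_SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First</title><category term="news"/></entry>
  <entry><title>Second</title><category term="sport"/></entry>
</feed>"""

nsmap = {'def': 'http://www.w3.org/2005/Atom'}
tree = etree.ElementTree(etree.fromstring(ATOM_SAMPLE))

results = {}
for entry in tree.xpath('//def:entry', namespaces=nsmap):
    # Structural path: default-namespace elements appear as positional '*' steps
    entry_path = tree.getpath(entry)
    # Append a namespace-qualified step and supply the prefix mapping explicitly
    categories = tree.xpath(entry_path + '/def:category', namespaces=nsmap)
    results[entry_path] = [c.get('term') for c in categories]

print(results)
```

Appending "/category" without a prefix, as in the question, would return an empty list here for the same reason the relative expression in Solution 1 needs def:category.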
puntofisso
Updated on June 04, 2022
Comments
-
puntofisso about 2 years
I need to write a function that finds elements in a subtree of an Atom XML document by building the XPath to each element dynamically.
To do so, I've written something like this:
tree = etree.parse(xmlFileUrl)
e = etree.XPathEvaluator(tree, namespaces={'def':'http://www.w3.org/2005/Atom'})
entries = e('//def:entry')
for entry in entries:
    mypath = tree.getpath(entry) + "/category"
    category = e(mypath)
The code above fails to find "category" because getpath() returns an XPath without namespaces, whereas the XPathEvaluator e() requires namespaces.
Although I know I can use the path and provide a namespace in the call to XPathEvaluator, I would like to know if it's possible to make getpath() return a "fully qualified" path, using all the namespaces, as this is convenient in certain cases.
(This is a spin-off question of my earlier question: Python XpathEvaluator without namespace)
-
puntofisso over 11 years
Hi @jmh, thanks. So if that's true, I hope someone will be able to help with the combination of xpath and getpath. However, I'm surprised that such a function is not provided.
-
jmh over 11 years
There are examples in the documentation.
-
puntofisso over 11 years
Hi there, I managed to get this now. Probably building the full path is not really helpful. Btw, doesn't the "namespaces=nsmap" need a comma before it?