How to match a text node then follow parent nodes using XPath
Solution 1
Is this what you want?
//title[text()='Text 1']/../content/text()
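A minimal sketch of how this expression behaves in lxml, using the sample document from the question (the variable names are only illustrative). The comparison sits inside a predicate (`[...]`), so the expression still yields a node-set that later steps can navigate from. Written at the top level, as in `//title/text()="Text 1"`, the same comparison turns the whole expression into a boolean, which is why no further location steps can follow it.

```python
from lxml import etree

# Sample document from the question.
xml = (
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)
tree = etree.XML(xml)

# Comparison inside a predicate: it filters the <title> elements,
# and the result is still a node-set we can step from.
matches = tree.xpath("//title[text()='Text 1']/../content/text()")
print(matches)  # ['Stuff I want']

# Comparison at the top level: the whole expression evaluates to a boolean.
is_present = tree.xpath("//title/text()='Text 1'")
print(is_present)  # True
```

The result of the first call is a list, so take `matches[0]` (after checking that the list is not empty) if a single string is needed.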
Solution 2
Use:
string(/*/*/title[. = 'Text 1']/following-sibling::content)
This represents at least two improvements as compared to the currently accepted solution of Johannes Weiß:
The very expensive abbreviation "//" (usually causing the whole XML document to be scanned) is avoided as it should be whenever the structure of the XML document is known in advance.
There is no step back up to the parent (the location step "/.." is avoided).
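A rough sketch of this second expression evaluated with lxml, again on the sample document from the question. Two things are worth noting: `string(...)` makes the call return a single string instead of a list, and the `/*/*/title` path assumes that the `title` elements sit exactly two levels below the root, as they do in the sample. The variable names are illustrative only.

```python
from lxml import etree

# Sample document from the question.
xml = (
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)
tree = etree.XML(xml)

# Absolute path (no "//" scan) plus a sibling step (no "/.." back to the parent).
content = tree.xpath(
    "string(/*/*/title[. = 'Text 1']/following-sibling::content)"
)
print(content)  # Stuff I want

# When nothing matches, string() of an empty node-set is the empty string.
missing = tree.xpath(
    "string(/*/*/title[. = 'No such title']/following-sibling::content)"
)
print(repr(missing))  # ''
```

Because a missing title yields an empty string rather than an error or an empty list, the caller has to check for `''` explicitly if "not found" must be distinguished from an empty `content` element.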
Author: Mat
Updated on June 26, 2022
I'm trying to parse some HTML with XPath. Following the simplified XML example below, I want to match the string 'Text 1', then grab the contents of the relevant content node.
    <doc>
      <block>
        <title>Text 1</title>
        <content>Stuff I want</content>
      </block>
      <block>
        <title>Text 2</title>
        <content>Stuff I don't want</content>
      </block>
    </doc>
My Python code throws a wobbly:
    >>> from lxml import etree
    >>>
    >>> tree = etree.XML("<doc><block><title>Text 1</title><content>Stuff I want</content></block><block><title>Text 2</title><content>Stuff I don't want</content></block></doc>")
    >>>
    >>> # get all titles
    ... tree.xpath('//title/text()')
    ['Text 1', 'Text 2']
    >>>
    >>> # match 'Text 1'
    ... tree.xpath('//title/text()="Text 1"')
    True
    >>>
    >>> # Follow parent from selected nodes
    ... tree.xpath('//title/text()/../..//text()')
    ['Text 1', 'Stuff I want', 'Text 2', "Stuff I don't want"]
    >>>
    >>> # Follow parent from selected node
    ... tree.xpath('//title/text()="Text 1"/../..//text()')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "lxml.etree.pyx", line 1330, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:14542)
      File "xpath.pxi", line 287, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:90093)
      File "xpath.pxi", line 209, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:89446)
      File "xpath.pxi", line 194, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:89281)
    lxml.etree.XPathEvalError: Invalid type
Is this possible in XPath? Do I need to express what I want to do in a different way?