How to match a text node then follow parent nodes using XPath


Solution 1

Is this what you want?

//title[text()='Text 1']/../content/text()
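Here is a minimal sketch of running that expression with lxml against the question's sample document. The variable names `doc` and `result` are illustrative only.

```python
from lxml import etree

# The sample document from the question, without whitespace between elements
doc = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# Filter <title> elements by their text inside a predicate, step up to the
# enclosing <block> with "..", then down to the text of its <content> child.
result = doc.xpath("//title[text()='Text 1']/../content/text()")
print(result)  # ['Stuff I want']
```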

Solution 2

Use:

string(/*/*/title[. = 'Text 1']/following-sibling::content)
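The following sketch assumes lxml, as in the question. Because the whole expression is wrapped in `string()`, `xpath()` returns a single string rather than a list. lxml also accepts keyword arguments to `xpath()` as XPath variables. The variable name `$t` and the Python names below are illustrative.

```python
from lxml import etree

doc = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# string(...) yields the string value of the first selected node.
text = doc.xpath("string(/*/*/title[. = 'Text 1']/following-sibling::content)")
print(text)  # Stuff I want

# Passing the title as an XPath variable ($t) avoids quoting problems when
# the text to match comes from elsewhere (e.g. contains an apostrophe).
expr = "string(/*/*/title[. = $t]/following-sibling::content)"
wanted = doc.xpath(expr, t="Text 1")

# With no matching title, string() of an empty node-set is the empty string.
missing = doc.xpath(expr, t="Text 3")
```

Using a variable instead of splicing the title into the expression also means a title such as `Stuff I don't want` cannot break the XPath string literal.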

This offers at least two improvements over the currently accepted solution by Johannes Weiß:

  1. The expensive "//" abbreviation, which usually forces the whole XML document to be scanned, is avoided. It should be avoided whenever the structure of the XML document is known in advance.

  2. There is no step back up to the parent (the location step "/.." is avoided).
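As a quick sanity check, assuming lxml, the two solutions select the same content on the sample document. They differ in result type: Solution 1 returns a list of text nodes, while Solution 2 returns a single string. No timing claims are made here; the sample is far too small to show the cost of "//".

```python
from lxml import etree

doc = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# Solution 1: "//" search plus a step back up to the parent; returns a list.
via_parent = doc.xpath("//title[text()='Text 1']/../content/text()")

# Solution 2: explicit path plus the following-sibling axis; returns a string.
via_sibling = doc.xpath(
    "string(/*/*/title[. = 'Text 1']/following-sibling::content)"
)

# Both identify the same <content> text on this document.
print(via_parent, via_sibling)
```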

Author: Mat

Updated on June 26, 2022

Question

    I'm trying to parse some HTML with XPath. Following the simplified XML example below, I want to match the string 'Text 1', then grab the contents of the relevant content node.

    <doc>
        <block>
            <title>Text 1</title>
            <content>Stuff I want</content>
        </block>
    
        <block>
            <title>Text 2</title>
            <content>Stuff I don't want</content>
        </block>
    </doc>
    

    My Python code throws a wobbly:

    >>> from lxml import etree
    >>>
    >>> tree = etree.XML("<doc><block><title>Text 1</title><content>Stuff I want</content></block><block><title>Text 2</title><content>Stuff I don't want</content></block></doc>")
    >>>
    >>> # get all titles
    ... tree.xpath('//title/text()')
    ['Text 1', 'Text 2']
    >>>
    >>> # match 'Text 1'
    ... tree.xpath('//title/text()="Text 1"')
    True
    >>>
    >>> # Follow parent from selected nodes
    ... tree.xpath('//title/text()/../..//text()')
    ['Text 1', 'Stuff I want', 'Text 2', "Stuff I don't want"]
    >>>
    >>> # Follow parent from selected node
    ... tree.xpath('//title/text()="Text 1"/../..//text()')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "lxml.etree.pyx", line 1330, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:14542)
      File "xpath.pxi", line 287, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:90093)
      File "xpath.pxi", line 209, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:89446)
      File "xpath.pxi", line 194, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:89281)
    lxml.etree.XPathEvalError: Invalid type
    

    Is this possible in XPath? Do I need to express what I want to do in a different way?
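A likely explanation for the last failure, based on the XPath 1.0 grammar: `/` binds more tightly than `=`. The expression is therefore read as `//title/text() = ("Text 1"/../..//text())`, which applies location steps to a string literal. Only a node-set can be followed by a location step, hence `Invalid type`. Moving the comparison into a predicate keeps the expression a node-set. The following sketch assumes lxml, as in the question; the names `texts`, `content` and `failed` are illustrative.

```python
from lxml import etree

doc = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# The original form still fails: "=" is evaluated last, after "/.." has
# already been applied to the literal "Text 1".
failed = False
try:
    doc.xpath('//title/text()="Text 1"/../..//text()')
except etree.XPathEvalError:
    failed = True

# With the test inside a predicate, ".." goes from <title> to its <block>.
texts = doc.xpath("//title[text()='Text 1']/..//text()")
print(texts)  # ['Text 1', 'Stuff I want']

# Narrowing to the <content> child gives just the wanted text.
content = doc.xpath("//title[text()='Text 1']/../content/text()")
print(content)  # ['Stuff I want']
```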