How do I match contents of an element in XPath (lxml)?

I would try with:

.//a[text()='Example']

using the xpath() method:

tree.xpath(".//a[text()='Example']")[0].tag

If you would rather use iterfind(), findall(), find(), or findtext(), keep in mind that advanced features such as value comparison and functions are not available in ElementPath.

lxml.etree supports the simple path syntax of the find, findall and findtext methods on ElementTree and Element, as known from the original ElementTree library (ElementPath). As an lxml specific extension, these classes also provide an xpath() method that supports expressions in the complete XPath syntax, as well as custom extension functions.
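
As a minimal sketch of the xpath() approach described above (the variable names are only illustrative, and Python 3 with lxml installed is assumed), the following parses the anchor from the question and matches it by its text, by a substring of its string value, and by its href attribute:

```python
from lxml import etree

html = '<a href="http://something">Example</a>'

# etree.HTML() parses the string with lxml's HTML parser and
# returns the root <html> element of the resulting tree.
root = etree.HTML(html)

# Exact match on the element's text node (full XPath syntax).
exact = root.xpath(".//a[text()='Example']")

# Substring match on the element's string value.
partial = root.xpath(".//a[contains(., 'Exam')]")

# Attribute predicates work through xpath() as well.
by_href = root.xpath(".//a[@href='http://something']")

tags = [el.tag for el in exact]
hrefs = [el.get("href") for el in partial]
print(tags, hrefs)
```

xpath() always returns a list for node-selecting expressions, so check that it is non-empty before indexing with [0].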

Author: akosch

Updated on July 09, 2022

Comments

  • akosch
    akosch almost 2 years

    I want to parse HTML with lxml using XPath expressions. My problem is matching for the contents of a tag:

    For example given the

    <a href="http://something">Example</a>
    

    element I can match the href attribute using

    .//a[@href='http://something']
    

    but given the expression

    .//a[.='Example']
    

    or even

    .//a[contains(.,'Example')]
    

    lxml throws the 'invalid node predicate' exception.

    What am I doing wrong?

    EDIT:

    Example code:

    from lxml import etree
    from cStringIO import StringIO
    
    html = '<a href="http://something">Example</a>'
    parser = etree.HTMLParser()
    tree   = etree.parse(StringIO(html), parser)
    
    print tree.find(".//a[text()='Example']").tag
    

    Expected output is 'a'. I get 'SyntaxError: invalid node predicate'

  • akosch
    akosch about 14 years
    I don't want to find the link based on href, but based on the text it contains: "Example" in the above example :) .//a[@href='something'] works the way it is...
  • Greg
    Greg about 14 years
    you need to remove an = .//a[text()='Example']
  • akosch
    akosch about 14 years
    Thanks for your suggestion, but this one raises "SyntaxError: invalid node predicate" too
  • akosch
    akosch about 14 years
    Thank you: with XPath() it really works. Strangely enough @href works in both cases.
  • SIslam
    SIslam over 8 years
    @systempuntoout Then is .//a[text()='Example'] invalid in this case?