Get second element text with XPath?

Solution 1

I tried this but it doesn't work.

t = item.findtext('.//span[@class="python"]//a[2]')

This is a FAQ about the // abbreviation.

.//a[2] means: Select all a descendants of the current node that are the second a child of their parent. So this may select more than one element or no element -- depending on the concrete XML document.

To put it more simply, the [] operator has higher precedence than //.

If you want just one node (the second) out of all the nodes returned, you have to use parentheses to force the precedence you want:

(.//a)[2]

This really selects the second a descendant of the current node.

For the actual expression used in the question, change it to:

(.//span[@class="python"]//a)[2]

or change it to:

(.//span[@class="python"]//a)[2]/text()
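
A minimal sketch of the difference, using lxml (the library Solution 2 uses) on the nested HTML from the question. The <div> wrapper is an assumption added here so that the span is a descendant of the context node, and the variable names are only illustrative.

```python
from lxml import etree

# Hypothetical <div> wrapper so the span is a descendant of the context node.
doc = etree.fromstring("""
<div>
  <span class='python'>
    <span>
      <span>
        <img></img>
        <a>google</a>
      </span>
      <a>chrome</a>
    </span>
  </span>
</div>
""")

# [2] binds to the last step only: "a elements that are the second
# a child of their own parent". No such element exists in this document.
per_parent = doc.xpath('.//span[@class="python"]//a[2]')

# Parentheses build the whole node-set first, then [2] picks its second member.
whole_set = doc.xpath('(.//span[@class="python"]//a)[2]')
second_text = doc.xpath('(.//span[@class="python"]//a)[2]/text()')

print(per_parent)         # []
print(whole_set[0].text)  # chrome
print(second_text)        # ['chrome']
```

When nothing matches, xpath() gives back an empty list rather than raising, so a single truthiness check on the result is enough.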

Solution 2

I'm not sure what the problem is...

>>> d = """<span class='python'>
...   <a>google</a>
...   <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>

Solution 3

From Comments:

or the simplification of the actual HTML I posted is too simple

You are right. What is the meaning of .//span[@class="python"]//a[2]? This will be expanded to:

self::node()
 /descendant-or-self::node()
  /child::span[attribute::class="python"]
   /descendant-or-self::node()
    /child::a[position()=2]

It will finally select the second a child of its parent (fn:position() is counted along the child axis). So nothing will be selected if your document looks like this:

<span class='python'> 
  <span> 
    <span> 
      <img></img> 
      <a>google</a><!-- This is the first "a" child of its parent --> 
    </span> 
    <a>chrome</a><!-- This is also the first "a" child of its parent --> 
  </span> 
</span> 

If you want the second of all descendants, use:

descendant::span[@class="python"]/descendant::a[2]
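
A small sketch with lxml that checks the expansion above and tries the descendant-axis expression. The <div> wrapper, the second two-span document, and all variable names are assumptions made for illustration.

```python
from lxml import etree

# Nested document from the question, inside a hypothetical <div> wrapper.
nested = etree.fromstring(
    "<div><span class='python'><span>"
    "<span><img></img><a>google</a></span>"
    "<a>chrome</a>"
    "</span></span></div>"
)

abbreviated = './/span[@class="python"]//a[2]'
expanded = (
    'self::node()/descendant-or-self::node()'
    '/child::span[attribute::class="python"]'
    '/descendant-or-self::node()/child::a[position()=2]'
)
# Both spellings select nothing here: no a is the second a child of its parent.
assert nested.xpath(abbreviated) == nested.xpath(expanded) == []

# On the descendant axis, position() counts all a descendants of the span.
per_span = [a.text for a in
            nested.xpath('descendant::span[@class="python"]/descendant::a[2]')]
print(per_span)  # ['chrome']

# Illustrative second document with two matching spans.
two = etree.fromstring(
    "<div>"
    "<span class='python'><a>a1</a><a>a2</a></span>"
    "<span class='python'><a>b1</a><a>b2</a></span>"
    "</div>"
)
per_span_two = [a.text for a in
                two.xpath('descendant::span[@class="python"]/descendant::a[2]')]
whole_two = [a.text for a in two.xpath('(.//span[@class="python"]//a)[2]')]
print(per_span_two)  # ['a2', 'b2']
print(whole_two)     # ['a2']
```

The predicate on descendant::a is evaluated once per matching span, so with several matching spans it can return one a per span, while the parenthesized form always returns at most one node.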
Updated on March 10, 2020

Comments

  • Admin
    Admin over 4 years
    <span class='python'>
      <a>google</a>
      <a>chrome</a>
    </span>
    

    I want to get chrome and have it working like this already.

    q = item.findall('.//span[@class="python"]//a')
    t = q[1].text # first element = 0
    

    I'd like to combine it into a single XPath expression and just get one item instead of a list.
    I tried this but it doesn't work.

    t = item.findtext('.//span[@class="python"]//a[2]') # first element = 1
    

    And the actual, not simplified, HTML is like this.

    <span class='python'>
      <span>
        <span>
          <img></img>
          <a>google</a>
        </span>
        <a>chrome</a>
      </span>
    </span>
    
    • Ken Bloom
      Ken Bloom over 13 years
      Your expression .//span[@class="python"]//a[2] works for me.
    • Admin
      Admin over 13 years
      Hmmm it seems I have a mistake somewhere, or the simplification of the actual HTML I posted is too simple. I'll try and then modify the question.
    • Dimitre Novatchev
      Dimitre Novatchev over 13 years
      @pdnsk: Good question, +1. See my answer for an explanation and for a simple solution. :)
    • Fractal
      Fractal about 5 years
      so glad you posted this question. Been trying to figure out a similar problem for about a day.
  • Admin
    Admin over 13 years
    It works with xpath but not with findtext, and returns a list with one item.
  • Admin
    Admin over 13 years
    @pdknsk: That's because this XPath expression returns a node-set result: it could be empty, it could be a singleton, or it could contain many nodes if several spans with a "python" class have a second a descendant... If you want the string value of the first of these results, use the string() function with this expression as its argument. I don't know what kind of data type your xpath method can return...
  • Admin
    Admin over 13 years
    It works. I used a combination of the previous answer, with /text(), and this answer, but I'll accept this answer because it details the problem. I only have one question. What is the short equivalent to /descendant::?
  • Admin
    Admin over 13 years
    Thank you for the explanation, but I have one question, or actually two. If there is only one matching element, will [2] throw an exception or return None? And do you know why this works with xpath but not findtext?
  • Dimitre Novatchev
    Dimitre Novatchev over 13 years
    @pdnsk: My answer is pure XPath. I don't know Python.
  • Admin
    Admin over 13 years
    I tried and it just returns no element, which is good because one reason why I wanted to avoid lists and have it in a single expression is to not have an additional check.
  • Admin
    Admin over 13 years
    @pdknsk: First, text() will return all the text node children. string() or the DOM method for string value will return the concatenation of all descendant text nodes. It's not the same. Second, there is no abbreviated form for the descendant axis. My last expression is equivalent to (.//span[@class="python"]//a)[2] when a single span matches, because the position() predicate is applied to all a descendants of that span, not just to the a children of one parent.
  • Fractal
    Fractal about 5 years
    Been trying to figure out a similar answer for a full day. Thanks a ton for the help!
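
On the findtext question raised in the comments, here is a hedged sketch using the standard library's xml.etree.ElementTree. That choice is an assumption: the question does not say which library item comes from. Its find methods implement only a limited path subset, whose documented syntax does not include parenthesized expressions or text(), which would explain why the (...)[2] form works with lxml's xpath() but not with findtext. The <div> wrapper and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified document from the question: both a elements share one parent.
simple = ET.fromstring(
    "<div><span class='python'><a>google</a><a>chrome</a></span></div>"
)

# Actual document from the question: each a is the first a child of its parent.
nested = ET.fromstring(
    "<div><span class='python'><span>"
    "<span><img></img><a>google</a></span>"
    "<a>chrome</a>"
    "</span></span></div>"
)

path = './/span[@class="python"]//a[2]'

simple_text = simple.findtext(path)  # second a child of the span
nested_text = nested.findtext(path)  # no match: findtext returns None

print(simple_text)  # chrome
print(nested_text)  # None
```

So with a limited path engine, a missing second element yields None instead of an exception, and the two-step findall plus index from the question remains the fallback for "second of all descendants".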