Get second element text with XPath?
Solution 1
I tried this but it doesn't work.
t = item.findtext('.//span[@class="python"]//a[2]')
This is a FAQ about the // abbreviation.
.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So this may select more than one element or no element, depending on the concrete XML document.
To put it more simply, the [] operator has higher precedence than //.
If you want just one (the second) of all the nodes returned, you have to use parentheses to force the precedence you want:
(.//a)[2]
This really selects the second a descendant of the current node.
For the actual expression used in the question, change it to:
(.//span[@class="python"]//a)[2]
or change it to:
(.//span[@class="python"]//a)[2]/text()
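A minimal sketch of the difference, assuming lxml is installed and using the non-simplified markup from the question. The wrapping <div> and the variable names are illustrative assumptions, not part of the original post:

```python
from lxml import etree

# Non-simplified markup from the question, wrapped in a hypothetical <div>
# so that .//span can match a descendant of the context node.
doc = etree.fromstring(
    "<div><span class='python'><span><span>"
    "<img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)

# [2] binds to the last step only: "an <a> that is the second <a> child
# of its parent". Neither <a> qualifies here.
per_parent = doc.xpath('.//span[@class="python"]//a[2]')

# Parentheses collect all matching <a> descendants first, then take the second.
in_document_order = doc.xpath('(.//span[@class="python"]//a)[2]')

# Same selection, returning its text node children instead of the element.
second_text = doc.xpath('(.//span[@class="python"]//a)[2]/text()')

print(per_parent, [e.text for e in in_document_order], second_text)
```

Note that xpath() returns a list in both cases, so the caller still picks the first item or checks for an empty result.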
Solution 2
I'm not sure what the problem is...
>>> d = """<span class='python'>
... <a>google</a>
... <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>
Solution 3
From Comments:
or the simplification of the actual HTML I posted is too simple
You are right. What is the meaning of .//span[@class="python"]//a[2]? This will be expanded to:
self::node()
/descendant-or-self::node()
/child::span[attribute::class="python"]
/descendant-or-self::node()
/child::a[position()=2]
It will finally select the second a child (fn:position() refers to the child axis). So nothing will be selected if your document is like:
<span class='python'>
<span>
<span>
<img></img>
<a>google</a><!-- This is the first "a" child of its parent -->
</span>
<a>chrome</a><!-- This is also the first "a" child of its parent -->
</span>
</span>
If you want the second of all descendants, use:
descendant::span[@class="python"]/descendant::a[2]
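The per-parent behaviour can also be reproduced with only the standard library. This is a sketch under assumptions: xml.etree.ElementTree implements a subset of XPath without the descendant:: axis or parenthesized expressions, so the "second of all descendants" is emulated here by indexing into a document-order iteration. As far as I know, findtext in lxml goes through a similar limited path language, which would explain why it behaves differently from xpath(). The wrapping <div> and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same nested markup as above, wrapped in a hypothetical <div>.
root = ET.fromstring(
    "<div><span class='python'><span><span>"
    "<img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)

# [2] is evaluated per parent: neither <a> is the second <a> child
# of its parent, so findtext finds nothing and returns its default, None.
no_match = root.findtext('.//span[@class="python"]//a[2]')

# Emulation of descendant::span[@class="python"]/descendant::a[2]
# for the first matching span: iter() walks descendants in document order.
span = root.find('.//span[@class="python"]')
anchors = list(span.iter('a'))
second = anchors[1].text if len(anchors) > 1 else None

print(no_match, second)
```

The explicit length check stands in for what the XPath expression does implicitly: with fewer than two a descendants, the result is simply empty.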
Admin
Updated on March 10, 2020
Comments
-
Admin over 4 years
<span class='python'> <a>google</a> <a>chrome</a> </span>
I want to get
chrome
and have it working like this already.
q = item.findall('.//span[@class="python"]//a')
t = q[1].text # first element = 0
I'd like to combine it into a single XPath expression and just get one item instead of a list.
I tried this but it doesn't work.
t = item.findtext('.//span[@class="python"]//a[2]') # first element = 1
And the actual, not simplified, HTML is like this.
<span class='python'>
<span>
<span>
<img></img>
<a>google</a>
</span>
<a>chrome</a>
</span>
</span>
-
Ken Bloom over 13 years
Your expression .//span[@class="python"]//a[2] works for me.
-
Admin over 13 years
Hmmm, it seems I have a mistake somewhere, or the simplification of the actual HTML I posted is too simple. I'll try and then modify the question.
-
Dimitre Novatchev over 13 years
@pdnsk: Good question, +1. See my answer for an explanation and for a simple solution. :)
-
Fractal about 5 years
So glad you posted this question. Been trying to figure out a similar problem for about a day.
-
-
Admin over 13 years
It works with xpath but not with findtext, and returns a list with one item.
-
Admin over 13 years
@pdknsk: That's because this XPath expression returns a node-set result: it could be empty, it could be a singleton, or there could be many spans with a "python" class and a second descendant... If you want the string value of the first of these results, use the string() function with this expression as its argument. I don't know what data types your xpath method can return...
-
Admin over 13 years
It works. I used a combination of the previous answer, with /text(), and this answer, but I'll accept this answer because it details the problem. I only have one question. What is the short equivalent to /descendant::?
-
Admin over 13 years
Thank you for the explanation, but I have one question, or actually two. If there is only one matching element, will [2] throw an exception or return None? And do you know why this works with xpath but not findtext?
-
Dimitre Novatchev over 13 years
@pdnsk: My answer is pure XPath. I don't know Python.
-
Admin over 13 years
I tried and it just returns no element, which is good because one reason why I wanted to avoid lists and have it in a single expression is to not have an additional check.
-
Admin over 13 years
@pdknsk: First, text() will return all the text node children. string(), or the DOM method for string value, will return the concatenation of all descendant text nodes. It's not the same. Second, there is no abbreviated form for the descendant axis. My last expression is equivalent to (.//span[@class="python"]//a)[2], so the position() predicate gets applied to the whole expression, not just the last step.
-
Fractal about 5 years
Been trying to figure out a similar answer for a full day. Thanks a ton for the help!
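The distinction drawn in the comments between text() and string(), and the question of what happens when there is no second match, can be sketched as follows. This assumes lxml is installed; the tiny <a> document with a nested <b> is a made-up example, not markup from the question.

```python
from lxml import etree

# Hypothetical element with mixed content: text, a child element, more text.
a = etree.fromstring("<a>go<b>og</b>le</a>")

# text() selects only the text nodes that are direct children of <a>.
text_nodes = a.xpath('text()')

# string() returns the concatenation of all descendant text nodes.
string_value = a.xpath('string()')

# A position that does not exist selects nothing: no exception is raised,
# the node-set is empty and its string value is the empty string.
missing_nodes = a.xpath('(.//b)[2]')
missing_string = a.xpath('string((.//b)[2])')

print(text_nodes, string_value, missing_nodes, repr(missing_string))
```

Wrapping the expression in string() is one way to get a single string back instead of a list, at the cost of not distinguishing "no match" from "empty text".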