XPath to locate a cell with specific text parsing HTML tables
Solution 1
Use this XPath:
//td[contains(., 'Chapter')]
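To see why this works where contains(text(), 'Chapter') from the question does not, here is a minimal sketch using only the JDK's built-in XPath engine rather than HtmlUnit. The class name DotVersusText and its count helper are made up for illustration, and the row from the question is wrapped in a table element so that it parses as XML. The '.' form tests the whole string value of the td, descendants included, while text() passed to contains() is reduced to the first text node child of the td, which here is only whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DotVersusText {
    // The matching row from the question, wrapped in <table> so it is well-formed XML.
    static final String XML =
        "<table><tr valign=\"top\">"
      + " <td nowrap=\"\" align=\"Right\"> <font face=\"Verdana\">"
      + " <a href=\"index.cfm?a=1\">Chapter 1</a> </font> </td>"
      + " <td class=\"ChapterT\"> <font face=\"Verdana\">DEFINITIONS</font> </td>"
      + " <td> </td> </tr></table>";

    // Hypothetical helper: counts the nodes that expr selects from the document root.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // '.' is the td itself; its string value includes the nested <a> text.
        System.out.println(count("//td[contains(., 'Chapter')]"));      // prints 1
        // text() is reduced to the first direct text node, which is whitespace.
        System.out.println(count("//td[contains(text(), 'Chapter')]")); // prints 0
    }
}
```

Note that the class="ChapterT" attribute on the second cell does not count as a match, because attribute values are not part of an element's string value.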
Solution 2
You want all td elements under your current node, not all td elements in the document, which is what the currently accepted answer selects.
Use:
.//td[.//text()[contains(., 'Chapter')]]
This selects all descendants of the current node that are named td and that have at least one text node descendant whose string value contains the string "Chapter".
If it is known in advance that any td under this table only has a single text node, this can be simplified to just:
.//td[contains(., 'Chapter')]
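The difference between the absolute and the relative form can be checked with a small sketch. It uses the JDK's built-in XPath engine instead of HtmlUnit, and the two-table markup, the class name RelativeVersusAbsolute and the count helper are all assumptions made for illustration. Every expression is evaluated with the second table as the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeVersusAbsolute {
    // Hypothetical page: two tables, each holding one cell that mentions "Chapter".
    static final String XML =
        "<html><body>"
      + "<table><tr><td>Chapter A</td></tr></table>"
      + "<table><tr>"
      + "<td><font><a href=\"index.cfm?a=1\">Chapter 1</a></font></td>"
      + "<td>DEFINITIONS</td>"
      + "</tr></table>"
      + "</body></html>";

    // Hypothetical helper: evaluates expr with the second table as context node
    // and returns how many nodes it selects.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node table = (Node) xpath.evaluate("(//table)[2]", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expr, table, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A leading // starts at the document root, so the first table leaks in.
        System.out.println(count("//td[contains(., 'Chapter')]"));              // prints 2
        // A leading .// stays below the context node (the second table).
        System.out.println(count(".//td[contains(., 'Chapter')]"));             // prints 1
        System.out.println(count(".//td[.//text()[contains(., 'Chapter')]]"));  // prints 1
    }
}
```

With HtmlUnit the same expressions would be passed to table.getByXPath(...), as in the question.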
Solution 3
You're on the right "path".
The contains() function is limited to a specific element, not text in any of the children. Try this XPath, which you could read as follows:
- get every tr/td with a sub-element that contains the text 'Chapter'
tr/td[contains(*,"Chapter")]
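One caveat worth knowing: in XPath 1.0, a node-set passed to contains() is converted to the string value of its first node in document order, so contains(*, "Chapter") only inspects the first child element of each td. The sketch below, which uses the JDK's XPath engine rather than HtmlUnit, illustrates this. The second row, the class name FirstChildOnly and the count helper are hypothetical. The first row is a simplified version of the one in the question, where the first child happens to be the font element that wraps the link.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstChildOnly {
    // Row 1 mirrors the question; row 2 is a made-up cell whose FIRST child
    // element does not contain "Chapter" but whose second child does.
    static final String XML =
        "<table>"
      + "<tr><td><font><a href=\"index.cfm?a=1\">Chapter 1</a></font></td></tr>"
      + "<tr><td><b>Part I</b><a href=\"index.cfm?a=2\">Chapter 2</a></td></tr>"
      + "</table>";

    // Hypothetical helper: evaluates expr with the table element as context node
    // and returns how many nodes it selects.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc.getDocumentElement(), XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the first child element of each td is tested.
        System.out.println(count("tr/td[contains(*, 'Chapter')]"));     // prints 1
        // Moving the test into a predicate on * checks every child element.
        System.out.println(count("tr/td[*[contains(., 'Chapter')]]")); // prints 2
    }
}
```

If a cell may carry several child elements, the predicate form tr/td[*[contains(., 'Chapter')]] is the safer reading of "any sub-element".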
Good luck
David Brown
Updated on February 26, 2021
Comments
-
David Brown about 3 years
Hope someone out there can quickly point me in the right direction with my XPath difficulties.
Currently I've got to the point where I'm identifying the correct table I need in my HTML source, but then I need to process only the rows that have the text 'Chapter' somewhere in the DOM.
My last attempt was to do this :
// get the correct table
HtmlTable table = page.getFirstByXPath("//table[2]");
// now the failing bit....
def rows = table.getByXPath("*/td[contains(text(),'Chapter')]")
I thought the XPath above would mean: get me all elements that have a child element 'td' that contains the text 'Chapter' somewhere in its DOM.
An example of a matching row from my source is :
<tr valign="top">
  <td nowrap="" align="Right">
    <font face="Verdana">
      <a href="index.cfm?a=1">Chapter 1</a>
    </font>
  </td>
  <td class="ChapterT">
    <font face="Verdana">DEFINITIONS</font>
  </td>
  <td> </td>
</tr>
Any help / pointers greatly appreciated.
Thanks,
-
David Brown about 12 years
Hi William, I gave it a go but couldn't get it to return anything. What has worked, although it doesn't seem the most efficient, is this one-liner:
def chapterAnchors = page.anchors.findAll {HtmlAnchor a -> a.asText().contains('Chapter')}
-
David Brown about 12 years
Thanks, that appears to work. What does the '.' represent? Also, I don't understand why the 'relative' detection isn't working, e.g. you have the '//', which as I understand it means begin at the root?
-
Kirill Polishchuk about 12 years
@Dave, You're welcome. '.' and '//' are XPath abbreviated syntax. '.' selects the context node. '//td' selects all the td descendants of the document root and thus selects all td elements in the same document as the context node. Reference: w3.org/TR/xpath/#path-abbrev