XPath find text in any text node
The expression //text() = 'Alliance Consulting'
evaluates to a boolean, not to a set of nodes.
Given this test document:
<r>
  <t>Alliance Consulting</t>
  <s>
    <p>Test string
      <f>Alliance Consulting</f>
    </p>
  </s>
  <z>
    Alliance Consulting
    <y>
      Other string
    </y>
  </z>
</r>
it returns true, of course.
The expression you need must evaluate to a node-set, so use:
//text()[. = 'Alliance Consulting']
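For example, the expression
count(//text()[normalize-space() = 'Alliance Consulting'])
returns 3 against the document above. normalize-space() strips the whitespace around the text directly inside <z>; without it, that text node would not compare equal.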
To select text nodes that contain 'Alliance Consulting' anywhere in their string value (e.g. 'Alliance Consulting provides great services'), use:
//text()[contains(.,'Alliance Consulting')]
Note that adjacent text nodes are merged into one: in the XPath data model, a text node never has another text node immediately before or after it.
Author: dagda1
Updated on November 27, 2021
Comments
-
dagda1 over 2 years
I am trying to find a certain text in any text node in a document, so far my statement looks like this:
doc.xpath("//text() = 'Alliance Consulting'") do |node| ... end
This obviously does not work. Can anyone suggest a better alternative?
-
Michael Kay over 13 years: Are you sure you want to find the text node? I think it's more likely that you really want to find the element containing the text node. I would suggest
//*[. = 'Alliance Consulting']
-
Admin over 13 years: @Michael Kay: I agree that it's better not to select text nodes (particularly in a mixed-content data model like XHTML). But I would use
//*[. = 'Alliance Consulting'][not(* = 'Alliance Consulting')]
to select the innermost elements with such a string value.
-
jpaugh over 8 years: Your question might be more valuable if you removed the Ruby code. Not everyone will recognize it, and it doesn't seem relevant to your question.
-
geoidesic almost 8 years: This seems to return an object which only contains a length property, no nodes. How does one get the parent nodes of the found text?
-
John Churchill almost 7 years: @geoidesic this should work: //*[contains(text(), 'Alliance Consulting')]