XPath find text in any text node

The expression //text() = 'Alliance Consulting' evaluates to a boolean, not a node-set.

For this test sample:

<r>
    <t>Alliance Consulting</t>
    <s>
        <p>Test string
            <f>Alliance Consulting</f>
        </p>
    </s>
    <z>
        Alliance Consulting
        <y>
            Other string
        </y>
    </z>
</r>

it returns true, because at least one text node has exactly that string value.

The expression you need should evaluate to a node-set, so move the comparison into a predicate:

//text()[. = 'Alliance Consulting']

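For example, the expression:

count(//text()[normalize-space() = 'Alliance Consulting'])

evaluated against the document above returns 3. Here normalize-space() is used instead of . so that the text node inside <z>, which is surrounded by whitespace, is counted as well.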

To select text nodes whose string value contains 'Alliance Consulting' as a substring (e.g. 'Alliance Consulting provides great services'), use:

//text()[contains(.,'Alliance Consulting')]

Note that adjacent text nodes should be merged into a single text node once the parser has processed the document.


Author: dagda1

Updated on November 27, 2021

Comments

  • dagda1 over 2 years

    I am trying to find a certain text in any text node in a document. So far my statement looks like this:

    doc.xpath("//text() = 'Alliance Consulting'") do |node|
      ...
    end
    

    This obviously does not work. Can anyone suggest a better alternative?

    • Michael Kay over 13 years
      Are you sure you want to find the text node? I think it's more likely that you really want to find the element containing the text node. I would suggest //*[. = 'Alliance Consulting']
    • Admin over 13 years
      @Michael Kay: I agree that it's better not to select text nodes (particularly in a mixed-content data model like XHTML). But I would use //*[. = 'Alliance Consulting'][not(* = 'Alliance Consulting')] to select the innermost elements with such a string value.
    • jpaugh over 8 years
      Your question might be more valuable if you removed the Ruby code. Not everyone will recognize it, and it doesn't seem relevant to your question.
  • geoidesic almost 8 years
    This seems to return an object which only contains a length property, no nodes. How does one get the parent nodes of the found text?
  • John Churchill almost 7 years
    @geoidesic this should work: //*[contains(text(), 'Alliance Consulting')]