Only select text directly in node, not in child nodes

xpath xquery

26,869

Solution 1

In the provided XML document:

<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear"/>
      Lorem ipsum dolor sit amet.
</div>

the top element /div has four text-node children. The first three are whitespace-only; the last one is the one that is wanted.
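To check that count, here is a minimal sketch in Python. It is not an XPath engine: it walks the DOM built by the standard library's xml.dom.minidom (which keeps whitespace-only text nodes) and keeps only the direct text-node children of the top div. The br is written self-closed so the sample parses as XML.

```python
from xml.dom import Node, minidom

# The sample document, with <br> self-closed so it is well-formed XML.
XML = """<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear"/>
      Lorem ipsum dolor sit amet.
</div>"""

root = minidom.parseString(XML).documentElement  # the top <div>

# Direct text-node children only (what the XPath step text() selects).
text_children = [n for n in root.childNodes if n.nodeType == Node.TEXT_NODE]

for n in text_children:
    kind = "whitespace-only" if not n.data.strip() else "content"
    print(repr(n.data), kind)
```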

Use:

/div/text()[last()]

This is different from:

/div/text()

The latter may (depending on whether the XML parser preserves whitespace-only text nodes) select all four text nodes, but you only want the last of them.
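The difference between /div/text() and /div/text()[last()] can be sketched with the same minidom emulation (a hand-written stand-in for the XPath steps, assuming a parser that preserves whitespace-only text nodes, as minidom does):

```python
from xml.dom import Node, minidom

XML = """<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear"/>
      Lorem ipsum dolor sit amet.
</div>"""

root = minidom.parseString(XML).documentElement

def text_children(element):
    """Direct text-node children only, mirroring the XPath step text()."""
    return [n for n in element.childNodes if n.nodeType == Node.TEXT_NODE]

all_text = text_children(root)   # like /div/text()
last_text = all_text[-1]         # like /div/text()[last()]

print(len(all_text))             # 4
print(last_text.data.strip())    # Lorem ipsum dolor sit amet.
```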

An alternative, when you don't know exactly which text node you want, is:

/div/text()[normalize-space()]

This selects all text-node children of /div that are not whitespace-only.
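The predicate works because normalize-space() returns an empty string, which is false, for a whitespace-only node. A minimal Python sketch of that filter, using minidom instead of an XPath engine; the helper normalize_space is hypothetical and only approximates the XPath function:

```python
from xml.dom import Node, minidom

XML = """<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear"/>
      Lorem ipsum dolor sit amet.
</div>"""

root = minidom.parseString(XML).documentElement

def normalize_space(s):
    # Approximates XPath normalize-space(): trim, then collapse inner runs of
    # whitespace to one space. str.split() also splits on other Unicode
    # whitespace, which XPath does not, but that does not matter here.
    return " ".join(s.split())

# /div/text()[normalize-space()]: an empty string is false in a predicate,
# so whitespace-only text nodes are dropped.
wanted = [n for n in root.childNodes
          if n.nodeType == Node.TEXT_NODE and normalize_space(n.data)]

print([normalize_space(n.data) for n in wanted])  # ['Lorem ipsum dolor sit amet.']
```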

Solution 2

Just select text() instead of . (the context node itself, whose string value includes the text of all its descendants):

div/text()

On the given XML fragment, this returns (apart from any whitespace-only text nodes the parser preserves):

Lorem ipsum dolor sit amet.
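Why selecting . gives the unwanted result can be sketched in Python, again emulating XPath by hand with minidom: the string value of the div concatenates the text of every descendant, while its own text-node children contain only the wanted sentence plus whitespace. The helpers string_value and squash are illustrative, not library functions.

```python
from xml.dom import Node, minidom

XML = """<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear"/>
      Lorem ipsum dolor sit amet.
</div>"""

root = minidom.parseString(XML).documentElement

def string_value(node):
    # String value of an element: all descendant text, in document order.
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(string_value(c) for c in node.childNodes)

def squash(s):
    # Collapse whitespace so the two results are easy to compare.
    return " ".join(s.split())

dot = string_value(root)                        # roughly what selecting . yields
own = "".join(n.data for n in root.childNodes   # what div/text() yields
              if n.nodeType == Node.TEXT_NODE)

print(squash(dot))  # Editor's Description Last updated: Lorem ipsum dolor sit amet.
print(squash(own))  # Lorem ipsum dolor sit amet.
```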

Solution 3

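How about selecting the text node by position:
$doc/node()[4]
This assumes $doc holds the div element and that whitespace-only text nodes are stripped; if they are preserved, the wanted text node is $doc/node()[7]. Positional selection is fragile: node()[3]/text() selects nothing here, because node()[3] is either the br element or a whitespace-only text node, and neither has text-node children.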
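Positional steps such as node()[n] depend on whether whitespace-only text nodes are kept. This minimal sketch (minidom again, not an XQuery processor; the describe helper is hypothetical) prints each child of the div with its 1-based position, first as parsed and then with whitespace-only text nodes dropped:

```python
from xml.dom import Node, minidom

XML = """<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear"/>
      Lorem ipsum dolor sit amet.
</div>"""

root = minidom.parseString(XML).documentElement

def describe(node):
    # Short label: element name, or the whitespace-collapsed text.
    if node.nodeType == Node.TEXT_NODE:
        return "text " + repr(" ".join(node.data.split()))
    return "<" + node.tagName + ">"

preserved = list(root.childNodes)
stripped = [n for n in preserved
            if not (n.nodeType == Node.TEXT_NODE and not n.data.strip())]

for label, nodes in (("preserved", preserved), ("stripped", stripped)):
    print(label)
    for pos, n in enumerate(nodes, start=1):  # XPath positions start at 1
        print(f"  node()[{pos}] = {describe(n)}")
```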


Author: Moak


Updated on July 09, 2022

Comments

Moak almost 2 years
How does one retrieve the text in a node without selecting the text in the children?
```
<div id="comment">
     <div class="title">Editor's Description</div>
     <div class="changed">Last updated: </div>
     <br class="clear">
     Lorem ipsum dolor sit amet.
</div>
```
In other words, I want "Lorem ipsum dolor sit amet." rather than "Editor's DescriptionLast updated: Lorem ipsum dolor sit amet."
Lucero over 13 years

@Dimitre, the question is to select the text without the text of the child nodes; your first suggestion doesn't do this.
Dimitre Novatchev over 13 years

@Lucero: Why? I haven't suggested using the descendant:: axis or the // abbreviation. The first expression selects just one text node: the last child text node of /div. The alternative selects any child text node of /div that is not whitespace-only.
Lucero over 13 years

@Dimitre, simply because nothing says that the wanted text will be the last node?
Dimitre Novatchev over 13 years

@Lucero: I have edited my answer to make it more clear. Hope you understand it now.
Lucero over 13 years

@Dimitre, the question was to get the text without the text of the child nodes. Getting the last text node only is working for the given sample, but not answering the question in general.
Dimitre Novatchev over 13 years

@Lucero: I think that the edited answer meets your objections -- it explains the two alternatives one has: either know exactly which node you want to select, or select all text nodes that are not whitespace-only. Both expressions avoid selecting whitespace-only text nodes -- something that may happen using your suggested solution. Do note that the OP really wants only non-whitespace-only text nodes.
Moak over 13 years

@Dimitre, in fact the whitespace stripping was useful as well. Thanks to you both.
István Ujj-Mészáros over 13 years

I just don't get why neither solution works for me in Firefox with XPather, while //div/text()[normalize-space() and parent::div[@id='comment']] works fine.
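What that working expression selects can be sketched in Python with minidom (a hand-written emulation of each XPath step, not XPather or any XPath engine): every div anywhere in the document, then its direct non-whitespace text children, kept only when the parent div has id="comment". Note that the results are text nodes, which still know their parent element.

```python
from xml.dom import Node, minidom

XML = """<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear"/>
      Lorem ipsum dolor sit amet.
</div>"""

doc = minidom.parseString(XML)

matches = []
# //div: every div element in the document, in document order.
for div in doc.getElementsByTagName("div"):
    # /text(): direct text-node children only.
    for n in div.childNodes:
        if (n.nodeType == Node.TEXT_NODE
                and n.data.strip()  # [normalize-space() ...
                and n.parentNode.getAttribute("id") == "comment"):  # ... and parent::div[@id='comment']]
            matches.append(n)

print([n.data.strip() for n in matches])  # ['Lorem ipsum dolor sit amet.']
```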
Dimitre Novatchev over 13 years

@styu: Then you are evaluating the XPath expressions against a different XML document (not against the provided XML document)
István Ujj-Mészáros over 13 years

@Dimitre I think it's an issue with XPather. Your XPath Visualizer and another one work fine, thanks.
djangofan about 9 years

This does not solve the problem for me. I need the XPath result to be a webelement, not a String, so using /text() is not an option.
Dimitre Novatchev about 9 years

@djangofan, text() selects all text-node children of the current node -- not strings as you believe. As for "webelements", no such thing exists in XPath.
Dimitre Novatchev over 8 years

@SeanDuggan, Yes, XPath is a very elegant and powerful language.