Difference between text() and string()

Solution 1

Can someone explain the difference between the text() and string() functions?

I. text() isn't a function but a node test.

It is used to select all text-node children of the context node.

So, if the context node is an element named x, then text() selects all text-node children of x.

Other examples:

/a/b/c/text()

selects all text-node children of any c element that is a child of any b element that is a child of the top element a.
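As a rough illustration of what /a/b/c/text() selects, the following sketch walks the same path by hand using Python's standard xml.dom.minidom module. It emulates the XPath steps rather than calling a real XPath engine, and the sample document and the helper names child_elements, text_children and select_a_b_c_text are assumptions made up for this example.

```python
from xml.dom import Node, minidom

# Hypothetical sample document: two c elements under /a/b, one c directly under a.
XML = "<a><b><c>one</c><c>two<d>deep</d></c></b><c>outside</c></a>"


def child_elements(parent, name):
    """Return the element children of parent that have the given tag name."""
    return [n for n in parent.childNodes
            if n.nodeType == Node.ELEMENT_NODE and n.tagName == name]


def text_children(element):
    """Return the data of each text-node child of element (like element/text())."""
    return [n.data for n in element.childNodes if n.nodeType == Node.TEXT_NODE]


def select_a_b_c_text(document):
    """Emulate /a/b/c/text() by following the child axis one step at a time."""
    root = document.documentElement
    if root.tagName != "a":
        return []
    selected = []
    for b in child_elements(root, "b"):
        for c in child_elements(b, "c"):
            selected.extend(text_children(c))
    return selected


result = select_a_b_c_text(minidom.parseString(XML))
print(result)  # ['one', 'two']
```

Neither "deep" nor "outside" is selected: the first is a text-node child of d rather than of c, and the second belongs to a c that is not a child of b.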

II. The string() function

By definition, string(exprSelectingASingleNode) returns the string value of the node.

The string value of an element is the concatenation of all of its text-node descendants, in document order.

Therefore, if in the following XML document:

<a>
  <b>2</b>
  <c>3
    <d>4</d>
  </c>
  5
</a>

string(/a) returns (without the surrounding quotes):

"
  2
  3
    4

  5
"

As we see, the string value reflects three white-space-only text-nodes, which we typically fail to notice and account for.

Some XML parsers have the option of stripping white-space-only text nodes. If the above document were parsed with the white-space-only text nodes stripped, then the same function:

string(/a)

now returns:

"23
    4
  5
"
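Both results can be reproduced with a small sketch that computes the string value by hand using Python's standard xml.dom.minidom module. This is an emulation of string(/a), not a call to an XPath engine. The helpers string_value and strip_whitespace_only_text are hypothetical names introduced here, and the stripping step only approximates what a parser option would do.

```python
from xml.dom import Node, minidom

XML = """<a>
  <b>2</b>
  <c>3
    <d>4</d>
  </c>
  5
</a>"""


def string_value(node):
    """Concatenate all text-node descendants in document order (like string(node))."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def strip_whitespace_only_text(node):
    """Remove, in place, every text node consisting only of XML white space."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and child.data.strip(" \t\r\n") == "":
            node.removeChild(child)
        else:
            strip_whitespace_only_text(child)


doc = minidom.parseString(XML)
full = string_value(doc.documentElement)

strip_whitespace_only_text(doc.documentElement)
stripped = string_value(doc.documentElement)

print(repr(full))      # '\n  2\n  3\n    4\n  \n  5\n'
print(repr(stripped))  # '23\n    4\n  5\n'
```

The white space around 3 and 5 survives the stripping because it belongs to text nodes that also contain other characters.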

Solution 2

Most of the time, if you want the content of an element node X, you can refer to it as ".", if it's the context node, or as "X" if it's a child of the context node. For example:

<xsl:if test="X = 'abcd'">...

or

<xsl:value-of select="."/>

In both cases, because the context demands a string, the string() function is applied automatically. (That's a slight simplification: if you're running schema-aware XSLT 2.0, the rules are a little more complicated.)

Using "string()" here is unnecessary, because it's done automatically; and using text() is a mistake (one that seems to be increasingly common, encouraged by some bad tutorials on the web). Using ./text() or X/text() in this situation gives you all the text node children of the element. Often the element has one text node child whose string value happens to be the same as the string value of the element, but your code fails if someone adds a comment or processing instruction, because the value is then split into multiple text nodes. It also fails if the element is one (say "title") that allows mixed content: string(title) and title/text() are going to give the same answer until you hit an article with the title:

<title>On the wetness of H<sub>2</sub>O</title>
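The two failure cases can be seen in a short sketch built on Python's standard xml.dom.minidom module. It emulates string() and the text() child step by hand instead of using an XPath engine. The helper names and the commented X element are assumptions made up for illustration.

```python
from xml.dom import Node, minidom


def string_value(node):
    """Concatenate all text-node descendants in document order (like string(node))."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def text_children(element):
    """Return the data of each text-node child of element (like element/text())."""
    return [n.data for n in element.childNodes if n.nodeType == Node.TEXT_NODE]


# Mixed content: the sub element interrupts the text of title.
title = minidom.parseString(
    "<title>On the wetness of H<sub>2</sub>O</title>").documentElement

# Hypothetical element X whose text is split in two by a comment.
commented = minidom.parseString("<X>ab<!-- note -->cd</X>").documentElement

print(string_value(title))       # On the wetness of H2O
print(text_children(title))      # ['On the wetness of H', 'O']
print(string_value(commented))   # abcd
print(text_children(commented))  # ['ab', 'cd']
```

With the commented element, a test such as X = 'abcd' would still hold, while X/text() = 'abcd' would not, since neither text node on its own equals 'abcd'.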
Author: Jayy

Updated on January 18, 2020

Comments

  • Jayy, over 4 years
    Can someone explain the difference between the text() and string() functions? I often use one in place of the other, and it does not make any difference: both get the string value of the XML node.
  • user8658912, over 6 years
    I used to believe that using an expression like select="/somenode/text()" would indeed be more precise and less error prone. So you are suggesting the /text() part is unnecessary, even a mistake? Could you please discuss briefly when it is a good idea or useful to use /text() then? Thanks
  • Michael Kay, over 6 years
    Comments are not designed for asking supplementary questions; please raise a new question.
  • user8658912, over 6 years
    (in other words) Your statement that "string() is applied automatically" is true in several XSLT elements, like xsl:value-of. However, note that the user doesn't mention XSLT at all in his question. So, as you explained, using text() or string() is usually wrong/unnecessary in XSLT, but it does make sense to use string() for instance in XQuery when expecting the string value of a node; otherwise we would get the full node instead, right? Whereas text() would rather be appropriate for specifically browsing through text nodes.
  • Michael Kay, over 6 years
    Yes, there are some contexts (XQuery <a>{x/y/z}</a> is the most notorious example; another is instance of) where there is no implicit atomization and it should therefore be done manually: using string() or data() is nearly always better than using /text().