(XSLT 1.0) How to replace spaces with some string in all the text values of an XML document?


Solution 1

At Roland's request, here is a tail-recursive solution:

 <xsl:template name="replace">
  <xsl:param name="ptext"/>
  <xsl:param name="ppattern"/>
  <xsl:param name="preplacement"/>

  <xsl:choose>
     <xsl:when test="not(contains($ptext, $ppattern))">
      <xsl:value-of select="$ptext"/>
     </xsl:when>
     <xsl:otherwise>
       <xsl:value-of select="substring-before($ptext, $ppattern)"/>
       <xsl:value-of select="$preplacement"/>
       <xsl:call-template name="replace">
         <xsl:with-param name="ptext"
           select="substring-after($ptext, $ppattern)"/>
         <xsl:with-param name="ppattern" select="$ppattern"/>
         <xsl:with-param name="preplacement" select="$preplacement"/>
       </xsl:call-template>
     </xsl:otherwise>
  </xsl:choose>
 </xsl:template>

Note that the recursive call is the last instruction in the template -- this is what makes it tail-recursive. The property of being tail-recursive allows a smart XSLT processor (such as Saxon or .NET XslCompiledTransform) to optimize the code, replacing the recursion with simple iteration.

With such a processor, this code does not overflow the stack even when the recursion is millions of calls deep, whereas non-tail-recursive code typically overflows at a depth of about 1000 nested calls (the exact limit depends on the amount of available memory).
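To try the template, something has to call it. Here is a minimal sketch of a complete stylesheet, assuming the goal is to replace every space in every text node with "%20" (the value the question mentions). The identity rule and the priority attribute are additions for illustration; the replace template is repeated unchanged so that the stylesheet is self-contained.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy every node that no other template handles -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Explicit priority, because text() and node() have the same
       default priority and would otherwise conflict -->
  <xsl:template match="text()" priority="1">
    <xsl:call-template name="replace">
      <xsl:with-param name="ptext" select="."/>
      <xsl:with-param name="ppattern" select="' '"/>
      <xsl:with-param name="preplacement" select="'%20'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- The tail-recursive template shown above, unchanged -->
  <xsl:template name="replace">
    <xsl:param name="ptext"/>
    <xsl:param name="ppattern"/>
    <xsl:param name="preplacement"/>
    <xsl:choose>
      <xsl:when test="not(contains($ptext, $ppattern))">
        <xsl:value-of select="$ptext"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="substring-before($ptext, $ppattern)"/>
        <xsl:value-of select="$preplacement"/>
        <xsl:call-template name="replace">
          <xsl:with-param name="ptext"
            select="substring-after($ptext, $ppattern)"/>
          <xsl:with-param name="ppattern" select="$ppattern"/>
          <xsl:with-param name="preplacement" select="$preplacement"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Note that an empty ppattern would make contains() always true and the recursion would never end, so callers should pass a non-empty search string.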

What if the XSLT processor is not "smart enough"? Is there another technique, one that works with every XSLT processor, to avoid a stack overflow from deeply nested recursive calls?

Ask me in a separate question and I might tell you :)
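One widely used technique for this, which is not necessarily the one the author had in mind, is divide and conquer: split the string in half and process each half separately, so that the nesting depth grows with the logarithm of the string length rather than with the number of matches. The sketch below assumes that the search string is exactly one character, so that a match can never straddle the split point. The template name replace-dvc and the parameter name pchar are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy every node that no other template handles -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()" priority="1">
    <xsl:call-template name="replace-dvc">
      <xsl:with-param name="ptext" select="."/>
      <xsl:with-param name="pchar" select="' '"/>
      <xsl:with-param name="preplacement" select="'%20'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Assumption: $pchar is exactly one character long -->
  <xsl:template name="replace-dvc">
    <xsl:param name="ptext"/>
    <xsl:param name="pchar"/>
    <xsl:param name="preplacement"/>
    <xsl:variable name="vlen" select="string-length($ptext)"/>
    <xsl:choose>
      <!-- Nothing to replace (this also covers the empty string) -->
      <xsl:when test="not(contains($ptext, $pchar))">
        <xsl:value-of select="$ptext"/>
      </xsl:when>
      <!-- A single character that contains $pchar must be $pchar itself -->
      <xsl:when test="$vlen = 1">
        <xsl:value-of select="$preplacement"/>
      </xsl:when>
      <!-- Otherwise split in two and process each half -->
      <xsl:otherwise>
        <xsl:variable name="vhalf" select="floor($vlen div 2)"/>
        <xsl:call-template name="replace-dvc">
          <xsl:with-param name="ptext" select="substring($ptext, 1, $vhalf)"/>
          <xsl:with-param name="pchar" select="$pchar"/>
          <xsl:with-param name="preplacement" select="$preplacement"/>
        </xsl:call-template>
        <xsl:call-template name="replace-dvc">
          <xsl:with-param name="ptext" select="substring($ptext, $vhalf + 1)"/>
          <xsl:with-param name="pchar" select="$pchar"/>
          <xsl:with-param name="preplacement" select="$preplacement"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Because each call halves the string, a text node of a million characters needs only about 20 nested calls, whether or not the processor optimizes tail calls.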

Solution 2

Check out the XPath translate function: http://www.w3.org/TR/xpath/#function-translate

<xsl:template match="text()">
    <xsl:value-of select="translate(., ' ', '$')"/>
</xsl:template>
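translate() maps characters one-to-one by position, which is why it cannot replace a character with a longer string. The following sketch (a stand-alone stylesheet with made-up literal strings, runnable against any input document) shows the three cases defined by XPath 1.0:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Positional mapping: space becomes $, hyphen becomes _ ; gives a$b_c -->
    <xsl:value-of select="translate('a b-c', ' -', '$_')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- No counterpart in the third argument: the character is removed; gives ab -->
    <xsl:value-of select="translate('a b', ' ', '')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Excess characters in the third argument are ignored; gives a%b, not a%20b -->
    <xsl:value-of select="translate('a b', ' ', '%20')"/>
  </xsl:template>
</xsl:stylesheet>
```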

If it's not a single character, but a string you have to replace, it takes considerably more effort, and you need a template to recursively replace the string:

<xsl:template match="text()[not(../*)]">
    <xsl:call-template name="replace">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="search" select="' '"/>
        <xsl:with-param name="replace" select="'%20'"/>
    </xsl:call-template>
</xsl:template>

<xsl:template name="replace">
    <xsl:param name="text"/>
    <xsl:param name="search"/>
    <xsl:param name="replace"/>
    <xsl:choose>
        <xsl:when test="contains($text, $search)">
            <xsl:variable name="replace-next">
                <xsl:call-template name="replace">
                    <xsl:with-param name="text" select="substring-after($text, $search)"/>
                    <xsl:with-param name="search" select="$search"/>
                    <xsl:with-param name="replace" select="$replace"/>
                </xsl:call-template>
            </xsl:variable>
            <xsl:value-of 
                select="
                    concat(
                        substring-before($text, $search)
                    ,   $replace
                    ,   $replace-next
                    )
                "
            />
        </xsl:when>
        <xsl:otherwise><xsl:value-of select="$text"/></xsl:otherwise>
    </xsl:choose>
</xsl:template>

Edit: changed match="text()" to match="text()[not(../*)]", so that the template only matches text nodes whose parent has no child elements. This avoids replacing the indentation whitespace of pretty-printed input XML with the "%20" string.

Solution 3

The solution to the "pretty-printed XML" problem is not really a solution.

Imagine having a document like this:

<a>
 <b>
  <c>O M G</c>
  <d>D I Y</d>
 </b>
</a>

The output from the currently accepted solution (after wrapping it in an <xsl:stylesheet> and adding the identity rule) is:

<a>
%20<b>
%20%20<c>O$M$G</c>
%20%20<d>D$I$Y</d>
%20</b>
</a>

Now, why doesn't the proposed workaround save the situation? As we see from the above example, an element can have more than one child element that has text nodes...

What is the real solution?

The creators of XSLT have thought about this problem. Using the right terminology, we want all insignificant white-space-only text nodes to be ignored by the XSLT processor, as if they were not part of the document tree at all. This is achieved by the top-level <xsl:strip-space> element.

Just add this at a global level (as a child of <xsl:stylesheet> and, for readability, before any templates):

 <xsl:strip-space elements="*"/>

and now you really have a working solution.
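Putting the pieces together, here is a minimal sketch of a complete stylesheet, assuming the single-character replacement from the question (space to "$"). The identity rule and the priority attribute are additions for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop white-space-only text nodes from the source tree -->
  <xsl:strip-space elements="*"/>

  <!-- Identity rule: copy every node that no other template handles -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Only text nodes with real content remain, so only they are changed -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="translate(., ' ', '$')"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the <a> document above, this produces <a><b><c>O$M$G</c><d>D$I$Y</d></b></a>, possibly preceded by an XML declaration. The indentation is gone because the nodes that carried it were stripped; text nodes such as "O M G" are not white-space-only and are therefore kept. An element with xml:space="preserve" in the source document keeps its white-space-only text nodes regardless.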

Author: InfantPro'Aravind'


Updated on June 07, 2022

Comments

  • InfantPro'Aravind'
    InfantPro'Aravind' almost 2 years

    EDIT: It started with character replacement, and I ended up discovering string replacement with the help of Dimitre Novatchev and Roland Bouman.

    I think the samples are sufficient to explain the requirements.

    This is the sample XML:

    <root>
      <node1>text node</node1>
      <node2>space between the text</node2>
      <node3> has to be replaced with $</node3>
    </root>
    

    This is the Output I am expecting:

    <root>
      <node1>text$node</node1>
      <node2>space$between$the$text</node2>
      <node3>$has$to$be$replaced$with$$</node3>
    </root>
    

    I tried writing XSLT code, but it does not produce the required output.
    This is the code:

        <xsl:template match="@* | node()">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>
      <xsl:template match="text()[.!='']">
        <xsl:call-template name="rep_space">
          <xsl:with-param name="text" select="."/>
        </xsl:call-template>
      </xsl:template>
      <xsl:template name="rep_space">
        <xsl:param name="text"/>
        <xsl:variable name="temp" select="'&#x36;'"/> 
        <xsl:choose>
          <xsl:when test="contains(text,'&#x32;')">
            <xsl:call-template name="rep_space">
              <xsl:with-param name="text" select="concat((concat(substring-before(text,' '),temp)),substring-after(text,' '))"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="text"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
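
    For comparison, here is a sketch of what the attempt above appears to be aiming for, under two assumptions: that &#x36; and &#x32; were meant as the decimal codes 36 ("$") and 32 (space), and that the bare names text and temp were meant as variable references. The surrounding <xsl:stylesheet> element is added so that it can be run as is.

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="text()[. != '']">
        <xsl:call-template name="rep_space">
          <xsl:with-param name="text" select="."/>
        </xsl:call-template>
      </xsl:template>

      <xsl:template name="rep_space">
        <xsl:param name="text"/>
        <!-- "$" is &#36; in decimal or &#x24; in hex; &#x36; is the digit 6 -->
        <xsl:variable name="temp" select="'$'"/>
        <xsl:choose>
          <!-- A variable needs the $ prefix; a bare name selects child elements -->
          <xsl:when test="contains($text, ' ')">
            <!-- Replace the first space, then recurse on the whole string -->
            <xsl:call-template name="rep_space">
              <xsl:with-param name="text"
                select="concat(substring-before($text, ' '), $temp, substring-after($text, ' '))"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="$text"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
    </xsl:stylesheet>
    ```

    This version terminates only because the replacement contains no space. With a replacement that itself contains the search string, the recursion would never end, which is one reason the answers recurse on substring-after() instead.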
    

    The translate(., ' ', '$') function works, but not to a satisfactory extent. My questions are: what if the replacement is a string instead of a single character? For example, suppose I intend to replace ' ' with "%20". And one more case: what if the input XML isn't "pretty print XML"? Then all the spaces appearing in the XML are replaced with '$'.

    Pretty print XML is a file that has proper indentation (usually my input XML files never have this), for example:

    <new><test>one more node</test><test2><child>this is @ lower level</child></test2></new>

    You can observe that there are no space characters before the <new> and <test> nodes, yet they are properly indented (in Altova XMLSpy, a simple command in the Edit menu turns any XML file into "pretty print XML").

    Whereas in the example below:

    <new>
      <test>one more node</test>
       <test2>
        <child>this is @ lower level</child>
       </test2>
    </new>
    

    There are space characters before all the start tags; the <child> tag has more spaces before it than the <test2> node.

    With the second sample XML, all the space characters are replaced by "%20", hence the output will be:

    <new>
    %20%20<test>one%20more%20node</test>
    %20%20<test2>
    %20%20%20%20<child>this%20is%20@%20lower%20level</child>
    %20%20</test2>
    </new>
    

    This is certainly not what is expected.

    The solutions posted by Dimitre Novatchev and Roland Bouman can also replace a string by another string, by modifying the parameters passed to the template being called.

    That was great learning @Dimitre, @Roland, I am really thankful and grateful to you guys ..

    regards,
    infant pro.