XSLT Ignore duplicate elements

10,065

Solution 1

Try replacing preceding-sibling with just preceding in your xsl:if statement.

You had the right idea in crafting your test so that a tr is emitted only once per distinct Description value. However, the preceding-sibling axis only looks back through earlier children of the same parent (here, the same Records element). The preceding axis looks at every node that comes earlier in the document, excluding ancestors, which is what you need to avoid duplicates across multiple Records elements.

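FYI, there are also two typos in your stylesheet: Record( should be Record[, and the XSLT namespace URI should be http://www.w3.org/1999/XSL/Transform rather than http://www.w3.org/1999/Transform. Here's a complete, working transformation that includes those changes: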

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <table border="1">
      <tr>
        <th>Type</th>
        <th>Count</th>
      </tr>
      <xsl:for-each select="Main/Records/Record">
        <xsl:if test ="not(preceding::Record[Description/text() = current()/Description/text()])">
          <tr>
            <td><xsl:value-of select="Description"/></td>
            <td><xsl:value-of select="count(//Record[Description/text()=current()/Description/text()])"/></td>
          </tr>
        </xsl:if>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
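To see the difference between the two axes concretely, the small diagnostic stylesheet below (a hypothetical helper for experimentation, not part of the fix) prints each Record's Description along with how many earlier Record elements each axis can see. It assumes the two-Records input from the question and an XSLT 1.0 processor.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="Main/Records/Record">
      <!-- Description, then how many earlier Records each axis can see -->
      <xsl:value-of select="Description"/>
      <xsl:text> preceding-sibling=</xsl:text>
      <!-- Only earlier Records inside the same Records parent -->
      <xsl:value-of select="count(preceding-sibling::Record)"/>
      <xsl:text> preceding=</xsl:text>
      <!-- Every earlier Record in the whole document -->
      <xsl:value-of select="count(preceding::Record)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With that input, the fifth Record (the first B inside the second Records) reports preceding-sibling=0 but preceding=4. That is why the preceding-sibling test treats it as new, while the preceding test recognizes it as a duplicate.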

Solution 2

Even though @kjhughes has answered your question, a more efficient approach is to use the Muenchian Method.

To do that, first define a key to group on:

<xsl:key name="Record-by-Description" match="Record" use="Description"/>

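Then select only the first Record for each Description, by comparing each Record's generate-id() with that of the first node the key returns: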

<xsl:apply-templates select="Record[generate-id() = generate-id(key('Record-by-Description', Description)[1])]" mode="group"/>

Finally, use the same key to count all the Records that share that Description:

<xsl:value-of select="count(key('Record-by-Description', Description))"/>

Because the processor can index the key once, each lookup is typically much cheaper than scanning the preceding axis and //Record again for every Record, as the first solution does.
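A commonly seen equivalent of the generate-id() comparison uses a node-set union instead: a Record is the first in its group exactly when the union of itself and the first node returned by the key contains only one node. A minimal sketch, reusing the Record-by-Description key and assuming the context node is a Records element:

```xml
<!-- Muenchian test written with a union: the count is 1 only when the
     current Record is itself the first node the key returns -->
<xsl:apply-templates mode="group"
    select="Record[count(. | key('Record-by-Description', Description)[1]) = 1]"/>
```

Both forms select the same Records; which one to use is largely a matter of taste.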

Putting it all together, given this XML:

<Main>
  <Records>
    <Record>
      <Description>A</Description>
    </Record>
    <Record>
      <Description>A</Description>
    </Record>
    <Record>
      <Description>B</Description>
    </Record>
    <Record>
      <Description>C</Description>
    </Record>
  </Records>
  <Records>
    <Record>
      <Description>B</Description>
    </Record>
    <Record>
      <Description>A</Description>
    </Record>
    <Record>
      <Description>C</Description>
    </Record>
    <Record>
      <Description>C</Description>
    </Record>
  </Records>
</Main>

and this XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
  <xsl:output method="xml" indent="yes" />
  <xsl:key name="Record-by-Description" match="Record" use="Description"/>

  <xsl:template match="@* | node()">
    <xsl:apply-templates select="@* | node()"/>
  </xsl:template>

  <xsl:template match="Main">
    <table>
      <tr>
        <th>Type</th>
        <th>Count</th>
      </tr>
      <xsl:apply-templates select="Records"/>
    </table>
  </xsl:template>

  <xsl:template match="Records">
    <xsl:apply-templates select="Record[generate-id() = generate-id(key('Record-by-Description', Description)[1])]" mode="group"/>
  </xsl:template>

  <xsl:template match="Record" mode="group">
    <tr>
      <td>
        <xsl:value-of select="Description"/>
      </td>
      <td>
        <xsl:value-of select="count(key('Record-by-Description', Description))"/>
      </td>
    </tr>
  </xsl:template>

</xsl:stylesheet>

the transformation produces this output:

<table>
  <tr>
    <th>Type</th>
    <th>Count</th>
  </tr>
  <tr>
    <td>A</td>
    <td>3</td>
  </tr>
  <tr>
    <td>B</td>
    <td>2</td>
  </tr>
  <tr>
    <td>C</td>
    <td>3</td>
  </tr>
</table>
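If an XSLT 2.0 or later processor such as Saxon is available, grouping is built in and no key is needed. This is an assumption about your environment: the xml-stylesheet processing instruction in the question suggests browser-side processing, and browsers generally implement only XSLT 1.0, so the sketch below is a swapped-in alternative rather than a drop-in replacement.

```xml
<!-- Requires an XSLT 2.0+ processor (assumption: not available in most browsers) -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <table border="1">
      <tr>
        <th>Type</th>
        <th>Count</th>
      </tr>
      <!-- Groups every Record in the document, across all Records elements, by Description -->
      <xsl:for-each-group select="Main/Records/Record" group-by="Description">
        <tr>
          <td><xsl:value-of select="current-grouping-key()"/></td>
          <td><xsl:value-of select="count(current-group())"/></td>
        </tr>
      </xsl:for-each-group>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

By default xsl:for-each-group processes groups in order of first appearance, so for the two-Records input this should yield the same A, B, C rows and counts as the output above.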
Author: Diggers

Updated on June 04, 2022

Question (asked by Diggers)

    I have just started to learn XSLT and am having trouble ignoring duplicated elements.

    I have been searching through Stack Overflow and have seen people ask similar questions. I tried a small example to see where I was going wrong in my file, and was able to ignore duplicated elements. However, the problem seems to arise when I have more than one instance of an element (here, Records).

    For example:

    File1.xml

    <?xml-stylesheet type="text/xsl" href="merge2.xsl"?>
    
    <Main>
        <Records>
            <Record>
                <Description>A</Description>
            </Record>
            <Record>
                <Description>A</Description>
            </Record>
            <Record>
                <Description>B</Description>
            </Record>
            <Record>
                <Description>C</Description>
            </Record>
        </Records>
    </Main>
    

    merge2.xsl

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/Transform">
    <xsl:output method="html"/>
    <xsl:template match="/">
        <table border="1">
            <tr>
                <th>Type</th>
                <th>Count</th>
            </tr>
            <xsl:for-each select="Main/Records/Record">
                <xsl:if test ="not(preceding-sibling::Record(Description/text() = current()/Description/text()])">
                    <tr>
                        <td><xsl:value-of select="Description"/></td>
                        <td><xsl:value-of select="count(//Record[Description/text()=current()/Description/text()])"/></td>
                    </tr>
                </xsl:if>
            </xsl:for-each>
        </table>
    </xsl:template>
    </xsl:stylesheet>
    

    This works fine and gives me the desired results.

    Type    Count
     A        2
     B        1
     C        1
    

    However, if I add another Records element, it seems to process the two one after another, e.g.:

    <?xml-stylesheet type="text/xsl" href="merge2.xsl"?>
    
    <Main>
        <Records>
            <Record>
                <Description>A</Description>
            </Record>
            <Record>
                <Description>A</Description>
            </Record>
            <Record>
                <Description>B</Description>
            </Record>
            <Record>
                <Description>C</Description>
            </Record>
        </Records>
        <Records>
            <Record>
                <Description>B</Description>
            </Record>
            <Record>
                <Description>A</Description>
            </Record>
            <Record>
                <Description>C</Description>
            </Record>
            <Record>
                <Description>C</Description>
            </Record>
        </Records>
    </Main>
    

    This would produce the following.

    Type        Count
     A            3
     B            2
     C            3
     B            2
     A            3
     C            3
    

    It seems to process the first Records element and then move on to the next. Is there any way to remove duplicates across both?

    I have tried changing the for-each to iterate over each Records element, and have tried creating a separate template for it, but I still seem to be missing something, as I have not managed to get it working.

    Many thanks for any help.