XQuery/XPath: Using the count() and max() functions to return the element with the highest count


Solution 1

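The following query groups the entries by distinct name value, sorts the groups by size in descending order, and returns the first name from the largest group (it assumes $doc is bound to the document node):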

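declare default element namespace 'books';
(: $doc is assumed to be bound to the document node of the input :)
(for $name in distinct-values($doc/books/*/*/name)
 let $entries := $doc/books/*[data(*/name) = $name]
 order by count($entries) descending
 (: return only the matching name, not every name in the matching entries :)
 return $entries/*/name[data(.) = $name])[1]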

Solution 2

Here is a pure XPath 2.0 expression, admittedly not for the faint-hearted:

(for $m in max(for $n in distinct-values(/*/b:book/(b:author | b:editor)
                                        /b:name/concat(b:fname, '|', b:lname)),
               $cnt in count(/*/b:book/(b:author | b:editor)
                             /b:name[$n eq concat(b:fname, '|', b:lname) ])
               return $cnt
               ),
     $name in /*/b:book/(b:author | b:editor)/b:name,
     $fullName in $name/concat(b:fname, '|',  b:lname),
     $count in count( /*/b:book/(b:author | b:editor)
                   /b:name[$fullName eq concat(b:fname, '|',  b:lname)])
  return
     if($count eq $m)
       then $name
       else ()
   )[1]

where the prefix "b:" is associated with the namespace "books".
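
For readers who find the nested expression hard to follow, the same logic can be sketched in XQuery 1.0 with let bindings. This is an illustrative restatement, not part of the original answer: the variable names $names, $keys and $m are assumptions, and the query expects the books document as the context item.

```xquery
declare namespace b = "books";

(: all name elements of authors and editors :)
let $names := /*/b:book/(b:author | b:editor)/b:name
(: one key string per name element, e.g. "Priscilla|Walmsley" :)
let $keys := for $n in $names return concat($n/b:fname, '|', $n/b:lname)
(: the highest number of occurrences of any single key :)
let $m := max(for $k in distinct-values($keys) return count($keys[. eq $k]))
return
  (for $n in $names
   where count($keys[. eq concat($n/b:fname, '|', $n/b:lname)]) eq $m
   return $n)[1]
```

Dropping the trailing [1] should return every name element whose key reaches the maximum count, rather than only the first one.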

XSLT 2.0-based verification:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:b="books">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
   <xsl:sequence select=
   "(for $m in max(for $n in distinct-values(/*/b:book/(b:author | b:editor)
                                            /b:name/concat(b:fname, '|', b:lname)),
                   $cnt in count(/*/b:book/(b:author | b:editor)
                                 /b:name[$n eq concat(b:fname, '|', b:lname) ])
                   return $cnt
                   ),
         $name in /*/b:book/(b:author | b:editor)/b:name,
         $fullName in $name/concat(b:fname, '|',  b:lname),
         $count in count( /*/b:book/(b:author | b:editor)
                       /b:name[$fullName eq concat(b:fname, '|',  b:lname)])
      return
         if($count eq $m)
           then $name
           else ()
       )[1]
   "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<books xmlns="books">
    <book ISBN="i0321165810" publishername="OReilly">
        <title>XPath</title>
        <author>
            <name>
                <fname>Priscilla</fname>
                <lname>Walmsley</lname>
            </name>
        </author>
        <year>2007</year>
        <field>Databases</field>
    </book>
    <book ISBN="i0321165812" publishername="OReilly">
        <title>XQuery</title>
        <author>
            <name>
                <fname>Priscilla</fname>
                <lname>Walmsley</lname>
            </name>
        </author>
        <editor>
            <name>
                <fname>Lisa</fname>
                <lname>Williams</lname>
            </name>
        </editor>
        <year>2003</year>
        <field>Databases</field>
    </book>
    <publisher publishername="OReilly">
        <web-site>www.oreilly.com</web-site>
        <address>
            <street_address>hill park</street_address>
            <zip>90210</zip>
            <state>california</state>
        </address>
        <phone>400400400</phone>
        <e-mail>[email protected]</e-mail>
        <contact>
            <field>Databases</field>
            <name>
                <fname>Anna</fname>
                <lname>Smith</lname>
            </name>
        </contact>
    </publisher>
</books>

the wanted, correct name element is selected and output:

<name xmlns="books">
   <fname>Priscilla</fname>
   <lname>Walmsley</lname>
</name>

Solution 3

I've always felt this was an omission in XPath: the max() and min() functions return the highest/lowest value, whereas what you usually want is the object(s) in a collection that have the highest/lowest value for some expression. One solution is to sort the objects on that value and take the first/last from the list, which seems inelegant. Computing the min/max and then selecting the items whose value matches this seems equally unappealing. In Saxon there has long been a pair of higher-order extension functions saxon:highest() and saxon:lowest() which take a sequence and a function, and return the item(s) from the sequence having the lowest or highest values of the function result. The good news is that in XPath 3.0 you can write these functions yourself (in fact, they are given as example user-written functions in the spec).
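
As a rough illustration of the user-written approach, here is a minimal sketch in XQuery 3.0. The function name local:highest and its parameter names are hypothetical; this is neither the Saxon extension function nor the exact example from the spec, and it assumes the books document is the context item.

```xquery
xquery version "3.0";
declare namespace b = "books";

(: Hypothetical helper: returns every item of $seq whose key is the maximum. :)
declare function local:highest(
  $seq as item()*,
  $key as function(item()) as xs:anyAtomicType
) as item()* {
  let $max := max(for $item in $seq return $key($item))
  return $seq[$key(.) eq $max]
};

(: all author and editor names; the key is how often the same person occurs :)
let $names := /*/b:book/(b:author | b:editor)/b:name
return local:highest($names, function($n) {
  count($names[b:fname eq $n/b:fname and b:lname eq $n/b:lname])
})
```

On the sample document from the question this should select both Priscilla Walmsley name elements, since each of them has the maximum count of 2; appending [1] to the call would keep only the first.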

Solution 4

You are on the right track. The simplest way is to convert each name into a single string (first and last name separated by a space, for example) and count those strings. Note that the following code is untested:

declare default element namespace 'books';
(: fname and lname are children of name, so step through name first :)
let $names := (//editor | //author)/name/concat(fname, ' ', lname)
let $distinct-names := distinct-values($names)
let $name-count := for $name in $distinct-names return count($names[. = $name])
for $name at $pos in $distinct-names
where $name-count[$pos] = max($name-count)
return $name

Or, another approach:

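declare default element namespace 'books';
(
  let $people := (//editor | //author)
  for $person in $people
  (: compare via the name child, which holds fname and lname :)
  order by count($people[name/fname = $person/name/fname and
                         name/lname = $person/name/lname])
  return $person
)[last()]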
Author by

Jea

Updated on November 13, 2020

Comments

  • Jea
    Jea over 3 years

    I have an XML file that contains authors and editors.

    <?xml version="1.0" encoding="UTF-8"?>
    <?oxygen RNGSchema="file:textbook.rnc" type="compact"?>
    <books xmlns="books">
    
        <book ISBN="i0321165810" publishername="OReilly">
            <title>XPath</title>
            <author>
                <name>
                    <fname>Priscilla</fname>
                    <lname>Walmsley</lname>
                </name>
            </author>
            <year>2007</year>
            <field>Databases</field>
        </book>
    
        <book ISBN="i0321165812" publishername="OReilly">
            <title>XQuery</title>
            <author>
               <name>
                   <fname>Priscilla</fname>
                   <lname>Walmsley</lname>
                </name>
            </author>
            <editor>
                <name>
                    <fname>Lisa</fname>
                    <lname>Williams</lname>
                </name>
            </editor>
            <year>2003</year>
            <field>Databases</field>
        </book>
    
        <publisher publishername="OReilly">
            <web-site>www.oreilly.com</web-site>
            <address>
                <street_address>hill park</street_address>
                <zip>90210</zip>
                <state>california</state>
            </address>
            <phone>400400400</phone>
            <e-mail>[email protected]</e-mail>
            <contact>
                <field>Databases</field>
                <name>
                    <fname>Anna</fname>
                    <lname>Smith</lname>
                </name>
            </contact>
        </publisher>
    </books>
    

    I'm looking for a way to return the person who has been listed the most times as an author and/or editor. The solution should be XQuery 1.0 (XPath 2.0) compatible.

    I was thinking about using a FLWOR query to iterate through all authors and editors, then doing a count of unique authors/editors, then returning the author(s)/editor(s) that match the highest count. But I haven't been able to find the proper solution.

    Does anyone have any suggestion as to how such a FLWOR query would be written? Could this be done in a simpler way, using XPath?

  • Dimitre Novatchev
    Dimitre Novatchev over 12 years
    @_Oliver: Sorry, but even in XQuery 3.0 / XPath 3.0 this is an error. Hint: look at $names/count(index-of($names, .)). $names happens to be a sequence of atomic values, but the / operator requires a sequence of nodes as its left operand.
  • Dimitre Novatchev
    Dimitre Novatchev over 12 years
    @_Oliver: your first approach also doesn't produce any results. Checked with Saxon 9.3.05 under oXygen.
  • Jea
    Jea over 12 years
    Thanks for the solution, Christian :) Is there a way to return more than one author/editor (if applicable)? For instance if there are two authors/editors that share the same (maximum) count as author/editor?
  • Oliver Hallam
    Oliver Hallam over 12 years
    @Dimitre: Good point re '/'. I have removed the XPath example. It was a horrible solution anyway.
  • Dimitre Novatchev
    Dimitre Novatchev over 12 years
    @Jea: Both in Christian's and in my solution just remove the ending [1] and you'll get all the nodes that have the maximum value.
  • grtjn
    grtjn over 12 years
    A link to those examples would be nice!