Using GROUP BY, COUNT and SAMPLE in Apache Jena SPARQL

It's much easier to answer these kinds of questions if a minimal working example is provided (e.g., a complete RDF dataset that we can query over). For instance, in the above, since we don't know the XML base of the document, we can't know whether the individual described by <Group rdf:ID="group_actinoid">...</Group> will actually match the pattern ?group rdf:type pt:Group.
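
To make the XML base point concrete, here is a small Python sketch of how an rdf:ID is turned into a full IRI. In RDF/XML, rdf:ID="x" behaves like rdf:about="#x" resolved against the base IRI in scope. Both base IRIs below are assumptions for illustration (the question doesn't show the real one), and resolve_rdf_id is a hypothetical helper, not part of Jena.

```python
from urllib.parse import urljoin

# Namespace bound to the pt: prefix in the query.
PT_NS = "http://www.daml.org/2003/01/periodictable/PeriodicTable#"

def resolve_rdf_id(base_iri, rdf_id):
    """Resolve rdf:ID="x" the way rdf:about="#x" would be resolved against the base IRI."""
    return urljoin(base_iri, "#" + rdf_id)

# Assumed base IRIs: the first lines up with the pt: namespace, the second does not.
matching_base = "http://www.daml.org/2003/01/periodictable/PeriodicTable"
other_base = "http://example.org/data/PeriodicTable.owl"

for base in (matching_base, other_base):
    iri = resolve_rdf_id(base, "group_actinoid")
    # The second column shows whether the IRI falls in the pt: namespace.
    print(iri, iri.startswith(PT_NS))
```

Only with a base like the first one do the resources in the document end up in the namespace that the query's pt: prefix refers to.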

Here's some data based on yours, but which contains another group so that we can see the grouping and aggregation:

@prefix pt: <http://www.daml.org/2003/01/periodictable/PeriodicTable#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

pt:actinoid
  a pt:Group ;
  pt:name "Actinoid" ;
  pt:element pt:Ac ;
  pt:element pt:Th ;
  pt:element pt:Pa ;
  pt:element pt:U ;
  pt:element pt:Np ;
  pt:element pt:Pu ;
  pt:element pt:Am ;
  pt:element pt:Cm ;
  pt:element pt:Bk ;
  pt:element pt:Cf ;
  pt:element pt:Es ;
  pt:element pt:Fm ;
  pt:element pt:Md ;
  pt:element pt:No .

pt:beatles
  a pt:Group ;
  pt:name "Beatles" ;
  pt:element pt:John ;
  pt:element pt:Paul ;
  pt:element pt:George ;
  pt:element pt:Ringo .

Here's a SPARQL query that is very similar to yours, although I used some of the shorter forms where possible and corrected the swapped ?elem pt:element ?group to ?group pt:element ?element. In the data, the group is the subject of pt:element and the element is the object, so the original pattern matches nothing. With this query, we get the kind of results that it sounds like you're looking for.

PREFIX pt:<http://www.daml.org/2003/01/periodictable/PeriodicTable#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
SELECT (SAMPLE(?name) AS ?NAME) (COUNT(?element) AS ?NELEMENTS)
WHERE {
  ?group a pt:Group ;
         pt:name ?name ;
         pt:element ?element .
}
GROUP BY ?group 

Calling the data groups.n3 and the query groups.sparql, here are the results produced by Apache Jena's command line ARQ:

$ /usr/local/lib/apache-jena-2.10.0/bin/arq  --data groups.n3 --query groups.sparql
--------------------------
| NAME       | NELEMENTS |
==========================
| "Beatles"  | 4         |
| "Actinoid" | 14        |
--------------------------
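
To see why the direction of pt:element matters, here is a rough plain-Python emulation of the grouping. It is only a sketch of the idea, not how ARQ evaluates the query: the triples are an abbreviated copy of groups.n3 (only three actinoid elements), count_elements is a hypothetical helper, and it assumes each group has exactly one name.

```python
from collections import defaultdict

# Abbreviated copy of groups.n3 as (subject, predicate, object) tuples.
TRIPLES = [
    ("pt:beatles", "rdf:type", "pt:Group"),
    ("pt:beatles", "pt:name", "Beatles"),
    ("pt:beatles", "pt:element", "pt:John"),
    ("pt:beatles", "pt:element", "pt:Paul"),
    ("pt:beatles", "pt:element", "pt:George"),
    ("pt:beatles", "pt:element", "pt:Ringo"),
    ("pt:actinoid", "rdf:type", "pt:Group"),
    ("pt:actinoid", "pt:name", "Actinoid"),
    ("pt:actinoid", "pt:element", "pt:Ac"),
    ("pt:actinoid", "pt:element", "pt:Th"),
    ("pt:actinoid", "pt:element", "pt:Pa"),
]

def count_elements(triples, group_is_subject):
    """Roughly emulate GROUP BY ?group with SAMPLE(?name) and COUNT(?element)."""
    groups = {s for s, p, o in triples if p == "rdf:type" and o == "pt:Group"}
    # One name per group is assumed; with several, SAMPLE would pick any one of them.
    names = {s: o for s, p, o in triples if p == "pt:name"}
    counts = defaultdict(int)
    for s, p, o in triples:
        if p != "pt:element":
            continue
        # Corrected pattern: ?group pt:element ?element (group is the subject).
        # Swapped pattern:   ?elem pt:element ?group   (group is the object).
        group = s if group_is_subject else o
        if group in groups and group in names:
            counts[group] += 1
    return {names[g]: n for g, n in counts.items()}

print(count_elements(TRIPLES, group_is_subject=True))   # {'Beatles': 4, 'Actinoid': 3}
print(count_elements(TRIPLES, group_is_subject=False))  # {}
```

With the swapped pattern, no object of pt:element is ever a pt:Group, so there is nothing to group and the result is empty.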

When I run the same query on the data at http://www.daml.org/2003/01/periodictable/PeriodicTable.owl (after downloading and saving as PeriodicTable.owl), I get the names and counts shown in the following:

$ /usr/local/lib/apache-jena-2.10.0/bin/arq \
      --data ~/Downloads/PeriodicTable.owl \
      --query groups.sparql
--------------------------------------------------
| NAME                               | NELEMENTS |
==================================================
| "Lanthanoid"^^xsd:string           | 14        |
| "Noble gas"^^xsd:string            | 7         |
| "Halogen"^^xsd:string              | 6         |
| "Actinoid"^^xsd:string             | 14        |
| "Chalcogen"^^xsd:string            | 6         |
| "Pnictogen"^^xsd:string            | 6         |
| "Coinage metal"^^xsd:string        | 4         |
| "Alkali metal"^^xsd:string         | 7         |
| "Alkaline earth metal"^^xsd:string | 6         |
--------------------------------------------------
Author by

Admin

Updated on July 14, 2022

Comments

  • Admin
    Admin almost 2 years

    So I have an RDF schema that contains many "groups", and each of these groups has a "name", and contains a number of "elements". I need to select the name of every group, along with the number of elements for each. Here is a sample of a group RDF schema...

    <Group rdf:ID="group_actinoid">
        <name rdf:datatype="&xsd;string">Actinoid</name>
        <element rdf:resource="#Ac"/>
        <element rdf:resource="#Th"/>
        <element rdf:resource="#Pa"/>
        <element rdf:resource="#U"/>
        <element rdf:resource="#Np"/>
        <element rdf:resource="#Pu"/>
        <element rdf:resource="#Am"/>
        <element rdf:resource="#Cm"/>
        <element rdf:resource="#Bk"/>
        <element rdf:resource="#Cf"/>
        <element rdf:resource="#Es"/>
        <element rdf:resource="#Fm"/>
        <element rdf:resource="#Md"/>
        <element rdf:resource="#No"/>
    </Group>
    

    ...and here's the query I've been trying to get to work...

    PREFIX pt:<http://www.daml.org/2003/01/periodictable/PeriodicTable#>
    PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
    PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>

    SELECT (SAMPLE(?name) AS ?NAME) (COUNT(?elem) AS ?ELEMENTCOUNT)
    WHERE {
        ?group rdf:type pt:Group .
        ?group pt:name ?name .
        ?elem pt:element ?group .
    }
    GROUP BY ?group
    

    ...but I'm getting an empty result and I'm not really sure why. I should be getting a group name, along with however many elements that group contains, for every group in the owl file.

  • Admin
    Admin about 11 years
    Hm ok, I'm wondering why my query is returning an empty set, then. Here's the URL to the complete data set: daml.org/2003/01/periodictable/PeriodicTable.owl
  • Joshua Taylor
    Joshua Taylor about 11 years
    @MassStrike I pulled down that dataset and included the result from my query. I think it's what would be expected.