How to extract element-path from XMLType Node?

13,906

Solution 1

You can achieve that with help of XMLTable function from Oracle XML DB XQuery function set:

select * from 
  XMLTable(
    '
     declare function local:path-to-node( $nodes as node()* )  as xs:string* {
       $nodes/string-join(ancestor-or-self::*/name(.), ''/'')
     };
     for $i in $rdoc//name 
       return <ret><name_path>{local:path-to-node($i)}</name_path>{$i}</ret>
    '
    passing 
    XMLParse(content '
      <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
      </users>'
    )
    as "rdoc"
    columns 
      name_path  varchar2(4000) path '//ret/name_path',
      name_value varchar2(4000) path '//ret/name'

  )

For me XQuery looks at least more intuitive for XML data manipulation than XSLT.

You can find useful set of XQuery functions here.

Update 1

I suppose that you need totally plain dataset with full data at last stage. This target can be reached by complicated way, constructed step-by-step below, but this variant is very resource-angry. I propose to review final target (selecting some specific records, count number of elements etc.) and after that simplify this solution or totally change it.

Update 2

All steps deleted from this Update except last because @A.B.Cade proposed more elegant solution in comments. This solution provided in Update 3 section below.

Step 1 - Constructing dataset of id's with corresponding query results

Step 2 - Aggregating to single XML row

Step 3 - Finally get full plain dataset by querying constracted XML with XMLTable

with xmlsource as (
  -- only for purpose to write long string only once
  select '
      <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
      </users>' xml_string
   from dual   
),
xml_table as ( 
  -- model of xmltable
  select 10 id, xml_string xml_data from xmlsource union all 
  select 20 id, xml_string xml_data from xmlsource union all 
  select 30 id, xml_string xml_data from xmlsource 
) 
select  *
from
  XMLTable(
    '
        for $entry_user in $full_doc/full_list/list_entry/name_info
          return <tuple>
                   <id>{data($entry_user/../@id_value)}</id>
                   <path>{$entry_user/name_path/text()}</path>
                   <name>{$entry_user/name_value/text()}</name>
                  </tuple> 
    '
    passing ( 
      select  
        XMLElement("full_list", 
          XMLAgg(     
            XMLElement("list_entry",
              XMLAttributes(id as "id_value"),
              XMLQuery(
                '
                 declare function local:path-to-node( $nodes as node()* )  as xs:string* {
                   $nodes/string-join(ancestor-or-self::*/name(.), ''/'')
                 };(: function to construct path :) 
                 for $i in $rdoc//name return <name_info><name_path>{local:path-to-node($i)}</name_path><name_value>{$i/text()}</name_value></name_info>
                '
                passing by value XMLParse(content xml_data) as "rdoc"
                returning content
              )
            )
          )
        )        
        from xml_table
    )   
    as "full_doc"      
    columns
      id_val   varchar2(4000) path '//tuple/id',
      path_val varchar2(4000) path '//tuple/path',
      name_val varchar2(4000) path '//tuple/name'
  )    

Update 3

As mentioned by @A.B.Cade in his comment, there are really simple way to join ID's with XQuery results.

Because I don't like external links in answers, code below represents his SQL fiddle, a little bit adapted to the data source from this answer:

with xmlsource as (
  -- only for purpose to write long string only once
  select '
      <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
      </users>' xml_string
   from dual   
),
xml_table as ( 
  -- model of xmltable
  select 10 id, xml_string xml_data from xmlsource union all 
  select 20 id, xml_string xml_data from xmlsource union all
  select 30 id, xml_string xml_data from xmlsource
)
select xd.id, x.*  from
xml_table xd,
  XMLTable(
    'declare function local:path-to-node( $nodes as node()* )  as xs:string* {$nodes/string-join(ancestor-or-self::*/name(.), ''/'')     };     for $i in $rdoc//name        return <ret><name_path>{local:path-to-node($i)}</name_path>{$i}</ret>    '
    passing
    XMLParse(content xd.xml_data
    )
    as "rdoc"
    columns
      name_path  varchar2(4000) path '//ret/name_path',
      name_value varchar2(4000) path '//ret/name'

  ) x

Solution 2

This is not perfect but can be a start:

Here is a sqlfiddle

with xslt as (
  select '<?xml version="1.0" ?><xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
  <records>
    <xsl:apply-templates/>
   </records>
  </xsl:template>
  <xsl:template match="//name">
      <columns>
        <path>
        <xsl:for-each select="ancestor-or-self::*">
            <xsl:call-template name="print-step"/>
        </xsl:for-each>
        </path>
  <value>
    <xsl:value-of select="."/>
  </value>
        <xsl:apply-templates select="*"/>
      </columns>
    </xsl:template>
    <xsl:template name="print-step">
        <xsl:text>/</xsl:text>
        <xsl:value-of select="name()"/>
        <xsl:text>[</xsl:text>
        <xsl:value-of select="1+count(preceding-sibling::*)"/>
        <xsl:text>]</xsl:text>
    </xsl:template>
   </xsl:stylesheet>'
  xsl from dual)
, xmldata as
(select xmltransform(xmltype('<?xml version="1.0"?>
    <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
    </users>'), xmltype(xsl)) xd from xslt)

select  XT.* 
from xmldata c,
xmltable('$x//columns' passing c.xd
   as "x"
         columns
         path_c VARCHAR2(4000) PATH 'path',
         value_c VARCHAR2(4000) PATH 'value'
        ) as XT 

This is what I tried to do:

Since you want the "path" I had to use xslt (credits to this post)

Then I used xmltransform for transforming your original xml with the xsl to the desired output (path, value)

Then I used xmltable to read it as a table

Solution 3

This improves on above answer by A.B.Cade:

<xsl:template name="print-step">
    <xsl:variable name="name" select="name()" />
    <xsl:text>/</xsl:text>
    <xsl:value-of select="$name"/>
    <xsl:text>[</xsl:text>
    <xsl:value-of select="1+count(preceding-sibling::*[name()=$name])"/>
    <xsl:text>]</xsl:text>
</xsl:template>

With result:

/users[1]/user[1]/name[1] user1

/users[1]/user[2]/name[1] user2

/users[1]/group[1]/user[1]/name[1] user3

/users[1]/user[3]/name[1] user4

Instead of:

/users[1]/user[1]/name[1] user1

/users[1]/user[2]/name[1] user2

/users[1]/group[3]/user[1]/name[1] user3

/users[1]/user[4]/name[1] user4

Share:
13,906
towi
Author by

towi

Programmer, Algorithmicist, Programming Languagcist, Pythonist, C++icist, Photographer, Boardgamer.

Updated on June 05, 2022

Comments

  • towi
    towi almost 2 years

    I would like to have a select statement on an XML document and one column should return me the path of each node.

    For example, given the data

    SELECT * 
    FROM TABLE(XMLSequence(
      XMLTYPE('<?xml version="1.0"?>
        <users><user><name>user1</name></user>
               <user><name>user2</name></user>
               <group>
                  <user><name>user3</name></user>
               </group>
               <user><name>user4</name></user>
        </users>').extract('/*//*[text()]'))) t;
    

    Which results in

    column_value
    --------
    <user><name>user1</name></user>
    <user><name>user2</name></user>
    <user><name>user3</name></user>
    <user><name>user4</name></user>
    

    I'd like to have a result like this:

    path                     value
    ------------------------ --------------
    /users/user/name         user1
    /users/user/name         user2
    /users/group/user/name   user3
    /users/user/name         user4
    

    I can not see how to get to this. I figure there are two thing that have to work together properly:

    • Can I extract the path from an XMLType with a single operation or method, or do I have to do this with string-magic?
    • What is the correct XPath expression so that I do get the whole element path (if thats possible), eg. <users><group><user><name>user3</name></user></group></user> insead of <user><name>user3</name></user>?

    Maybe I am not understanding XMLType fully, yet. It could be I need a different approach, but I can not see it.

    Sidenotes:

    • In the final version the XML document will be coming from CLOBs of a table, not a static document.
    • The path column can of course also use dots or whatever and the initial slash is not the issue, any representation would do.
    • Also I would not mind if every inner node also gets a result row (possibly with null as value), not only the ones with text() in it (which is what I am really interested in).
    • In the end I will need the tail element of path separate (always "name" in the example here, but this will vary later), i.e. ('/users/groups/user', 'name', 'user3'), I can deal with that separately.
  • towi
    towi about 11 years
    Whoa! I would not have thought of a xslt transformation. I'll have to ponder that. Thanks for the "not totally uncomplicated" stylesheet, anyway. ;-) -- and Xmltable, I will look up, too.
  • towi
    towi about 11 years
    Excellent! With this and some own modifications I am almost there. I now only need to pass the CLOB column of a real table instead of the literal Xml-String into XMLParse(content '...'). Let the table have the name xmldata and the CLOB column be xml. How to put that in there? The result would be a loop-inside-a-loop, in a way. So I can not just write xmldata.xml in place of the '...'-literal...
  • ThinkJet
    ThinkJet about 11 years
    @A.B.Cade Because I not so familiar with joins containing a datasets which built from a current row (even with a multiset feature and nested tables). It seems to be very powerful feature, but used in production systems with long development history, which I work with, very rarely. Thank you for right answer, it will be useful for me too, not only for author of the question.