How to retrieve namespaces in XML files using XPath

50,775

Solution 1

There are a few techniques that you might try; which you use will depend on exactly what information you need to get out of the document, how rigorous you want to be, and how conformant the XPath implementation you're using is.

One way to get the namespace URI associated with a particular prefix is using the namespace:: axis. This will give you a namespace node whose name is the prefix and whose value is the namespace URI. For example, you could get the default namespace URI on the document element using the path:

/*/namespace::*[name()='']

You might be able to use that to set up the namespace associations for your XPathNavigator. Be warned, though, that the namespace:: axis is one of those corners of XPath 1.0 that isn't always implemented.
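If the namespace:: axis isn't available, the declarations can sometimes be read from the parser instead. The following is a minimal sketch in Python's standard xml.etree.ElementTree, not the .NET XPathNavigator from the question; the sample document and the function name declared_namespaces are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative input modelled on the question's document element.
SAMPLE = (
    '<Elements name="Entities" xmlns="XS-GenerationToolElements">'
    "<Element/>"
    "</Elements>"
)


def declared_namespaces(xml_text):
    """Return (prefix, uri) pairs for each xmlns declaration, in document order."""
    pairs = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    # A 'start-ns' event is reported once per namespace declaration.
    # The default namespace is reported with an empty-string prefix.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        pairs.append((prefix, uri))
    return pairs


print(declared_namespaces(SAMPLE))
```

The pairs play the same role as the namespace nodes described above: the prefix is the name and the URI is the value.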

A second way of getting that namespace URI is to use the namespace-uri() function on the document element (which you've said will always be in that namespace). The expression:

namespace-uri(/*)

will give you that namespace.
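For comparison, here is a rough equivalent of namespace-uri(/*) outside an XPath engine, sketched with Python's standard xml.etree.ElementTree (which does not implement the namespace-uri() function itself). The sample input and the helper name root_namespace_uri are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative input modelled on the question's document element.
SAMPLE = (
    '<Elements name="Entities" xmlns="XS-GenerationToolElements">'
    "<Element/>"
    "</Elements>"
)


def root_namespace_uri(xml_text):
    """Return the namespace URI of the document element, or '' if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree stores a qualified name as '{uri}local'.
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


print(root_namespace_uri(SAMPLE))
```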

An alternative would be to forget about associating a prefix with that namespace, and just make your path namespace-free. You can do this by using the local-name() function whenever you need to refer to an element whose namespace you don't know. For example:

//*[local-name() = 'Element']

You could go one step further and test the namespace URI of the element against the one of the document element, if you really wanted:

//*[local-name() = 'Element' and namespace-uri() = namespace-uri(/*)]
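The same two ideas can be sketched in Python's standard xml.etree.ElementTree, which lacks local-name() but accepts a '{*}' namespace wildcard in paths (Python 3.8 or later). The sample document, its id attributes, and the urn:example:other namespace are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative input: two Element nodes in the document's namespace and
# one in an unrelated (made-up) namespace.
SAMPLE = (
    '<Elements name="Entities" xmlns="XS-GenerationToolElements">'
    '<Element id="1"/>'
    '<Group><Element id="2"/></Group>'
    '<other:Element xmlns:other="urn:example:other" id="3"/>'
    "</Elements>"
)

root = ET.fromstring(SAMPLE)

# '{*}Element' matches on the local name alone, in any namespace or none.
# Note that findall() on the root searches descendants, not the root itself.
any_ns = [e.get("id") for e in root.findall(".//{*}Element")]

# Stricter variant: also require the namespace of the document element.
root_uri = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
same_ns = [e.get("id") for e in root.findall(".//{%s}Element" % root_uri)]

print(any_ns, same_ns)
```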

A final option, given that the namespace seems to mean nothing to you, would be to run your XML through a filter that strips out the namespaces. Then you won't have to worry about them in your XPath at all. The easiest way to do that would be simply to remove the xmlns attribute with a regular expression, but you could do something more complex if you needed to do other tidying at the same time.
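As one possible alternative to a regular expression, the namespaces can be stripped after parsing. This is only a sketch using Python's standard xml.etree.ElementTree; the function name strip_namespaces and the sample input are assumptions, and namespaced attributes are deliberately left untouched.

```python
import xml.etree.ElementTree as ET

# Illustrative input modelled on the question's document element.
SAMPLE = (
    '<Elements name="Entities" xmlns="XS-GenerationToolElements">'
    "<Element/>"
    "<Group><Element/></Group>"
    "</Elements>"
)


def strip_namespaces(xml_text):
    """Parse xml_text and drop the '{uri}' part from every element name."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


stripped = strip_namespaces(SAMPLE)
# Plain, prefix-free paths now work.
print(len(stripped.findall(".//Element")))
```

Working on the parsed tree avoids accidentally rewriting text that merely looks like an xmlns attribute, at the cost of parsing the document first.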

Solution 2

This short XSLT 1.0 transformation (it relies on the EXSLT node-set() extension) reports every namespace in use in a given XML document, together with the elements that belong to it:

    <xsl:stylesheet version="1.0"
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       xmlns:ext="http://exslt.org/common"
       exclude-result-prefixes="ext"
    >

    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:strip-space elements="*"/>

    <xsl:key name="kNsByNsUri" match="ns" use="@uri"/>

    <xsl:variable name="vXmlNS" 
        select="'http://www.w3.org/XML/1998/namespace'"/>

    <xsl:template match="/">
      <xsl:variable name="vrtfNamespaces">
        <xsl:for-each select=
          "//namespace::*
                 [not(. = $vXmlNS)
                 and
                  . = namespace-uri(..)
               ]">
          <ns element="{name(..)}"
              prefix="{name()}" uri="{.}"/>
        </xsl:for-each>
      </xsl:variable>

      <xsl:variable name="vNamespaces"
        select="ext:node-set($vrtfNamespaces)/*"/>

      <namespaces>
        <xsl:for-each select=
          "$vNamespaces[generate-id()
                       =
                        generate-id(key('kNsByNsUri',@uri)[1])
                       ]">
          <namespace uri="{@uri}">
            <xsl:for-each select="key('kNsByNsUri',@uri)/@element">
              <element name="{.}" prefix="{../@prefix}"/>
            </xsl:for-each>
          </namespace>
        </xsl:for-each>
      </namespaces>
    </xsl:template>
    </xsl:stylesheet>

When applied on the following XML document:

<a xmlns="my:def1" xmlns:n1="my:n1"
   xmlns:n2="my:n2" xmlns:n3="my:n3">
  <b>
    <n1:d/>
  </b>
  <n1:c>
    <n2:e>
      <f/>
    </n2:e>
  </n1:c>
  <n2:g/>
</a>

the wanted result is produced:

<namespaces>
   <namespace uri="my:def1">
      <element name="a" prefix=""/>
      <element name="b" prefix=""/>
      <element name="f" prefix=""/>
   </namespace>
   <namespace uri="my:n1">
      <element name="n1:d" prefix="n1"/>
      <element name="n1:c" prefix="n1"/>
   </namespace>
   <namespace uri="my:n2">
      <element name="n2:e" prefix="n2"/>
      <element name="n2:g" prefix="n2"/>
   </namespace>
</namespaces>
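Where no XSLT processor is at hand, a similar grouping can be approximated in a few lines. The sketch below uses Python's standard xml.etree.ElementTree and the same sample document; it is not a port of the stylesheet. ElementTree discards prefixes, so only local names are reported, and elements in no namespace (which the stylesheet skips) would be grouped under the empty string. The function name elements_by_namespace is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<a xmlns="my:def1" xmlns:n1="my:n1"
   xmlns:n2="my:n2" xmlns:n3="my:n3">
  <b>
    <n1:d/>
  </b>
  <n1:c>
    <n2:e>
      <f/>
    </n2:e>
  </n1:c>
  <n2:g/>
</a>"""


def elements_by_namespace(xml_text):
    """Map each namespace URI to the local names of its elements, in document order."""
    groups = {}
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith("{"):
            # Qualified names are stored as '{uri}local'.
            uri, _, local = elem.tag[1:].partition("}")
        else:
            uri, local = "", elem.tag
        groups.setdefault(uri, []).append(local)
    return groups


for uri, names in elements_by_namespace(SAMPLE).items():
    print(uri, names)
```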

Solution 3

Unfortunately, XPath 1.0 has no concept of a default namespace: an unprefixed name in a path always refers to an element in no namespace. You need to register each namespace with a prefix in the XPath context, and then use those prefixes in your XPath expressions. It makes for very verbose XPath, but it's a basic shortcoming of XPath 1.0. XPath 2.0 lets the host set a default element namespace, but that's no use to you if your processor only supports 1.0.

I suggest that you programmatically examine your XML document for the namespace, associate that namespace with a prefix in the XPath context, then use the prefix in the xpath expressions.
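That suggestion might look roughly like this. The sketch uses Python's standard xml.etree.ElementTree rather than the .NET namespace manager from the question; the prefix "doc", the helper name find_in_document_namespace, and the sample input are all arbitrary choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative input modelled on the question's document element.
SAMPLE = (
    '<Elements name="Entities" xmlns="XS-GenerationToolElements">'
    "<Element/>"
    "<Group><Element/></Group>"
    "</Elements>"
)


def find_in_document_namespace(root, local_name):
    """Find descendants named local_name in the document element's namespace."""
    if root.tag.startswith("{"):
        # 1. Examine the document for its namespace ('{uri}local' form).
        uri = root.tag[1:].split("}", 1)[0]
        # 2. Associate that namespace with a prefix of our own choosing.
        namespaces = {"doc": uri}
        # 3. Use the prefix in the path expression.
        return root.findall(".//doc:" + local_name, namespaces)
    # No namespace on the document element: a plain path is enough.
    return root.findall(".//" + local_name)


root = ET.fromstring(SAMPLE)
print(len(find_in_document_namespace(root, "Element")))
```

The prefix only has to be consistent between the mapping and the path; it does not need to match anything in the document.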

Author: Luis Filipe

Updated on October 19, 2021

Comments

  • Luis Filipe
    Luis Filipe over 2 years

    I have an XML file that starts like this:

    <Elements name="Entities" xmlns="XS-GenerationToolElements">
    

    I'll have to open a lot of these files. Each of them has a different namespace, but any one file will only have a single namespace (I'll never find two namespaces defined in one XML file).

    Using XPath, I'd like to have an automatic way to add the given namespace to the namespace manager. So far, I could only get the namespace by parsing the XML file, but I have an XPathNavigator instance and it should have a nice and clean way to get the namespaces, right?

    -- OR --

    Given that I only have one namespace, somehow make XPath use the only one that is present in the xml, thus avoiding cluttering the code by always appending the namespace.

  • Luis Filipe
    Luis Filipe over 15 years
    It seems it will have to boil down to that..! Thanks
  • AnthonyWJones
    AnthonyWJones over 15 years
    I suspect this is the actual answer, since the desire seems to have been to avoid the added complexities of querying a namespace in XPath. Don't forget to accept the appropriate answer.
  • Luis Filipe
    Luis Filipe over 15 years
    Thank you very much for your detailed answer. It seems that I have no reputation yet to vote you up.
  • Matthew Read
    Matthew Read about 12 years
    The second method works well in Qt with QXmlQuery. Great answer.
  • akostadinov
    akostadinov over 11 years
    Thanks man, now I can put the namespace in a variable and use it to create elements. Handy when the namespace URI changes with product versions but the changed nodes are reasonably stable, e.g. <xsl:variable name="thisns" select="namespace-uri()"/>, then use the variable to set the new element's namespace.
  • Adam Mackler
    Adam Mackler over 10 years
    FYI: In XPath Version 2.0, the namespace axis is deprecated.