How to query XML using namespaces in Java with XPath?

Solution 1

In the second example XML file the elements are bound to a namespace. Your XPath is attempting to address elements that are bound to the default "no namespace" namespace, so they don't match.

The preferred method is to register the namespace with a namespace-prefix. It makes your XPath much easier to develop, read, and maintain.

However, it is not mandatory that you register the namespace and use the namespace-prefix in your XPath.

You can formulate an XPath expression that uses a generic match for an element and a predicate filter that restricts the match for the desired local-name() and the namespace-uri(). For example:

/*[local-name()='workbook'
    and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
  /*[local-name()='sheets'
      and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
  /*[local-name()='sheet'
      and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]

As you can see, it produces an extremely long and verbose XPath statement that is very difficult to read (and maintain).
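For reference, here is a minimal sketch of evaluating that kind of expression from Java with no NamespaceContext registered. The class name NamespaceUriPredicateDemo and the step() helper are made up for illustration, and the workbook XML from the question is inlined as a string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
class NamespaceUriPredicateDemo {
    static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // The namespaced workbook from the question, inlined as a string.
    static final String XML =
        "<workbook xmlns=\"" + NS + "\""
        + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    // Builds one location step that matches on both local name and namespace URI.
    static String step(String localName) {
        return "/*[local-name()='" + localName + "' and namespace-uri()='" + NS + "']";
    }

    // Returns the name attribute of the first sheet, without any prefix mapping.
    static String firstSheetName(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // parse so that elements carry their namespace URI
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String expr = step("workbook") + step("sheets") + step("sheet") + "[1]/@name";
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstSheetName(XML));
    }
}
```

Generating the steps with a helper keeps the Java source short, but the expression handed to the XPath engine is still the long form.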

You could also just match on the local-name() of the element and ignore the namespace. For example:

/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]

However, you run the risk of matching the wrong elements. If your XML has mixed vocabularies (which may not be an issue for this instance) that use the same local-name(), your XPath could match on the wrong elements and select the wrong content.
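To make that risk concrete, here is a small sketch using an invented document: the second namespace urn:example:other, the x prefix, and the "Foreign" sheet are all assumptions for illustration and are not part of the question's XML. The local-name()-only expression picks up the foreign element, while adding a namespace-uri() test on the last step selects the intended one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the mixed-vocabulary document is made up.
class MixedVocabularyDemo {
    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Two <sheet> elements with the same local name but different namespaces.
    static final String XML =
        "<workbook xmlns=\"" + MAIN_NS + "\" xmlns:x=\"urn:example:other\">"
        + "<sheets><x:sheet name=\"Foreign\"/><sheet name=\"Sheet1\"/></sheets>"
        + "</workbook>";

    // Matches on local name only, as in the short expression above.
    static final String LOCAL_ONLY =
        "/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]";

    // Same path, but the last step also checks the namespace URI.
    static final String NS_CHECKED =
        "/*[local-name()='workbook']/*[local-name()='sheets']"
        + "/*[local-name()='sheet' and namespace-uri()='" + MAIN_NS + "'][1]";

    // Evaluates the expression and returns the name attribute of the selected element.
    static String nameOfFirstMatch(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr + "/@name", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("local-name() only:  " + nameOfFirstMatch(LOCAL_ONLY));
        System.out.println("with namespace-uri: " + nameOfFirstMatch(NS_CHECKED));
    }
}
```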

Solution 2

Your problem is the default namespace. Check out this article for how to deal with namespaces in your XPath: http://www.edankert.com/defaultnamespaces.html

One of the conclusions they draw is:

So, to be able to use XPath expressions on XML content defined in a (default) namespace, we need to specify a namespace prefix mapping

Note that this doesn't mean that you have to change your source document in any way (though you're free to put the namespace prefixes in there if you so desire). Sounds strange, right? What you will do is create a namespace prefix mapping in your Java code and use that prefix in your XPath expression. Here, we'll create a mapping from the prefix spreadsheet to your default namespace.

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();

// there's no default implementation for NamespaceContext...seems kind of silly, no?
xpath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        else if ("spreadsheet".equals(prefix)) return "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    public Iterator getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
});

// note that all the elements in the expression are prefixed with our namespace mapping!
XPathExpression expr = xpath.compile("/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]");

// assuming you've got your XML document in a variable named doc...
Node result = (Node) expr.evaluate(doc, XPathConstants.NODE);

And voila...Now you've got your element saved in the result variable.

Caveat: if you're parsing your XML as a DOM with the standard JAXP classes, be sure to call setNamespaceAware(true) on your DocumentBuilderFactory. Otherwise, this code won't work!
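As a rough end-to-end sketch of that caveat, the class below parses the workbook XML from a string and runs the prefixed expression with the factory's namespace awareness switched on or off. The class name NamespaceAwareDemo and the firstSheetName helper are invented for this illustration. With the flag set to true the sheet is found; with false, the prefixed expression is not expected to match, as the caveat warns.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
class NamespaceAwareDemo {
    static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    static final String XML =
        "<workbook xmlns=\"" + NS + "\""
        + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    // Parses XML with the given namespace awareness and evaluates the prefixed XPath.
    static String firstSheetName(boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware); // the factory default is false
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    if (prefix == null) throw new IllegalArgumentException("Null prefix");
                    return "spreadsheet".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    throw new UnsupportedOperationException();
                }
                public Iterator<String> getPrefixes(String uri) {
                    throw new UnsupportedOperationException();
                }
            });
            return xpath.evaluate(
                "/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]/@name", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespace-aware:     [" + firstSheetName(true) + "]");
        System.out.println("not namespace-aware: [" + firstSheetName(false) + "]");
    }
}
```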

Solution 3

All namespaces that you intend to select from in the source XML must be associated with a prefix in the host language. In Java/JAXP this is done by specifying the URI for each namespace prefix using an instance of javax.xml.namespace.NamespaceContext. Unfortunately, there is no implementation of NamespaceContext provided in the SDK.

Fortunately, it's very easy to write your own:

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.namespace.NamespaceContext;

public class SimpleNamespaceContext implements NamespaceContext {

    private final Map<String, String> PREF_MAP = new HashMap<String, String>();

    public SimpleNamespaceContext(final Map<String, String> prefMap) {
        PREF_MAP.putAll(prefMap);       
    }

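    public String getNamespaceURI(String prefix) {
        // The NamespaceContext contract maps an unbound prefix to "" rather than null.
        String uri = PREF_MAP.get(prefix);
        return uri != null ? uri : javax.xml.XMLConstants.NULL_NS_URI;
    }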

    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    public Iterator getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }

}

Use it like this:

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
HashMap<String, String> prefMap = new HashMap<String, String>() {{
    put("main", "http://schemas.openxmlformats.org/spreadsheetml/2006/main");
    put("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
}};
SimpleNamespaceContext namespaces = new SimpleNamespaceContext(prefMap);
xpath.setNamespaceContext(namespaces);
XPathExpression expr = xpath
        .compile("/main:workbook/main:sheets/main:sheet[1]");
Object result = expr.evaluate(doc, XPathConstants.NODESET);

Note that even though the first namespace does not specify a prefix in the source document (i.e. it is the default namespace) you must associate it with a prefix anyway. Your expression should then reference nodes in that namespace using the prefix you've chosen, like this:

/main:workbook/main:sheets/main:sheet[1]

The prefix names you choose to associate with each namespace are arbitrary; they do not need to match what appears in the source XML. This mapping is just a way to tell the XPath engine that a given prefix name in an expression correlates with a specific namespace in the source document.
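A short sketch of that point, under a few assumptions: the class name ArbitraryPrefixDemo, the second document that uses an x: prefix, and the prefix "anything" are all invented for illustration. The same namespace is bound to different prefixes in the XPath and in the source documents, and the lookup is expected to succeed either way because only the namespace URI is compared.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
class ArbitraryPrefixDemo {
    static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Source document that uses the default namespace (no prefix).
    static final String DEFAULT_NS_XML =
        "<workbook xmlns=\"" + NS + "\"><sheets><sheet name=\"Sheet1\"/></sheets></workbook>";

    // Equivalent document that binds the same namespace to the prefix "x".
    static final String PREFIXED_XML =
        "<x:workbook xmlns:x=\"" + NS + "\"><x:sheets><x:sheet name=\"Sheet1\"/></x:sheets></x:workbook>";

    // Binds the given prefix to NS on the XPath side only, then queries with it.
    static String firstSheetName(String xml, final String prefix) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String p) {
                    if (p == null) throw new IllegalArgumentException("Null prefix");
                    return prefix.equals(p) ? NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    throw new UnsupportedOperationException();
                }
                public Iterator<String> getPrefixes(String uri) {
                    throw new UnsupportedOperationException();
                }
            });
            String expr = String.format("/%1$s:workbook/%1$s:sheets/%1$s:sheet[1]/@name", prefix);
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstSheetName(DEFAULT_NS_XML, "main"));
        System.out.println(firstSheetName(PREFIXED_XML, "anything"));
    }
}
```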

Solution 4

If you are using Spring, it already contains org.springframework.util.xml.SimpleNamespaceContext.

        import org.springframework.util.xml.SimpleNamespaceContext;
        ...

        XPathFactory xPathfactory = XPathFactory.newInstance();
        XPath xpath = xPathfactory.newXPath();
        SimpleNamespaceContext nsc = new SimpleNamespaceContext();

        nsc.bindNamespaceUri("a", "http://some.namespace.com/nsContext");
        xpath.setNamespaceContext(nsc);

        XPathExpression xpathExpr = xpath.compile("//a:first/a:second");

        String result = (String) xpathExpr.evaluate(object, XPathConstants.STRING);

Solution 5

I've written a simple NamespaceContext implementation (here), that takes a Map<String, String> as input, where the key is a prefix, and the value is a namespace.

It follows the NamespaceContext specification, and you can see how it works in the unit tests.

Map<String, String> mappings = new HashMap<>();
mappings.put("foo", "http://foo");
mappings.put("foo2", "http://foo");
mappings.put("bar", "http://bar");

context = new SimpleNamespaceContext(mappings);

context.getNamespaceURI("foo");    // "http://foo"
context.getPrefix("http://foo");   // "foo" or "foo2"
context.getPrefixes("http://foo"); // ["foo", "foo2"]

Note that it has a dependency on Google Guava
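If the Guava dependency is a problem, a similar map-backed context can be sketched with JDK classes only. This is not the implementation linked above: the class name MapNamespaceContext is made up, and the behaviour shown is just one reading of the NamespaceContext Javadoc (unbound prefixes map to the empty string, null arguments are rejected, and the xml and xmlns prefixes keep their fixed bindings).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

// Hypothetical JDK-only NamespaceContext backed by a prefix -> URI map.
final class MapNamespaceContext implements NamespaceContext {
    // LinkedHashMap keeps insertion order, so getPrefix() is deterministic.
    private final Map<String, String> prefixToUri = new LinkedHashMap<>();

    MapNamespaceContext(Map<String, String> mappings) {
        prefixToUri.putAll(mappings);
        // Bindings that the NamespaceContext contract fixes for every context.
        prefixToUri.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
        prefixToUri.put(XMLConstants.XMLNS_ATTRIBUTE, XMLConstants.XMLNS_ATTRIBUTE_NS_URI);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("prefix must not be null");
        return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String uri) {
        // Any bound prefix is acceptable; this returns the first one registered.
        Iterator<String> it = getPrefixes(uri);
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String uri) {
        if (uri == null) throw new IllegalArgumentException("uri must not be null");
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (uri.equals(entry.getValue())) prefixes.add(entry.getKey());
        }
        // Wrapped so that Iterator.remove() is rejected.
        return Collections.unmodifiableList(prefixes).iterator();
    }
}
```

It can be passed to xpath.setNamespaceContext(...) in the same way as the other implementations on this page.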

Author by

Inez

Updated on July 05, 2022

Comments

  • Inez
    Inez almost 2 years

    When my XML looks like this (no xmlns) then I can easily query it with XPath like /workbook/sheets/sheet[1]

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <workbook>
      <sheets>
        <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
      </sheets>
    </workbook>
    

    But when it looks like this then I can't

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
      <sheets>
        <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
      </sheets>
    </workbook>
    

    Any ideas?

  • Inez
    Inez about 13 years
    How to do it with just Java SDK? I don't have SimpleNamespaceContext and don't want to use external libs.
  • stevevls
    stevevls about 13 years
    @lnez check it out...i updated my answer to show how you can do it with standard jdk classes.
  • vikingsteve
    vikingsteve over 10 years
    I found another way to use the namespaces, but you gave me the hint - so thank you.
  • nokul
    nokul about 10 years
    I don't get why I need to associate the namespace URI and the namespace prefix in my XPath, anyway? In the XML document, there is already such an association, like xmlns:r="schemas.openxmlformats.org/officeDocument/2006/relationships" in the original question. There, the prefix r is bound to the namespace URI. The way I read it, I'd be forced to re-establish this connection in my XPath (or programmatically).
  • Stephan
    Stephan about 9 years
    @vikingsteve Can you post your "another way"?
  • vikingsteve
    vikingsteve about 9 years
    Apologies @Stephan, I can't remember exactly what I did there, but this put me on the right track.
  • Aisah Hamzah
    Aisah Hamzah almost 9 years
    I would advise against this practice. If at all possible, do not match by local name and namespace; it will clutter your code and the fast hash-speed lookup will not work. @nokul: that's because an XPath can operate on any document, and the namespace prefix can differ between documents but the namespace URI cannot. If you bind xmlns:xx to namespace aaa, and the document has <yy:foo> in the same namespace, the XPath expression xx:foo will select that node.
  • Espinosa
    Espinosa over 8 years
    +1 for setNamespaceAware(true) ..xpath was driving me crazy before I found that issue is not in registering NS or xpath statement itself but rather much earlier on!
  • Espinosa
    Espinosa over 8 years
    +1 for the neat NamespaceContext implementation. You should stress that setNamespaceAware(true) must be set on the DocumentBuilderFactory, as @stevevls did. Otherwise, this code won't work! It is not that easy to figure out. Basically, if you have XML with namespaces and don't make the DBF namespace-aware, then XPath is silently rendered useless and only searching using local-name() works.
  • Cheeso
    Cheeso over 8 years
    re: "if you're parsing your XML as a DOM with the standard JAXP classes, be sure to call setNamespaceAware(true) on your DocumentBuilderFactory." OMG Java is sooo dumb. 2 hours on this.
  • markdsievers
    markdsievers over 7 years
    If you have a default namespace (xmlns="http://www.default.com/..." as well as prefixed ones xmlns:foo="http://www.foo.com/...") then you also need to provide a mapping for default in order for your XPath expressions to be able to target the elements using the default namespace (eg they don't have a prefix). For the example above simply add another condition to getNamespaceURI eg else if ("default".equals(prefix)) return "http://www.default.com/...";. Took me a bit to figure this out, hopefully can save someone else some engineering hours.
  • DAB
    DAB over 5 years
    Excellent answer. I would like to add that you should make sure your XML document has first been opened using setNamespaceAware(true);
  • joriki
    joriki almost 5 years
    @markdsievers: But the answer does exactly that (using "spreadsheet" as the prefix for the default namespace).
  • Wayne
    Wayne almost 5 years
    Yeah, I called that out explicitly :)
  • Steve Harrison
    Steve Harrison over 4 years
    The following XPath did not work in our case: /NotifyShipment/DataArea/Shipment/ShipmentHeader/Status/Code/text(), and this XPath appears to be helping, based on the above answer: (/*[local-name()='NotifyShipment']/*[local-name()='DataArea']/*[local-name()='Shipment']/*[local-name()='ShipmentHeader']/*[local-name()='Status']/*[local-name()='Code']/text()). We might come up with another approach, but thank you for a very good note!