Generate/get xpath from XML node java

62,836

Solution 1

Update:

@c0mrade has updated his question. Here is a solution to it:

This XSLT transformation:

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>

    <xsl:variable name="vApos">'</xsl:variable>

    <xsl:template match="*[@* or not(*)] ">
      <xsl:if test="not(*)">
         <xsl:apply-templates select="ancestor-or-self::*" mode="path"/>
         <xsl:value-of select="concat('=',$vApos,.,$vApos)"/>
         <xsl:text>&#xA;</xsl:text>
        </xsl:if>
        <xsl:apply-templates select="@*|*"/>
    </xsl:template>

    <xsl:template match="*" mode="path">
        <xsl:value-of select="concat('/',name())"/>
        <xsl:variable name="vnumPrecSiblings" select=
         "count(preceding-sibling::*[name()=name(current())])"/>
        <xsl:if test="$vnumPrecSiblings">
            <xsl:value-of select="concat('[', $vnumPrecSiblings +1, ']')"/>
        </xsl:if>
    </xsl:template>

    <xsl:template match="@*">
        <xsl:apply-templates select="../ancestor-or-self::*" mode="path"/>
        <xsl:value-of select="concat('[@',name(), '=',$vApos,.,$vApos,']')"/>
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<root>
    <elemA>one</elemA>
    <elemA attribute1='first' attribute2='second'>two</elemA>
    <elemB>three</elemB>
    <elemA>four</elemA>
    <elemC>
        <elemB>five</elemB>
    </elemC>
</root>

produces exactly the wanted, correct result:

/root/elemA='one'
/root/elemA[2]='two'
/root/elemA[2][@attribute1='first']
/root/elemA[2][@attribute2='second']
/root/elemB='three'
/root/elemA[3]='four'
/root/elemC/elemB='five'
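
Since these lines are meant to be used as assertions, it may help to see how one of them could be checked from Java. The following is a minimal sketch using the JDK's javax.xml.xpath API; the class name XPathAssertion and the method holds are made up for illustration and do not come from the answer. Note that in XPath 1.0 a comparison such as /root/elemA='one' is true if any selected node has that string value, so an un-indexed step is a weaker check than an indexed one.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates one generated line as a boolean XPath 1.0 expression.
class XPathAssertion {
    static boolean holds(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            // A comparison yields a boolean directly; a node-set counts as true when non-empty.
            return (Boolean) xpath.evaluate(expression, source, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }
}
```

For example, XPathAssertion.holds(xml, "/root/elemA[2]='two'") should be true for the document above, while the same expression with 'one' should be false.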

When applied to the newly-provided document by @c0mrade:

<root>
    <elemX serial="kefw90234kf2esda9231">
        <id>89734</id>
    </elemX>
</root>

again the correct result is produced:

/root/elemX[@serial='kefw90234kf2esda9231']
/root/elemX/id='89734'

Explanation:

  • Only elements that have attributes or have no child elements are matched and processed.

  • For any such element, if it has no child elements, all of its ancestor-or-self elements are processed in a specific mode named 'path'. Then the ='theValue' part is output, followed by a newline character.

  • All attributes of the matched element are then processed.

  • Finally, templates are applied to all child elements.

  • Processing an element in the 'path' mode is simple: a / character and the name of the element are output. Then, if there are preceding siblings with the same name, a [numPrecSiblings+1] part is output.

  • Processing an attribute is also simple: first, all ancestor-or-self elements of its parent are processed in 'path' mode, then the [@attrName='attrValue'] part is output, followed by a newline character.

Do note:

  • Names that are in a namespace are displayed without any problem and in their initial readable form.

  • To aid readability, an index of [1] is never displayed.
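
Because the question asks for a Java approach, the stylesheet can also be run from Java, as one of the comments suggests. The following is a minimal sketch using the JDK's javax.xml.transform API. The class name XsltRunner and its method are illustrative assumptions, and the stylesheet is expected to be passed in as a string (for example, the text shown above, read from a file).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies an XSLT stylesheet to an XML document, both given as text.
class XsltRunner {
    static String transform(String stylesheet, String xml) {
        try {
            // Compile the stylesheet with whatever XSLT processor the JDK provides.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Splitting the returned string on newline characters should then give one expression per entry.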


Below is my initial answer (may be ignored)

Here is a pure XSLT 1.0 solution:

Below is a sample XML document and a stylesheet that takes a node-set parameter and produces one valid XPath expression for every member node.

stylesheet (buildPath.xsl):


<xsl:stylesheet version='1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
xmlns:msxsl="urn:schemas-microsoft-com:xslt" 
>

<xsl:output method="text"/>
<xsl:variable name="theParmNodes" select="//namespace::*[local-name() =
'myNamespace']"/>
<xsl:template match="/">
  <xsl:variable name="theResult">
    <xsl:for-each select="$theParmNodes">
    <xsl:variable name="theNode" select="."/>
    <xsl:for-each select="$theNode |
$theNode/ancestor-or-self::node()[..]">
      <xsl:element name="slash">/</xsl:element>
      <xsl:choose>
        <xsl:when test="self::*">           
          <xsl:element name="nodeName">
            <xsl:value-of select="name()"/>
            <xsl:variable name="thisPosition" 
                select="count(preceding-sibling::*[name(current()) = 
                        name()])"/>
            <xsl:variable name="numFollowing" 
                select="count(following-sibling::*[name(current()) = 
                        name()])"/>
            <xsl:if test="$thisPosition + $numFollowing > 0">
              <xsl:value-of select="concat('[', $thisPosition +
                                                           1, ']')"/>
            </xsl:if>
          </xsl:element>
        </xsl:when>
        <xsl:otherwise> <!-- This node is not an element -->
          <xsl:choose>
            <xsl:when test="count(. | ../@*) = count(../@*)">   
            <!-- Attribute -->
              <xsl:element name="nodeName">
                <xsl:value-of select="concat('@',name())"/>
              </xsl:element>
            </xsl:when>     
            <xsl:when test="self::text()">  <!-- Text -->
              <xsl:element name="nodeName">
                <xsl:value-of select="'text()'"/>
                <xsl:variable name="thisPosition" 
                          select="count(preceding-sibling::text())"/>
                <xsl:variable name="numFollowing" 
                          select="count(following-sibling::text())"/>
                <xsl:if test="$thisPosition + $numFollowing > 0">
                  <xsl:value-of select="concat('[', $thisPosition + 
                                                           1, ']')"/>
                </xsl:if>
              </xsl:element>
            </xsl:when>     
            <xsl:when test="self::processing-instruction()">
            <!-- Processing Instruction -->
              <xsl:element name="nodeName">
                <xsl:value-of select="'processing-instruction()'"/>
                <xsl:variable name="thisPosition" 
                   select="count(preceding-sibling::processing-instruction())"/>
                <xsl:variable name="numFollowing" 
                    select="count(following-sibling::processing-instruction())"/>
                <xsl:if test="$thisPosition + $numFollowing > 0">
                  <xsl:value-of select="concat('[', $thisPosition + 
                                                            1, ']')"/>
                </xsl:if>
              </xsl:element>
            </xsl:when>     
            <xsl:when test="self::comment()">   <!-- Comment -->
              <xsl:element name="nodeName">
                <xsl:value-of select="'comment()'"/>
                <xsl:variable name="thisPosition" 
                         select="count(preceding-sibling::comment())"/>
                <xsl:variable name="numFollowing" 
                         select="count(following-sibling::comment())"/>
                <xsl:if test="$thisPosition + $numFollowing > 0">
                  <xsl:value-of select="concat('[', $thisPosition + 
                                                            1, ']')"/>
                </xsl:if>
              </xsl:element>
            </xsl:when>     
            <!-- Namespace: -->
            <xsl:when test="count(. | ../namespace::*) = 
                                               count(../namespace::*)">

              <xsl:variable name="apos">'</xsl:variable>
              <xsl:element name="nodeName">
                <xsl:value-of select="concat('namespace::*', 
                '[local-name() = ', $apos, local-name(), $apos, ']')"/>

              </xsl:element>
            </xsl:when>     
          </xsl:choose>
        </xsl:otherwise>            
      </xsl:choose>
    </xsl:for-each>
    <xsl:text>&#xA;</xsl:text>
  </xsl:for-each>
 </xsl:variable>
 <xsl:value-of select="msxsl:node-set($theResult)"/>
</xsl:template>
</xsl:stylesheet>

xml source (buildPath.xml):


<!-- top level Comment -->
<root>
    <nodeA>textA</nodeA>
 <nodeA id="nodeA-2">
  <?myProc ?>
        xxxxxxxx
  <nodeB/>
        <nodeB xmlns:myNamespace="myTestNamespace">
  <!-- Comment within /root/nodeA[2]/nodeB[2] -->
   <nodeC/>
  <!-- 2nd Comment within /root/nodeA[2]/nodeB[2] -->
        </nodeB>
        yyyyyyy
  <nodeB/>
  <?myProc2 ?>
    </nodeA>
</root>
<!-- top level Comment -->

Result:

/root/nodeA[2]/nodeB[2]/namespace::*[local-name() = 'myNamespace']
/root/nodeA[2]/nodeB[2]/nodeC/namespace::*[local-name() = 'myNamespace']

Solution 2

Here is how this can be done with SAX:

import java.util.HashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FragmentContentHandler extends DefaultHandler {

    private String xPath = "/";
    private XMLReader xmlReader;
    private FragmentContentHandler parent;
    private StringBuilder characters = new StringBuilder();
    private Map<String, Integer> elementNameCount = new HashMap<String, Integer>();

    public FragmentContentHandler(XMLReader xmlReader) {
        this.xmlReader = xmlReader;
    }

    private FragmentContentHandler(String xPath, XMLReader xmlReader, FragmentContentHandler parent) {
        this(xmlReader);
        this.xPath = xPath;
        this.parent = parent;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        Integer count = elementNameCount.get(qName);
        if(null == count) {
            count = 1;
        } else {
            count++;
        }
        elementNameCount.put(qName, count);
        String childXPath = xPath + "/" + qName + "[" + count + "]";

        int attsLength = atts.getLength();
        for(int x=0; x<attsLength; x++) {
            System.out.println(childXPath + "[@" + atts.getQName(x) + "='" + atts.getValue(x) + "']");
        }

        FragmentContentHandler child = new FragmentContentHandler(childXPath, xmlReader, this);
        xmlReader.setContentHandler(child);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String value = characters.toString().trim();
        if(value.length() > 0) {
            System.out.println(xPath + "='" + characters.toString() + "'");
        }
        xmlReader.setContentHandler(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        characters.append(ch, start, length);
    }

}

It can be tested with:

import java.io.FileInputStream;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class Demo {

    public static void main(String[] args) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader xr = sp.getXMLReader();

        xr.setContentHandler(new FragmentContentHandler(xr));
        xr.parse(new InputSource(new FileInputStream("input.xml")));
    }
}

This will produce the desired output:

//root[1]/elemA[1]='one'
//root[1]/elemA[2][@attribute1='first']
//root[1]/elemA[2][@attribute2='second']
//root[1]/elemA[2]='two'
//root[1]/elemB[1]='three'
//root[1]/elemA[3]='four'
//root[1]/elemC[1]/elemB[1]='five'
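
One of the comments below asks for a StAX counterpart. The following is a rough sketch of one and is not taken from any of the answers. It keeps one counter map per open element, so positions are counted among siblings only. The class name StaxXPathLister is made up, local names are used (namespace prefixes are ignored), values are not escaped, and paths start with a single / rather than //.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical StAX variant: one entry per attribute and per non-blank element text.
class StaxXPathLister {
    static List<String> list(String xml) {
        List<String> out = new ArrayList<>();
        Deque<String> paths = new ArrayDeque<>();                 // XPath of each open element
        Deque<Map<String, Integer>> counts = new ArrayDeque<>();  // name counters, one map per parent
        Deque<StringBuilder> texts = new ArrayDeque<>();          // text collected per open element
        counts.push(new HashMap<>());                             // counters for the document level
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        String name = r.getLocalName();
                        // Position among same-named siblings under the current parent.
                        int n = counts.peek().merge(name, 1, Integer::sum);
                        String path = (paths.isEmpty() ? "" : paths.peek()) + "/" + name + "[" + n + "]";
                        for (int i = 0; i < r.getAttributeCount(); i++) {
                            out.add(path + "[@" + r.getAttributeLocalName(i)
                                    + "='" + r.getAttributeValue(i) + "']");
                        }
                        paths.push(path);
                        counts.push(new HashMap<>());
                        texts.push(new StringBuilder());
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (!texts.isEmpty()) {
                            texts.peek().append(r.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT: {
                        String value = texts.pop().toString().trim();
                        counts.pop();
                        String path = paths.pop();
                        if (!value.isEmpty()) {
                            out.add(path + "='" + value + "'");
                        }
                        break;
                    }
                    default:
                        break;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Cannot read XML", e);
        }
        return out;
    }
}
```

Because the counters live on a stack, a nested document such as <a><a>foo</a></a> yields an index of [1] for the inner element rather than [2].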

Solution 3

With jOOX (a port of the jQuery API to Java; disclaimer: I work for the company behind the library), you can almost achieve what you want in a single statement:

// I'm assuming this:
import static org.joox.JOOX.$;

// And then...
List<String> coolList = $(document).xpath("//*[not(*)]").map(
    context -> $(context).xpath() + "='" + $(context).text() + "'"
);

If document is your sample document:

<root>
    <elemA>one</elemA>
    <elemA attribute1='first' attribute2='second'>two</elemA>
    <elemB>three</elemB>
    <elemA>four</elemA>
    <elemC>
        <elemB>five</elemB>
    </elemC>
</root>

This will produce

/root[1]/elemA[1]='one'
/root[1]/elemA[2]='two'
/root[1]/elemB[1]='three'
/root[1]/elemA[3]='four'
/root[1]/elemC[1]/elemB[1]='five'

By "almost", I mean that jOOX does not (yet) support matching/mapping attributes. Hence, your attributes will not produce any output. This will be implemented in the near future, though.

Solution 4

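// Imports needed by the two methods below
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

// Adds one entry per attribute and per text node found under 'parent', recursively
private static void buildEntryList( List<String> entries, String parentXPath, Element parent ) {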
    NamedNodeMap attrs = parent.getAttributes();
    for( int i = 0; i < attrs.getLength(); i++ ) {
        Attr attr = (Attr)attrs.item( i );
        //TODO: escape attr value
        entries.add( parentXPath+"[@"+attr.getName()+"='"+attr.getValue()+"']"); 
    }
    HashMap<String, Integer> nameMap = new HashMap<String, Integer>();
    NodeList children = parent.getChildNodes();
    for( int i = 0; i < children.getLength(); i++ ) {
        Node child = children.item( i );
        if( child instanceof Text ) {
            //TODO: escape child value
            entries.add( parentXPath+"='"+((Text)child).getData()+"'" );
        } else if( child instanceof Element ) {
            String childName = child.getNodeName();
            Integer nameCount = nameMap.get( childName );
            nameCount = nameCount == null ? 1 : nameCount + 1;
            nameMap.put( child.getNodeName(), nameCount );
            buildEntryList( entries, parentXPath+"/"+childName+"["+nameCount+"]", (Element)child);
        }
    }
}

public static List<String> getEntryList( Document doc ) {
    ArrayList<String> entries = new ArrayList<String>();
    Element root = doc.getDocumentElement();
    buildEntryList(entries, "/"+root.getNodeName()+"[1]", root );
    return entries;
}

This code works under two assumptions: you aren't using namespaces, and there are no mixed-content elements. The namespace limitation isn't a serious one: it is easy to lift, but doing so would make your XPath expressions much harder to read, as every element step would become something like *:<name>[namespace-uri()='<nsuri>'][<index>]. Mixed content, on the other hand, would make the use of XPath very tedious, as you'd have to be able to individually address the second, third, and so on text node within an element.
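
The two TODO comments in the code concern values that contain quote characters. XPath 1.0 string literals have no escape sequence, so one common workaround is to switch the quote style or, when both kinds of quotes occur, to build the value with concat(). A minimal sketch of that idea follows; the helper name XPathLiteral.quote is an assumption of this example.

```java
// Hypothetical helper: turns an arbitrary string into an XPath 1.0 string expression.
class XPathLiteral {
    static String quote(String s) {
        if (!s.contains("'")) {
            return "'" + s + "'";        // no apostrophe: single quotes are safe
        }
        if (!s.contains("\"")) {
            return "\"" + s + "\"";      // apostrophes only: use double quotes
        }
        // Both kinds of quotes: split on apostrophes and glue the pieces with concat().
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = s.split("'", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", \"'\", ");  // the apostrophe itself, in double quotes
            }
            sb.append("'").append(parts[i]).append("'");
        }
        return sb.append(")").toString();
    }
}
```

With such a helper, an entry could be built as parentXPath + "=" + XPathLiteral.quote(value) instead of wrapping the value in apostrophes by hand.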

Solution 5

  1. Use org.w3c.dom.
  2. Walk the tree recursively, top down.
  3. For each node there is an easy way to get its XPath: either keep the path as an array/list while doing step 2, or use a function that walks up recursively until the parent is null and then reverses the array/list of encountered nodes.

Something like that.

UPD: Then concatenate the final list to get the final XPath. I don't think attributes will be a problem.
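
The "walk up until the parent is null" variant of step 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name NodeXPath and both methods are made up, only element and attribute nodes are handled, and namespaces are ignored. Pushing each step onto the front of a deque makes the final reversal unnecessary.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: builds an indexed XPath for an element or attribute node.
class NodeXPath {
    static String of(Node node) {
        if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
            // Attributes have no parent node in DOM; go through the owner element.
            Attr attr = (Attr) node;
            return of(attr.getOwnerElement()) + "/@" + attr.getName();
        }
        Deque<String> steps = new ArrayDeque<>();
        // Walk up until the document node (or null) is reached.
        for (Node n = node; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            int position = 1;
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(n.getNodeName())) {
                    position++;
                }
            }
            steps.push(n.getNodeName() + "[" + position + "]");  // root ends up first
        }
        return "/" + String.join("/", steps);
    }

    // Convenience parser for trying the method out on a string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```

Counting preceding siblings for every node is quadratic in the worst case, so for large documents the top-down variant that carries the path along is likely the better fit.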

Author by

ant

www.linkedin.com/in/ibrahimbegovic

Updated on February 27, 2020

Comments

  • ant
    ant about 4 years

    I'm interested in advice/pseudocode/explanation rather than an actual implementation.

    • I'd like to go through an XML document, all of its nodes
    • Check each node for attribute existence

    If the node doesn't have attributes, get/generate a String with the value of its XPath.
    If the node does have attributes, iterate through the attribute list and create an XPath for each attribute, including the node as well.

    Any advice? Hopefully you can provide some useful information.

    EDIT:

    The reason for doing this is that I'm writing automated tests in JMeter, so for every request I need to verify that the request actually did its job; I'm asserting results by getting node values with XPath. (extra info - irrelevant)

    When the request is small it's not a problem to create the asserts by hand, but for larger ones it's a real pain in the .. (extra info - irrelevant)

    BOUNTY :

    I'm looking for a Java approach

    Goal

    My goal is to achieve the following from this example XML file:

    <root>
        <elemA>one</elemA>
        <elemA attribute1='first' attribute2='second'>two</elemA>
        <elemB>three</elemB>
        <elemA>four</elemA>
        <elemC>
            <elemB>five</elemB>
        </elemC>
    </root>
    

    to produce the following :

    //root[1]/elemA[1]='one'
    //root[1]/elemA[2]='two'
    //root[1]/elemA[2][@attribute1='first']
    //root[1]/elemA[2][@attribute2='second']
    //root[1]/elemB[1]='three'
    //root[1]/elemA[3]='four'
    //root[1]/elemC[1]/elemB[1]='five'
    

    Explained :

    • If the node value/text is not null/empty, get its XPath and add ='nodevalue' for assertion purposes
    • If the node has attributes, create asserts for them too

    BOUNTY UPDATE :

    I found this example; it doesn't produce the correct results, but I'm looking for something like this:

    http://www.coderanch.com/how-to/java/SAXCreateXPath

  • ant
    ant over 13 years
    @Dimitre Novatchev Thank you for your answer, but I'm looking for a Java approach. +1 for your effort
  • BalusC
    BalusC over 13 years
    Just let Java run the XSLT and collect its results?
  • ant
    ant over 13 years
    @BalusC I could do that but this is not exactly what I've asked, and since I don't know this code I'm more comfortable with code I can update/edit, I updated my question. tnx
  • ant
    ant over 13 years
    thank you for your answer, does this library have some docs/examples?
  • ant
    ant over 13 years
    @Dimitre Novatchev Great, it works exactly as I want. I'm really impressed by the small size of the code and what it does. Looks like you know your way around XSL/XML; I'll definitely have to explore XSL. Can you recommend some useful web/book resources for me to go through? I've already bookmarked your blog and seen tons of code there which I don't really get; I need to start with the basics and work my way to the top. Great, thanks once again. I can accept the bounty in 21h, and I will when that time expires. Thanks for the help
  • Dimitre Novatchev
    Dimitre Novatchev over 13 years
    @c0mrade: You are welcome. Yes, XSLT is a very powerful language. For more resources, please, have a look at my answer to another SO question: stackoverflow.com/questions/339930/…
  • ant
    ant over 13 years
    @Dimitre Novatchev, please see BOUNTY UPDATE II, I updated my question. After analyzing bigger xml file I noticed this one, again I think I didn't give the correct example in my question. Is this a big change in your code? Can you please change it to work with latest update? I will accept the bounty either way in 5 hours when I'm able to.
  • biziclop
    biziclop over 13 years
    Nice one :) All we need now is a StAX implementation and we'll have the full set.
  • ant
    ant over 13 years
    @Dimitre Novatchev Absolutely amazing, thanks a million. It works exactly as I planned. I will definitely have to go through the links you suggested. Thanks
  • ant
    ant over 13 years
    +1 for your effort, I second biziclop s comment, someone could find it to be useful in the future
  • LarsH
    LarsH over 12 years
    Wait a minute... elementNameCount counts occurrences of a particular element type (name) globally across the document, regardless of whether they are siblings, cousins (same level but different parent), or on different levels. But you output XPath "[" + count + "]" as if we're counting position among siblings. This will clearly fail for nontrivial documents. Right? E.g. <a><a>foo</a></a> would output //a[1]/a[2]='foo', and the [2] is incorrect.
  • Ashwin
    Ashwin almost 12 years
    @BlaiseDoughan Can you please have a look at this question - stackoverflow.com/questions/10698287/… . I am using XML signatures in Java, and for that I have to extract the part to be signed by using XPath. But it just doesn't work.
  • Ashwin
    Ashwin almost 12 years
    Can you please have a look at this question - stackoverflow.com/questions/10698287/… . I am using XML signatures in Java, and for that I have to extract the part to be signed by using XPath. But it just doesn't work
  • Lukas Eder
    Lukas Eder almost 12 years
    @Ashwin: I'm sorry, I don't have any experience with "XPath transformation". I don't recognise that library you're using there
  • Jason S
    Jason S over 8 years
    what's with the dollar sign $? That's legal Java?!
  • Lukas Eder
    Lukas Eder over 8 years
    @JasonS It's a legal identifier, yes. It's static-imported from JOOX.$. I'll update the answer
  • NagyI
    NagyI over 7 years
    @LarsH No, it's not, because a new FragmentContentHandler is created at each startElement transition, with its own elementNameCount registry. This should work correctly, but I have to try it myself.
  • LarsH
    LarsH over 7 years
    @Nagyl: You may be right. I haven't looked at this in 3.5 years. :-) Let us know if you test it.
  • Brian T Hannan
    Brian T Hannan over 7 years
    This works great but not on large XML files. Any recommendations?
  • Lukas Eder
    Lukas Eder over 7 years
    @BrianTHannan: You could implement a SAX handler