Java: How to locate an element via an XPath string on org.w3c.dom.Document

Try this:

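import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Obtain a Document somehow; how you get it does not matter.
DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
org.w3c.dom.Document doc = b.parse(new FileInputStream("page.html"));

// Evaluate the XPath against the Document itself; NODESET yields a NodeList.
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xPath.evaluate("/html/body/p/div[3]/a",
        doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); ++i) {
    Element e = (Element) nodes.item(i); // each match is an org.w3c.dom.Element
}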

With the following page.html file:

<html>
  <head>
  </head>
  <body>
  <p>
    <div></div>
    <div></div>
    <div><a>link</a></div>
  </p>
  </body>
</html>
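
For a version that compiles and runs as-is, the same steps can be wrapped in a small class. This is only a sketch under a few assumptions: the class name XPathLookup and the helpers parse and findElements are made up for illustration, and the page is inlined as a string rather than read from page.html. Note that DocumentBuilder is an XML parser, so the input has to be well-formed XML, as the sample page is.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XPathLookup {

    // Hypothetical helper: parses a well-formed XML/XHTML string into a Document.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Hypothetical helper: returns every Element matched by the XPath expression.
    public static List<Element> findElements(Document doc, String expression) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // A node-set may also hold attributes or text nodes; keep elements only.
            if (nodes.item(i) instanceof Element) {
                result.add((Element) nodes.item(i));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head></head><body><p>"
                + "<div></div><div></div><div><a>link</a></div>"
                + "</p></body></html>";
        Document doc = parse(page);
        for (Element e : findElements(doc, "/html/body/p/div[3]/a")) {
            System.out.println(e.getTagName() + ": " + e.getTextContent()); // prints "a: link"
        }
    }
}
```

The instanceof check is a design choice for the sketch: an expression such as //@href would match attribute nodes, and casting those to Element would fail.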
Author: KJW

Updated on August 16, 2022

Comments

  • KJW
    KJW almost 2 years

    How do you quickly locate an element (or elements) via an XPath string on a given org.w3c.dom.Document? There seems to be no FindElementsByXpath() method. For example:

    /html/body/p/div[3]/a
    

    I found that recursively iterating through all the child node levels is quite slow when there are a lot of elements with the same name. Any suggestions?

    I cannot use any parser or library; this must work with a W3C DOM Document only.

  • Tomasz Nurkiewicz
    Tomasz Nurkiewicz about 13 years
    In my code example, doc is of type org.w3c.dom.Document. If you already have an instance of Document, just use the last two lines of my code and that's it! P.S.: Why the downvote?
  • KJW
    KJW about 13 years
    This returns text. I need a DOM element or elements.
  • Tomasz Nurkiewicz
    Tomasz Nurkiewicz about 13 years
    See my edit (introduction of the XPathConstants.NODESET parameter): it now returns a NodeList. Have a look at the other constants as well.
  • KJW
    KJW about 13 years
    Thank you, this is a great answer.
  • Sudip7
    Sudip7 over 9 years
    @Tomasz Nurkiewicz, can you please look into my implementation? I know I am not the questioner and it's a different question, but I took the reference from your answer, so I hope you can help me: stackoverflow.com/questions/26389376/…
  • burcakulug
    burcakulug about 9 years
    I think you don't need to call doc.getDocumentElement(); you should be able to run the XPath on the org.w3c.dom.Document type directly.
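
The points raised in the comments (the other XPathConstants return types, evaluating against the Document versus its root element, and speed) can be tried out with a short sketch. The class name XPathReturnTypes and its helper methods are hypothetical and exist only for illustration. The sketch also uses XPath.compile so that an expression evaluated many times is parsed only once; whether this helps noticeably depends on the workload, and a compiled XPathExpression is documented as not thread-safe.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathReturnTypes {

    static final String PAGE = "<html><head></head><body><p>"
            + "<div></div><div></div><div><a>link</a></div>"
            + "</p></body></html>";

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // NODE: the first matching node, or null when nothing matches.
    public static Node firstNode(Document doc, String expr) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (Node) xPath.evaluate(expr, doc, XPathConstants.NODE);
    }

    // STRING: the string value of the first match (empty string when nothing matches).
    public static String asString(Document doc, String expr) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (String) xPath.evaluate(expr, doc, XPathConstants.STRING);
    }

    // NUMBER: the result converted to a double, e.g. for count(...).
    public static double asNumber(Document doc, String expr) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (Double) xPath.evaluate(expr, doc, XPathConstants.NUMBER);
    }

    // BOOLEAN: for a node-set expression, true when at least one node matches.
    public static boolean asBoolean(Document doc, String expr) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (Boolean) xPath.evaluate(expr, doc, XPathConstants.BOOLEAN);
    }

    // Evaluates a precompiled expression against any context node and counts the matches.
    public static int countMatches(XPathExpression compiled, Object context) throws Exception {
        NodeList nodes = (NodeList) compiled.evaluate(context, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(PAGE);
        String path = "/html/body/p/div[3]/a";

        System.out.println(firstNode(doc, path).getNodeName());        // a
        System.out.println(asString(doc, path));                       // link
        System.out.println(asNumber(doc, "count(/html/body/p/div)"));  // 3.0
        System.out.println(asBoolean(doc, "/html/body/p/div[4]"));     // false

        // Compile once, then reuse the expression for repeated lookups.
        XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(path);
        // An absolute path selects the same nodes from the Document or its root element.
        System.out.println(countMatches(compiled, doc));                       // 1
        System.out.println(countMatches(compiled, doc.getDocumentElement()));  // 1
    }
}
```

Because the path starts with /, it is resolved from the document root whichever node is passed as the context, which is why calling doc.getDocumentElement() first is unnecessary here.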