Is there a better way of getting parent node of XPath query result?


Solution 1

The nice thing about xpath queries is that you can essentially treat them like a file system path, so simply having

//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/..
                                                              ^^

will find all .a1 spans nested under a .foo div, then move up one level to each span's parent.
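
As a rough illustration, the parent step can be tried from Python's standard-library xml.etree.ElementTree. This is only a sketch under assumptions: ElementTree implements a limited XPath subset without contains(), so the class tests are written as exact matches (@class='foo'), and the <body> wrapper element is hypothetical, added only so that the search can start above the .foo div.

```python
import xml.etree.ElementTree as ET

# The markup from the question, wrapped in a hypothetical <body> element
# so that .//div can find the .foo div as a descendant of the root.
SAMPLE = """<body>
<div class="foo">
   <div><span class="a1"></span><a href="...">...</a></div>
   <div><span class="a2"></span><a href="...">...</a></div>
   <div><span class="a1"></span>some text</div>
   <div><span class="a3"></span>some text</div>
</div>
</body>"""

root = ET.fromstring(SAMPLE)

# Exact attribute matches stand in for contains(), which ElementTree lacks.
# The trailing /.. steps from each matching span up to its parent div.
parents = root.findall(".//div[@class='foo']/div/span[@class='a1']/..")

for div in parents:
    link = div.find("a")
    if link is not None:
        print("link:", link.get("href"))
    else:
        # Text that follows the span is stored as the span's tail in ElementTree.
        print("text:", div.find("span").tail)
```

A single query returns the parent divs directly, so no second query with the parent as context node is needed to reach the <a> or the trailing text.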

Solution 2

An expression that is better than using a reverse axis (the .. parent step):

//div[contains(@class,'foo')]/div[span[contains(@class,'a1')]]

This selects every div that is a child of a div whose class attribute contains the string "foo", provided that the selected div itself has a span child whose class attribute contains the string "a1".

XSLT-based verification:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//div[contains(@class,'foo')]
          /div[span[contains(@class,'a1')]]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<div class="foo">
   <div><span class="a1"></span><a href="...">...</a></div>
   <div><span class="a2"></span><a href="...">...</a></div>
   <div><span class="a1"></span>some text</div>
   <div><span class="a3"></span>some text</div>
</div>

the XPath expression is evaluated and the selected elements are copied to the output:

<div>
   <span class="a1"/>
   <a href="...">...</a>
</div>
<div>
   <span class="a1"/>some text</div>
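
For readers without an XSLT processor at hand, the same selection can be approximated in plain Python. This is a sketch, not the answer's method: ElementTree cannot evaluate nested predicates or contains(), so the two predicates are emulated with ordinary Python tests, and the helper name has_class_substring is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The XML document from the answer; the .foo div is the root element here.
SAMPLE = """<div class="foo">
   <div><span class="a1"></span><a href="...">...</a></div>
   <div><span class="a2"></span><a href="...">...</a></div>
   <div><span class="a1"></span>some text</div>
   <div><span class="a3"></span>some text</div>
</div>"""

root = ET.fromstring(SAMPLE)


def has_class_substring(elem, needle):
    """Mimic contains(@class, needle); a missing attribute counts as ''."""
    return needle in elem.get("class", "")


# Emulates //div[contains(@class,'foo')]/div[span[contains(@class,'a1')]]:
# walk forward only, and keep a child div when it has a matching span child.
selected = [
    inner
    for outer in root.iter("div")          # every div, including the root
    if has_class_substring(outer, "foo")
    for inner in outer.findall("div")      # child divs of the .foo div
    if any(has_class_substring(s, "a1") for s in inner.findall("span"))
]

for div in selected:
    print(ET.tostring(div, encoding="unicode").strip())
```

The filter never steps back up the tree: the span test is applied as a condition on the div being selected, which is the idea behind the predicate form of the expression.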

II. Remarks on accessing an HTML element by one of its classes:

If it is known that the element can have only one class, then it isn't necessary at all to use contains().

Don't use:

//div[contains(@class, 'foo')]

Use:

//div[@class = 'foo']

or, if there could be leading/trailing spaces, use:

//div[normalize-space(@class) = 'foo']

A crucial issue with:

//div[contains(@class, 'foo')]

is that this also selects any div whose class merely contains "foo" as a substring, such as "myfoo", "foo2" or "myfoo3".

If the element may have more than one class, and to avoid the above issue, the correct XPath expression is:

//div[contains(concat(' ', @class, ' '), ' foo ')]
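
The difference between these four tests is purely a matter of string handling, so it can be sketched without any XML library. The functions below are hypothetical Python stand-ins for the XPath expressions named in their docstrings; the normalize-space() emulation is an approximation, since Python's split() treats a few more characters as whitespace than XPath does.

```python
def xpath_contains(class_attr, name):
    """contains(@class, name): plain substring test."""
    return name in class_attr


def xpath_equals(class_attr, name):
    """@class = name: exact string comparison."""
    return class_attr == name


def xpath_normalized_equals(class_attr, name):
    """normalize-space(@class) = name: trim and collapse whitespace first."""
    return " ".join(class_attr.split()) == name


def xpath_token_match(class_attr, name):
    """contains(concat(' ', @class, ' '), ' name '): space-delimited match."""
    return f" {name} " in f" {class_attr} "


# Compare the four tests on a few class attribute values.
for value in ["foo", " foo ", "foo bar baz", "myfoo", "foo2"]:
    print(
        repr(value),
        xpath_contains(value, "foo"),
        xpath_equals(value, "foo"),
        xpath_normalized_equals(value, "foo"),
        xpath_token_match(value, "foo"),
    )
```

Only the padded, space-delimited test accepts "foo bar baz" while rejecting "myfoo" and "foo2".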

Author: Marcin Orlowski


Updated on November 13, 2020

Comments

  • Marcin Orlowski
    Marcin Orlowski over 3 years

    Having markup like this:

    <div class="foo">
       <div><span class="a1"></span><a href="...">...</a></div>
       <div><span class="a2"></span><a href="...">...</a></div>
       <div><span class="a1"></span>some text</div>
       <div><span class="a3"></span>some text</div>
    </div>
    

    I am interested in getting each <a> and "some text" ONLY if the adjacent span has class a1. So in the end my result should be the <a> from the first div and the text from the third one. It would be easy if the <a> and the text were inside the span, or if the div had a class attribute, but no luck.

    What I am doing now is looking for the span with the a1 class:

    //div[contains(@class,'foo')]/div/span[contains(@class,'a1')]
    

    then I get its parent and do another query() with that parent as the context node. This looks far from efficient, so the question is: is there a better way to accomplish my goal?


    THE ANSWER ADDENDUM

    As per @MarcB's accepted answer, the right query to use is:

    //div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/..
    

    but for <a> it may be better to use:

    //div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/../a
    

    to get the <a> instead of its container.

  • Dave Lasley
    Dave Lasley over 11 years
    +1 for the reference to file system paths, that's how I've always thought of it but I've never heard it explained as such
  • Marcin Orlowski
    Marcin Orlowski over 11 years
    I checked the manual prior to asking the question, but it seems I managed to miss ".." even though it is clearly there. But the FS reference made it clear instantly. Thanks.
  • Marc B
    Marc B over 11 years
    Yeah. when I first jumped into xpath, I flailed around like this for a while, but making the query<->path association was quite the eureka moment for me.
  • Marc B
    Marc B over 11 years
    Don't forget that HTML allows multiple classes. @class='foo' will skip over class="foo bar baz". As such, contains() is entirely valid, as long as (as you point out) you watch for false positives.
  • Dimitre Novatchev
    Dimitre Novatchev over 11 years
    @MarcB, it seems that you haven't read or understood this answer. It treats at length the case where an element has more than one class. Moreover, this answer provides a correct solution to that case, unlike the incorrect and simplistic contains(@class, someString).
  • BeniBela
    BeniBela over 11 years
    Which is of course why that language is called X_Path_... But thinking of the query as a path is absolutely not helpful when you want to understand the finer details, especially if you want to move to XPath 2.0 someday. Then / is some kind of binary operator evaluating the right side for every node on the left side... (thinking of / as a path separator cost me weeks or months when I wrote an XPath parser)