Is there a better way of getting parent node of XPath query result?
Solution 1
The nice thing about XPath queries is that you can essentially treat them like file system paths, so simply having
//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/..
                                                              ^^
will find all your .a1 nodes that are below a .foo node, then move up one level to the .a1 nodes' parents.
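As a rough illustration of the trailing /.. step, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It rests on a few assumptions that are not part of the answer: the sample markup from the question is wrapped in a hypothetical <body> root, and because ElementTree implements only a subset of XPath (no contains()), exact matches such as @class='a1' stand in for the contains() tests. That is adequate for this sample but not a general replacement.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <body> root
# so that ".//div" can reach the outer div.
MARKUP = """
<body>
  <div class="foo">
    <div><span class="a1"></span><a href="...">...</a></div>
    <div><span class="a2"></span><a href="...">...</a></div>
    <div><span class="a1"></span>some text</div>
    <div><span class="a3"></span>some text</div>
  </div>
</body>
"""

root = ET.fromstring(MARKUP)

# ElementTree has no contains(), so exact attribute matches stand in for it.
# The trailing "/.." moves from each matching span up to its parent div.
parents = root.findall(".//div[@class='foo']/div/span[@class='a1']/..")

for parent in parents:
    link = parent.find("a")       # the <a> child, if the div has one
    span = parent.find("span")
    # Text that follows the span inside its parent is stored as span.tail.
    trailing_text = (span.tail or "").strip()
    print(link is not None, repr(trailing_text))
```

One query returns the parent divs, and everything needed (the <a> child or the text after the span) is read from those elements without running a second query per node.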
Solution 2
An expression that is better than using a reverse axis, because it selects the parent directly with a predicate:
//div[contains(@class,'foo')]/div[span[contains(@class,'a1')]]
This selects any div that is a child of a div whose class attribute contains the string "foo" and that (the selected div) has a span child whose class attribute contains the string "a1".
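The same "filter the container instead of climbing up from the child" idea can be sketched in Python. This is only an approximation under stated assumptions: the standard library's xml.etree.ElementTree supports neither contains() nor nested predicates, so the inner predicate is emulated with a second find() call, exact class matches are used, and the <body> wrapper and the helper name divs_with_a1_span are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <body> root.
MARKUP = """
<body>
  <div class="foo">
    <div><span class="a1"></span><a href="...">...</a></div>
    <div><span class="a2"></span><a href="...">...</a></div>
    <div><span class="a1"></span>some text</div>
    <div><span class="a3"></span>some text</div>
  </div>
</body>
"""


def divs_with_a1_span(tree_root):
    """Approximate //div[@class='foo']/div[span[@class='a1']]."""
    candidates = tree_root.findall(".//div[@class='foo']/div")
    # The inner find() plays the role of the [span[...]] predicate:
    # keep a div only if it has a span child with class "a1".
    return [d for d in candidates if d.find("span[@class='a1']") is not None]


root = ET.fromstring(MARKUP)
selected = divs_with_a1_span(root)
print(len(selected))
```

With a full XPath 1.0 engine the predicate expression from the answer can be passed as a single query; the list comprehension is only needed because of ElementTree's limited syntax.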
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:template match="/">
    <xsl:copy-of select=
      "//div[contains(@class,'foo')]
           /div[span[contains(@class,'a1')]]"/>
  </xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<div class="foo">
  <div><span class="a1"></span><a href="...">...</a></div>
  <div><span class="a2"></span><a href="...">...</a></div>
  <div><span class="a1"></span>some text</div>
  <div><span class="a3"></span>some text</div>
</div>
the XPath expression is evaluated and the selected elements are copied to the output:
<div>
  <span class="a1"/>
  <a href="...">...</a>
</div>
<div>
  <span class="a1"/>some text</div>
Remarks on accessing an HTML element by one of its classes:
If it is known that the element can have only one class, then there is no need to use contains().
Don't use:
//div[contains(@class, 'foo')]
Use:
//div[@class = 'foo']
or, if there could be leading/trailing spaces, use:
//div[normalize-space(@class) = 'foo']
A crucial issue with:
//div[contains(@class, 'foo')]
is that it selects any div whose class attribute merely contains that substring, such as "myfoo", "foo2" or "myfoo3".
If the element may have more than one class, and to avoid the above issue, the correct XPath expression is:
//div[contains(concat(' ', @class, ' '), ' foo ')]
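To see why the padded form avoids the false positives, the two tests can be mirrored with plain Python string operations. This is a sketch of the string logic only, not an XPath engine, and the function names are hypothetical. Like the expression above, it assumes class names are separated by single spaces; a commonly used variant wraps @class in normalize-space() to also cope with tabs, newlines and repeated spaces.

```python
def has_class_substring(class_attr: str, name: str) -> bool:
    """Mirror of contains(@class, name): a plain substring test."""
    return name in class_attr


def has_class_token(class_attr: str, name: str) -> bool:
    """Mirror of contains(concat(' ', @class, ' '), ' name ')."""
    # Padding both sides with spaces means the name only matches when it
    # is a whole space-delimited token, including at the start or end.
    return f" {name} " in f" {class_attr} "


for value in ["foo", "foo bar baz", "bar foo", "myfoo", "foo2", "myfoo3"]:
    print(
        f"{value!r}: substring={has_class_substring(value, 'foo')}, "
        f"token={has_class_token(value, 'foo')}"
    )
```

The substring test accepts every value in the list, while the token test accepts only the first three, which are the ones that really carry the class foo.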
Marcin Orlowski
Updated on November 13, 2020
Comments
-
Marcin Orlowski over 3 years
Having markup like this:
<div class="foo">
  <div><span class="a1"></span><a href="...">...</a></div>
  <div><span class="a2"></span><a href="...">...</a></div>
  <div><span class="a1"></span>some text</div>
  <div><span class="a3"></span>some text</div>
</div>
I am interested in getting all <a> and some text ONLY if the adjacent span is of class a1. So at the end of the whole code my result should be the <a> from the first div and some text from the third one. It'd be easy if <a> and some text were inside the span, or if the div had a class attribute, but no luck.
What I am doing now is looking for the span with the a1 class:
//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]
then I get its parent and do another query() with that parent as the context node. This looks far from efficient, so the question is: is there any better way to accomplish my goal?
THE ANSWER ADDENDUM
As per @MarcB's accepted answer, the right query to use is:
//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/..
but for <a> it may be better to use:
//div[contains(@class,'foo')]/div/span[contains(@class,'a1')]/../a
to get the <a> instead of its container.
-
Dave Lasley over 11 years
+1 for the reference to file system paths, that's how I've always thought of it but I've never heard it explained as such
-
Marcin Orlowski over 11 years
I checked the manual prior to asking the question, but it seems I managed to miss ".." even though it is clearly there. The FS reference made it clear instantly. Thanks.
-
Marc B over 11 years
Yeah. When I first jumped into XPath, I flailed around like this for a while, but making the query<->path association was quite the eureka moment for me.
-
Marc B over 11 years
Don't forget that HTML allows multiple classes. @class='foo' will skip over class="foo bar baz". As such, contains() is entirely valid, as long as (as you point out) you watch for false positives.
-
Dimitre Novatchev over 11 years
@MarcB, it seems that you haven't read or understood this answer -- it treats at length the case where an element has more than one class. Moreover, this answer provides a correct solution to that case -- not like the incorrect and simplistic contains(@class, someString)
-
BeniBela over 11 years
Which is of course why that language is called X_Path_... But thinking of the query as a path is absolutely not helpful when you want to understand the finer details, especially if you want to update to XPath 2.0 someday. Then / is some kind of binary operator evaluating the right side for every node on the left side... (it cost me weeks or months thinking of / as a path separator when I wrote an XPath parser)