Selecting a CSS class with XPath
Solution 1
I want to write the canonical answer to this question because the answer above has a problem.
Our problem
The CSS selector:
.foo
will select any element that has the class foo.
How do you do this in XPath?
Although XPath is more powerful than CSS, XPath 1.0 doesn't have a native equivalent of a CSS class selector. However, there is a solution.
The right way to do it
The equivalent selector in XPath is:
//*[contains(concat(" ", normalize-space(@class), " "), " foo ")]
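The function normalize-space strips leading and trailing whitespace, and also replaces each sequence of whitespace characters (spaces, tabs, newlines) by a single space. Once the result is padded with one space on each side by concat, every class name is surrounded by single spaces, so searching for " foo " matches only the whole class name foo, even when the classes were originally separated by tabs or newlines.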
(In a more general sense) this is also the equivalent of the CSS selector:
*[class~="foo"]
which will match any element whose class attribute value is a list of whitespace-separated values, one of which is exactly equal to foo.
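To see why both normalize-space and the padding spaces matter, here is a small Python sketch that emulates the XPath predicate with plain string operations. It is an illustration, not an XPath engine: Python's standard-library XPath support (ElementTree) has no contains() or concat(), so normalize_space, has_class and has_class_without_normalize are hypothetical helpers written for this example.

```python
import re

# XPath 1.0 normalize-space() only treats space, tab, CR and LF as whitespace.
XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(value: str) -> str:
    # Emulates normalize-space(): collapse each whitespace run to one space, then trim.
    return XML_WS.sub(" ", value).strip(" ")


def has_class(class_attr: str, name: str) -> bool:
    # Emulates contains(concat(" ", normalize-space(@class), " "), " NAME ").
    padded = " " + normalize_space(class_attr) + " "
    return (" " + name + " ") in padded


def has_class_without_normalize(class_attr: str, name: str) -> bool:
    # Same test but skipping normalize-space(): tab-separated classes are missed.
    padded = " " + class_attr + " "
    return (" " + name + " ") in padded


for attr in ["foo", "foo bar", " foo ", "bar\tfoo", "foobar", "foo-bar"]:
    print(repr(attr), has_class(attr, "foo"), has_class_without_normalize(attr, "foo"))
```

Only the tab-separated value gives different answers for the two functions, which is exactly the case normalize-space is there for; foobar and foo-bar are rejected by both because the padded search needs a space on each side of foo.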
A couple of obvious but wrong ways to do it
The XPath selector:
//*[@class="foo"]
doesn't work! It won't match an element that has more than one class, for example:
<div class="foo bar">
It also won't match if there is any extra whitespace around the class name:
<div class=" foo ">
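Python's xml.etree.ElementTree implements only a small XPath subset, but its [@attr='value'] predicate is the same whole-value string comparison as //*[@class="foo"], so it reproduces the problem. The sample markup below is made up for illustration, and the query uses .// because ElementTree does not accept a path starting with / when called on an element.

```python
import xml.etree.ElementTree as ET

# Made-up markup: all three <div>s carry the class "foo".
SAMPLE = """<root>
  <div class="foo">a</div>
  <div class="foo bar">b</div>
  <div class=" foo ">c</div>
</root>"""

root = ET.fromstring(SAMPLE)

# [@class='foo'] compares the entire attribute value with "foo".
exact = root.findall(".//*[@class='foo']")
print([el.text for el in exact])
```

Only the first div is returned: "foo bar" and " foo " are not equal to "foo" as whole strings.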
The 'improved' XPath selector
//*[contains(@class, "foo")]
doesn't work either! It wrongly matches elements with the class foobar, for example:
<div class="foobar">
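XPath 1.0 contains() is a plain substring test, so it behaves like Python's in operator on strings. The short sketch below uses made-up class values and a hypothetical helper xpath_contains to compare it with a whole-token test.

```python
# Made-up class attribute values to test against.
candidates = ["foo", "foo bar", "foobar", "barfoo", "my-foo-class"]


def xpath_contains(haystack: str, needle: str) -> bool:
    # XPath 1.0 contains() is a plain substring test, like Python's `in`.
    return needle in haystack


substring_hits = [c for c in candidates if xpath_contains(c, "foo")]
# A whole-token test for comparison: split the attribute on whitespace.
token_hits = [c for c in candidates if "foo" in c.split()]

print("contains():", substring_hits)
print("whole token:", token_hits)
```

Every candidate passes the substring test, while only "foo" and "foo bar" actually carry the class foo.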
Credit goes to this fella, whose post was the earliest published solution to this problem that I found on the web: http://dubinko.info/blog/2007/10/01/simple-parsing-of-space-seprated-attributes-in-xpathxslt/
Solution 2
//[@class="date"]
is not a valid XPath expression: a location step needs a node test, such as * or an element name, before the predicate.
Try //*[@class="date"], or, if you know it is an image, //img[@class="date"]
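The same rule can be observed with Python's xml.etree.ElementTree, which rejects a predicate that has no node test in front of it. This is only an analogy for the PHP code in the question: ElementTree supports a small XPath subset, needs a relative path (.//) when queried from an element, and reports errors differently from SimpleXML/DOMXPath. The markup and the try_find helper are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Made-up markup standing in for the question's HTML.
SAMPLE = """<body>
  <span class="date">2012-05-01</span>
  <img class="date" src="cal.png"/>
  <img class="photo" src="me.png"/>
</body>"""

root = ET.fromstring(SAMPLE)


def try_find(path: str):
    # Runs an ElementTree path query; returns the error text if the path is rejected.
    try:
        return [el.tag for el in root.findall(path)]
    except SyntaxError as exc:
        return f"rejected: {exc}"


print(try_find('.//[@class="date"]'))     # no node test before the predicate
print(try_find('.//*[@class="date"]'))    # any element whose class is exactly "date"
print(try_find('.//img[@class="date"]'))  # only <img> elements
```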
Solution 3
XPath 3.1 introduces a function contains-token and thus finally solves this ‘officially’. It is designed to support classes.
Example:
//*[contains-token(@class, "foo")]
This function treats any XML whitespace (not only the space character U+0020) as a separator, still works when a class name is repeated, and generally covers the edge cases.
Note: when this answer was written (2016-12-13), XPath 3.1 had the status of Candidate Recommendation; it became a W3C Recommendation in March 2017.
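For readers without an XPath 3.1 processor, here is a rough Python approximation of what contains-token checks, limited to a single string and the default codepoint collation (the real function also accepts a sequence of strings and an optional collation argument). The contains_token helper is hypothetical and written only for this sketch; in a real processor you would write //*[contains-token(@class, "foo")].

```python
import re

# XML whitespace: space, tab, carriage return, line feed.
XML_WS = re.compile(r"[ \t\r\n]+")
XML_WS_CHARS = " \t\r\n"


def contains_token(value: str, token: str) -> bool:
    # Trim the searched-for token; an empty or all-whitespace token never matches.
    token = token.strip(XML_WS_CHARS)
    if not token:
        return False
    # Split the trimmed value at whitespace runs and compare whole tokens.
    return token in XML_WS.split(value.strip(XML_WS_CHARS))


for attr in ["foo", "bar\tfoo", "foo foo", "foobar"]:
    print(repr(attr), contains_token(attr, "foo"))
```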
Solution 4
In XPath 2.0 you can:
//*[count(index-of(tokenize(@class, '\s+' ), 'foo')) = 1]
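as stated by Christian Weiske in: https://cweiske.de/tagebuch/XPath%3A%20Select%20element%20by%20class.htm
Note that = 1 fails when the class name is repeated (class="foo foo" gives a count of 2); comparing with >= 1 avoids that.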
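The pieces of that XPath 2.0 expression can be emulated in Python to see the counts it produces for a few class values. This is a sketch with hypothetical helper names (xpath_tokenize, index_of, count_eq_1, count_ge_1); the separator class [ \t\r\n] mirrors XPath's \s, and, like XPath 2.0 tokenize, re.split yields zero-length tokens for leading or trailing separators.

```python
import re


def xpath_tokenize(value: str) -> list:
    # Approximates tokenize(value, '\s+') where \s is space, tab, CR or LF.
    return re.split(r"[ \t\r\n]+", value)


def index_of(seq: list, item: str) -> list:
    # Emulates fn:index-of: 1-based positions of every entry equal to item.
    return [i + 1 for i, x in enumerate(seq) if x == item]


def count_eq_1(class_attr: str, name: str) -> bool:
    # //*[count(index-of(tokenize(@class, '\s+'), 'foo')) = 1]
    return len(index_of(xpath_tokenize(class_attr), name)) == 1


def count_ge_1(class_attr: str, name: str) -> bool:
    # Same test with >= 1, so a repeated class name still matches.
    return len(index_of(xpath_tokenize(class_attr), name)) >= 1


print(xpath_tokenize(" foo "), index_of(xpath_tokenize(" foo "), "foo"))
for attr in ["foo bar", " foo ", "foo foo", "foobar"]:
    print(repr(attr), count_eq_1(attr, "foo"), count_ge_1(attr, "foo"))
```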
Solution 5
BEWARE OF MINUS SIGNS IN THE QUERY! Suppose you are querying for "my-ownclass" in this DOM:
<ul class="my-ownclass"><li>...</li></ul>
<ul class="someother"><li>...</li></ul>
<ul><li>...</li></ul>
$dom = new DOMDocument();
@$dom->loadHTML($html); // $html holds the markup above; @ silences HTML parser warnings
$finder = new DOMXPath($dom);
$nodes = $finder->query(".//ul[contains(@class, 'my-ownclass')]"); // This did NOT behave as expected for me: it matched all the <ul> elements in the DOM.
$nodes = $finder->query(".//ul[contains(@class, 'ownclass')]"); // This matches the element.
Teddy13
Updated on February 17, 2020
Comments
-
Teddy13 over 4 years
I want to select just a class on its own called .date
For some reason, I cannot get this to work. If anyone knows what is wrong with my code, it would be much appreciated.
@$doc = new DOMDocument();
@$doc->loadHTML($html);
$xml = simplexml_import_dom($doc); // just to make xpath more simple
$images = $xml->xpath('//[@class="date"]');
foreach ($images as $img) {
    echo $img." ";
}
-
Freek about 10 years
What's the need for normalize-space?
-
LarsH almost 9 years
"the answer above" probably refers to MrGlass's.
-
Frozen Flame over 8 years
Is this possible: <div class="foo\tbar">? I mean, class names separated by a tab.
-
Daniele Orlando over 8 years
I think *[class~="foo"] misses the @. Should be *[@class~="foo"].
-
Memke about 8 years
but <div class="group-conditions"/> and <div class="condition"/> is the same for $x('//div[contains(concat(" ", normalize-space(@class), " "), "condition")]')
-
JonnyRaa over 6 years
unfortunately this doesn't seem to be implemented by chrome as of 6/12/2017. based on en.wikipedia.org/wiki/… it seems to be lacking pretty much across the board
-
MasterJoe about 6 years
@NielsBom - How do we get around the contains limitation mentioned at the end ? Use css selector instead ? The xpath contains-token given in another answer does not work in the latest chrome.
-
MasterJoe about 6 years
It does not work in today's latest chrome. Until it works, how do we get around the limitation that //*[contains(@class, "foo")] will also select any class that contains foo, such as foobar, fooz etc.
-
Niels Bom about 6 years
@testerjoe2 did you try //*[contains(concat(" ", normalize-space(@class), " "), " foo ")]?