xpath expression for regex-like matching?
Solution 1
How about this (updated):
XPath 1.0:
"//div[substring-before(@id, '_') = 'foo'
and substring-after(@id, '_') >= 0
and substring-after(@id, '_') <= 99999999]"
Edit #2: The OP made a change to the question. The following, even more reduced XPath 1.0 expression works for me:
"//div[substring(@id, 1, 13) = 'post_message_'
and substring(@id, 14) >= 0
and substring(@id, 14) <= 99999999]"
XPath 2.0 has a convenient matches() function:
"//div[matches(@id, '^foo_\d{1,8}$')]"
Apart from the better portability, I would expect the numerical expression (XPath 1.0 style) to perform better than the regex test, though this would only become noticeable when processing large data sets.
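Since the OP works from Ruby, the matches() pattern can be sanity-checked outside XPath with a plain Ruby regex. This is only a sketch: the names FOO_ID and foo_id? are made up for illustration, and Ruby's \A and \z anchors stand in for ^ and $ in the XPath 2.0 expression. One assumption to keep in mind is that Ruby's \d matches ASCII digits only, while the XPath regex flavour may accept other Unicode digits.

```ruby
# Hypothetical helper: a Ruby counterpart of matches(@id, '^foo_\d{1,8}$').
FOO_ID = /\Afoo_\d{1,8}\z/

def foo_id?(id)
  FOO_ID.match?(id)
end

puts foo_id?("foo_123")        # prefix followed by 1 to 8 digits
puts foo_id?("foo_")           # prefix with no digits
puts foo_id?("foo_123456789")  # prefix followed by nine digits
```

A quick check like this is handy for confirming which ids the pattern is supposed to accept before translating it into an XPath 1.0 predicate.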
Original version of the answer:
"//div[substring-before(@id, '_') = 'foo'
and number(substring-after(@id, '_')) = substring-after(@id, '_')
and number(substring-after(@id, '_')) >= 0
and number(substring-after(@id, '_')) <= 99999999]"
The use of the number() function is unnecessary because the mathematical comparison operators coerce their arguments to numbers implicitly: any non-number becomes NaN, and the greater-than/less-than tests then fail.
I also removed the encoding of the angle brackets, since this is an XML requirement, not an XPath requirement.
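The NaN behaviour described above can be imitated in plain Ruby, which may help when reasoning about which ids pass the numeric tests. This is a rough sketch, not an XPath engine: xpath_number and in_range? are hypothetical helpers, and the conversion rule (optional surrounding whitespace, optional minus sign, decimal digits, otherwise NaN) is my reading of the XPath 1.0 number() rules.

```ruby
# Hypothetical stand-in for XPath 1.0 string-to-number conversion:
# anything that is not a plain decimal number becomes NaN.
def xpath_number(str)
  s = str.strip
  s.match?(/\A-?(\d+(\.\d*)?|\.\d+)\z/) ? s.to_f : Float::NAN
end

# Mirrors: substring-after(@id, '_') >= 0
#      and substring-after(@id, '_') <= 99999999
def in_range?(id)
  suffix = id.partition("_").last  # text after the first "_", or "" if none
  n = xpath_number(suffix)
  n >= 0 && n <= 99_999_999        # every comparison against NaN is false
end

puts in_range?("foo_42")   # numeric suffix inside the range
puts in_range?("foo_abc")  # suffix converts to NaN, so both tests fail
```

Note that, under these assumptions, a suffix such as 1.5 also passes the numeric comparison, which is one way the XPath 1.0 expression is looser than the regex.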
Solution 2
As already pointed out, in XPath 2.0 it would be good to use its standard regex capabilities with a function like the matches() function.
One possible XPath 1.0 solution:
//div[starts-with(@id, 'post_message_')
and
string-length(@id) = 21
and
translate(substring-after(@id, 'post_message_'),
'0123456789',
''
)
=
''
]
Do note the following:
The use of the standard XPath function starts-with().
The use of the standard XPath function string-length().
The use of the standard XPath function substring-after().
The use of the standard XPath function translate().
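To make the role of each function concrete, the three tests in the expression above can be mirrored in plain Ruby. This is an illustrative sketch only: PREFIX and post_message_id? are hypothetical names, and the code imitates the logic of the predicate rather than evaluating XPath.

```ruby
PREFIX = "post_message_"

# Hypothetical helper mirroring the three XPath 1.0 tests one by one.
def post_message_id?(id)
  return false unless id.start_with?(PREFIX)  # starts-with(@id, 'post_message_')
  return false unless id.length == 21         # string-length(@id) = 21
  suffix = id[PREFIX.length..-1]              # substring-after(@id, 'post_message_')
  suffix.delete("0123456789").empty?          # translate(..., '0123456789', '') = ''
end

puts post_message_id?("post_message_12345678")  # eight digits after the prefix
puts post_message_id?("post_message_1234567x")  # a non-digit survives the delete
```

Because the prefix is 13 characters long, a total length of 21 means exactly eight digits. To accept one to eight digits, the length test would presumably become a range check (greater than 13 and at most 21).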
Solution 3
Or use the XPath function matches(string, pattern).
<xsl:if test="matches(name(.),'foo_')">
Unfortunately it's not regex, but it might be enough unless you have other foo_ tags you don't need; in that case, I guess you can add a few more "if" checks to cull them out.
mhd
Updated on June 05, 2022
Comments
-
mhd about 2 years
I want to search div id in an html doc with certain pattern. I want to match this pattern in regex:
foo_([[:digit:]]{1,8})
using xpath. What is the xpath equivalent for the above pattern?
I'm stuck with
//div[@id="foo_
and then what? If someone could continue a legal expression for it.
EDIT
Sorry, I think I have to elaborate more. Actually it's not foo_, it's post_message_
Btw, I use mechanize/nokogiri ( ruby )
Here's the snippet :
html_doc = Nokogiri::HTML(open(myfile))
message_div = html_doc.xpath('//div[substring(@id,13) = "post_message_" and substring-after(@id, "post_message_") => 0 and substring-after(@id, "post_message_") <= 99999999]')
Still failed. Error message:
Couldn't evaluate expression '//div[substring(@id,13) = "post_message_" and substring-after(@id, "post_message_") => 0 and substring-after(@id, "post_message_") <= 99999999]' (Nokogiri::XML::XPath::SyntaxError)
-
phihag over 15 years
matches is a regexp function, see w3.org/TR/xpath-functions/#func-matches: "The function returns true if $input matches the regular expression ..."
-
Tomalak over 15 years
Thanks for the up-vote. ;-) Nice alternative solution, though it does not fulfill the {1,8} condition the OP asked for. But that is easily fixed with a string-length() test.
-
Dimitre Novatchev over 15 years
@Tomalak Sorry, I am on a 2-day trip and only have minutes of free time to look at SO problems. I'll edit this now.
-
Dimitre Novatchev over 15 years
@Tomalak Thanks for your suggestions.
-
Robert Gould over 15 years
Awesome! I had underestimated matches. But now I need to go back and refactor my archaic searches!
-
mhd over 15 years
Thanks a lot! It works now. I use your XPath 1.0 solution due to better support of XPath 1.0 in lib-xml. FYI, Nokogiri has a different quote syntax (see my code above). All in all, accepted and voted up :)