How to get H1,H2,H3,... using a single xpath expression
Use:
/html/body/*[self::h1 or self::h2 or self::h3]/text()
The following expression is incorrect:
//html/body/*[local-name() = "h1"
or local-name() = "h2"
or local-name() = "h3"]/text()
because local-name() ignores namespaces, so it may also select text nodes that are children of unwanted:h1, different:h2, or someWeirdNamespace:h3.
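The difference between the two predicates can be sketched in Python. This is only an emulation: the standard library's ElementTree supports neither the self:: axis nor local-name(), so both filters are written as plain list comprehensions. The sample document and the namespace URI urn:example:different are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one plain <h1> and one <h2> in a made-up namespace.
DOC = """<html xmlns:different="urn:example:different">
  <body>
    <h1>Plain heading</h1>
    <different:h2>Namespaced heading</different:h2>
    <p>Paragraph</p>
  </body>
</html>"""

WANTED = {"h1", "h2", "h3"}


def local_name(tag):
    # ElementTree writes namespaced tags as "{uri}local"; drop the "{uri}" part.
    return tag.rsplit("}", 1)[-1]


body = ET.fromstring(DOC).find("body")

# Emulates [self::h1 or self::h2 or self::h3]: only no-namespace names match.
by_name = [el.text for el in body if el.tag in WANTED]

# Emulates [local-name() = "h1" or ...]: the namespace is ignored.
by_local_name = [el.text for el in body if local_name(el.tag) in WANTED]

print(by_name)        # ['Plain heading']
print(by_local_name)  # ['Plain heading', 'Namespaced heading']
```

The second filter also picks up different:h2, which is the unwanted match described above.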
Another recommendation: always avoid using // when the structure of the XML document is statically known. Using // often results in significant inefficiency because it causes the complete document (sub)tree rooted at the context node to be traversed.
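As a rough illustration of the traversal cost, the sketch below counts how many elements a naive evaluator would inspect for child steps versus a descendant search. It uses Python's standard ElementTree and a made-up document; a real XPath engine may optimise // and behave differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two headings directly under <body>, plus nested markup.
DOC = """<html>
  <body>
    <h1>Title</h1>
    <div><p>Some <b>nested</b> text</p><h2>Nested heading</h2></div>
    <h2>Section</h2>
  </body>
</html>"""

root = ET.fromstring(DOC)
WANTED = {"h1", "h2", "h3"}

# Child steps only (like /html/body/*): inspect just the children of <body>.
body_children = list(root.find("body"))
direct = [el.text for el in body_children if el.tag in WANTED]

# Descendant search (like //*): walk every element in the tree.
all_elements = list(root.iter())
anywhere = [el.text for el in all_elements if el.tag in WANTED]

print(len(body_children), direct)   # 3 ['Title', 'Section']
print(len(all_elements), anywhere)  # 8 ['Title', 'Nested heading', 'Section']
```

The descendant walk visits all eight elements and also returns the heading nested inside the div, so the two forms differ in result as well as in cost.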
Author by
Aivan Monceller
Computer science is no more about computers than astronomy is about telescopes. --Edsger Dijkstra
Updated on November 04, 2020
Comments
-
Aivan Monceller over 3 years
How can I get H1,H2,H3 contents in one single xpath expression?
I know I could do this.
//html/body/h1/text()
//html/body/h2/text()
//html/body/h3/text()
and so on.
-
Michael Kay over 12 years
On the performance question, your mileage may vary. Some products go to great lengths to optimise queries using //x.
-
Aswin Sathyan about 8 years
I want to get the text inside the p tag. The h tag can be h3, h4, or h5: <div class="content"><h3>Ingredients:</h3><p>Tomato Purée, Acidity Regulator (Citric Acid)</p> How can I get it using a single XPath expression? Thanks in advance.
-
Dimitre Novatchev about 8 years
@AswinSathyan, just ask a separate question at SO.
-
ptim over 6 years
Great! I needed a descendant selector under body to get all headings:
/html/body//*[self::h1 or self::h2 or self::h3]/text()