How to get H1, H2, H3, ... using a single XPath expression


Use:

/html/body/*[self::h1 or self::h2 or self::h3]/text()

The following expression is incorrect:

//html/body/*[local-name() = "h1"  
           or local-name() = "h2"  
           or local-name() = "h3"]/text()  

because local-name() ignores namespaces, so it may also select text nodes that are children of elements such as unwanted:h1, different:h2, or someWeirdNamespace:h3.
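To see the difference between the two predicates, here is a minimal sketch in Python. It assumes the third-party lxml library is installed; the sample document and the `urn:example:weird` namespace are made up for illustration.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical sample: a plain <h1>, an <h1> in a made-up namespace, and an <h2>.
DOC = b"""<html xmlns:weird="urn:example:weird">
  <body>
    <h1>Plain heading</h1>
    <weird:h1>Namespaced heading</weird:h1>
    <h2>Subheading</h2>
    <p>Not a heading</p>
  </body>
</html>"""

root = etree.fromstring(DOC)

# self::h1 only matches elements named h1 that are in no namespace.
SELF_AXIS = "/html/body/*[self::h1 or self::h2 or self::h3]/text()"

# local-name() compares only the local part, so weird:h1 matches as well.
LOCAL_NAME = ('/html/body/*[local-name() = "h1" or local-name() = "h2" '
              'or local-name() = "h3"]/text()')

by_self = root.xpath(SELF_AXIS)
by_local_name = root.xpath(LOCAL_NAME)

print(by_self)        # ['Plain heading', 'Subheading']
print(by_local_name)  # ['Plain heading', 'Namespaced heading', 'Subheading']
```

The second result contains the text of the `weird:h1` element, which is the unwanted match described above.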

Another recommendation: avoid using // when the structure of the XML document is statically known. Using // often results in significant inefficiencies, because it causes the complete (sub)tree rooted in the context node to be traversed.


Author: Aivan Monceller


Updated on November 04, 2020

Comments

  • Aivan Monceller, over 3 years

    How can I get H1,H2,H3 contents in one single xpath expression?

    I know I could do this.

    //html/body/h1/text()
    //html/body/h2/text()
    //html/body/h3/text() 
    

    and so on.

  • Michael Kay, over 12 years
    On the performance question, your mileage may vary. Some products go to great lengths to optimise queries using //x.
  • Aswin Sathyan, about 8 years
    I want to get the text inside the p tag. The heading tag can be h3, h4, or h5: <div class="content"><h3>Ingredients:</h3><p >Tomato Purée, Acidity Regulator (Citric Acid)</p> How can I get it using a single XPath expression? Thanks in advance.
  • Dimitre Novatchev, about 8 years
    @AswinSathyan, just ask a separate question on SO.
  • ptim, over 6 years
    great! I needed a descendant selector under body to get all headings: /html/body//*[self::h1 or self::h2 or self::h3]/text()
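
The difference between the child step (`/*`) and the descendant step (`//*`) mentioned in the last comment can be sketched as follows. This again assumes the third-party lxml library; the sample document is hypothetical.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical sample: one heading directly under <body>, one nested in a <div>.
DOC = b"""<html>
  <body>
    <h1>Top</h1>
    <div><h2>Nested</h2></div>
  </body>
</html>"""

root = etree.fromstring(DOC)

# /html/body/* looks only at direct children of <body>.
children_only = root.xpath(
    "/html/body/*[self::h1 or self::h2 or self::h3]/text()")

# /html/body//* looks at every descendant of <body>.
all_descendants = root.xpath(
    "/html/body//*[self::h1 or self::h2 or self::h3]/text()")

print(children_only)    # ['Top']
print(all_descendants)  # ['Top', 'Nested']
```

The descendant form still anchors the search at /html/body, so only the subtree under body is traversed rather than the whole document.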