XPath select all text content for a <div> except for a specific tag <h5>

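Try the following XPath expression, which selects every text node under the first matching div and drops those that have an h5 ancestor: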

//div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]

This will return:

$ xmllint --html --shell so.html
/ > xpath //div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]    
Object is a Node Set :
Set contains 2 nodes:
1  TEXT
    content=      
2  TEXT
    content=     Sala de estar/jantar,2 vagas de gar...
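
For readers without an XPath engine at hand, the same filtering idea can be sketched with Python's standard-library html.parser. This is only an approximation under stated assumptions, not an XPath implementation: the class FirstBlockText and the function first_block_text are hypothetical names introduced here, the class attribute is compared by exact string equality (as @class='...' does), and "first div" is taken to mean the first match in document order. The h5_depth counter plays the role of not(ancestor::h5): text is kept only while no h5 is open.

```python
from html.parser import HTMLParser

# Sample markup copied from the question.
HTML = """
<div class="detalhes_colunadados">
   <div class="detalhescolunadados_blocos">
     <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br>
    </div>
    <div class="detalhescolunadados_blocos">
      <h5>Valores</h5>
            Venda: R$ 600.000,00<br>
          Condomínio: R$ 660,00<br>
    </div>
</div>
"""


class FirstBlockText(HTMLParser):
    """Hypothetical helper: collect text of the first matching div, skipping text under <h5>."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.div_depth = 0   # > 0 while inside the target div
        self.h5_depth = 0    # > 0 while an <h5> ancestor is open
        self.done = False    # True once the first target div has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1          # nested div inside the target
            elif dict(attrs).get("class") == self.css_class:
                self.div_depth = 1           # entering the target div
        elif tag == "h5" and self.div_depth > 0:
            self.h5_depth += 1

    def handle_endtag(self, tag):
        if self.done or self.div_depth == 0:
            return
        if tag == "div":
            self.div_depth -= 1
            if self.div_depth == 0:
                self.done = True             # ignore every later div
        elif tag == "h5" and self.h5_depth > 0:
            self.h5_depth -= 1

    def handle_data(self, data):
        # Keep text only inside the target div and outside any <h5>.
        if self.div_depth > 0 and self.h5_depth == 0 and not self.done:
            self.chunks.append(data)


def first_block_text(html, css_class="detalhescolunadados_blocos"):
    parser = FirstBlockText(css_class)
    parser.feed(html)
    parser.close()
    # Collapse the whitespace-only text nodes and indentation.
    return " ".join("".join(parser.chunks).split())


print(first_block_text(HTML))
```

Unlike the xmllint output above, the sketch joins the kept text nodes and collapses whitespace, so the whitespace-only node does not appear separately.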
Author by

bslima

Updated on July 24, 2022

Comments

  • bslima
    bslima almost 2 years

    I searched and tried several solutions for this problem, but none of them worked. I have this HTML:

    <div class="detalhes_colunadados">
       <div class="detalhescolunadados_blocos">
         <h5>Descrição completa</h5>
        Sala de estar/jantar,2 vagas de garagem cobertas.<br>
        </div>
        <div class="detalhescolunadados_blocos">
          <h5>Valores</h5>
                Venda: R$ 600.000,00<br>
              Condomínio: R$ 660,00<br>
        </div>
    </div>
    

    I want to use XPath to extract only the text content of the first div with class="detalhescolunadados_blocos", excluding the text inside the h5 tag.

    I tried: //div[@class='detalhescolunadados_blocos']/[1]/*[not(self::h5)]

  • Gilles Quenot
    Gilles Quenot over 11 years
    Why not use xmllint --html --xpath '//foo' file.html? =)
  • nwellnhof
    nwellnhof over 11 years
    Thanks for pointing me to the --xpath option. It's actually undocumented.
  • bslima
    bslima over 11 years
    Thanks a lot. I was forgetting that the text is a child of the h5 element; I had even tried //text()[not(self::h5)].