XPath select all text content for a <div> except for a specific tag <h5>

html xpath siblings

Try the following XPath expression. It selects every text node under the first matching div and drops those that have an h5 ancestor:

//div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]

This will return:

$ xmllint --html --shell so.html
/ > xpath //div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]    
Object is a Node Set :
Set contains 2 nodes:
1  TEXT
    content=      
2  TEXT
    content=     Sala de estar/jantar,2 vagas de gar...
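
For readers who want to see the same filtering logic outside of XPath, here is a minimal Python sketch using only the standard library's html.parser. It is not an XPath engine: the class name TextOutsideH5 and the VOID_TAGS set are hypothetical helpers made up for this illustration, and the sketch assumes the markup is properly nested, as it is in the question. It keeps a stack of open elements and collects a text node only when no h5 is among its ancestors, which is what not(ancestor::h5) expresses.

```python
from html.parser import HTMLParser

# The HTML from the question, copied as-is.
HTML = """<div class="detalhes_colunadados">
   <div class="detalhescolunadados_blocos">
     <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br>
    </div>
    <div class="detalhescolunadados_blocos">
      <h5>Valores</h5>
            Venda: R$ 600.000,00<br>
          Condomínio: R$ 660,00<br>
    </div>
</div>"""

# Hypothetical helper: elements that never get an end tag, so they are
# not pushed onto the stack of open elements.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TextOutsideH5(HTMLParser):
    """Collect text nodes of the first matching div that have no h5 ancestor.

    Assumes well-nested markup (every end tag closes the innermost open tag).
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.stack = []           # names of currently open elements
        self.target_depth = None  # stack depth of the div being collected
        self.done = False         # True once the first matching div closed
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        is_target = tag == "div" and dict(attrs).get("class") == self.target_class
        if is_target and self.target_depth is None and not self.done:
            self.target_depth = len(self.stack)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        if self.target_depth == len(self.stack):
            # The first matching div is closing: stop collecting.
            self.target_depth = None
            self.done = True
        self.stack.pop()

    def handle_data(self, data):
        # Same idea as text()[not(ancestor::h5)]: keep the text node only
        # if no h5 is among the open (ancestor) elements.
        if self.target_depth is not None and "h5" not in self.stack:
            self.texts.append(data)


parser = TextOutsideH5("detalhescolunadados_blocos")
parser.feed(HTML)
parser.close()

# Collapse the whitespace-only nodes and indentation for display.
result = " ".join("".join(parser.texts).split())
print(result)
```

The heading text is skipped because, at the moment it is seen, h5 is on the stack of open elements. A test such as not(self::h5) cannot work here, since a text node is never itself an h5 element.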

Author by

bslima

Updated on July 24, 2022

Comments

bslima almost 2 years

I searched and tried several solutions for this problem, but none of them worked. I have this HTML:

<div class="detalhes_colunadados">
   <div class="detalhescolunadados_blocos">
     <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br>
    </div>
    <div class="detalhescolunadados_blocos">
      <h5>Valores</h5>
            Venda: R$ 600.000,00<br>
          Condomínio: R$ 660,00<br>
    </div>
</div>

I want to use XPath to extract only the text content of the first div with class="detalhescolunadados_blocos", excluding the text inside h5 tags.

I tried: //div[@class='detalhescolunadados_blocos']/[1]/*[not(self::h5)]

Gilles Quenot over 11 years

Why not use xmllint --html --xpath '//foo' file.html? =)

nwellnhof over 11 years

Thanks for pointing me to the --xpath option. It's actually undocumented.

bslima over 11 years

Thanks a lot. I had forgotten that the text is a child of the h5 element; I had even tried //text()[not(self::h5)].