Using XPath, how do I extract data from table cells that sometimes contain links?

Just add a second forward slash before text():

//table[@class="info"]//td[2]//text()

This fetches the text nodes of all descendants of the selected td, not just its direct children, so text wrapped in a link is included as well.
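To see the difference between the two expressions, here is a minimal sketch using only Python's standard library. It makes two assumptions that are not in the question: the table is wrapped in an html/body root so it parses as a single XML document, and because xml.etree.ElementTree does not support text() in its XPath subset, cell.text and cell.itertext() stand in for /text() and //text(). The names PAGE, direct_text and all_text are illustrative only.

```python
import xml.etree.ElementTree as ET

# The table from the question, wrapped in a root element (an assumption
# made here so that the path can start with .//table).
PAGE = """
<html><body>
<table class="info">
<tbody>
    <tr><td class="name">Year</td><td>2011</td></tr>
    <tr><td class="name">Storey</td><td>3</td></tr>
    <tr><td class="name">Title</td><td><a href="http://gov.kz/premera/">Premier</a></td></tr>
    <tr><td class="name">Condition</td><td>Renovated</td></tr>
</tbody>
</table>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Second <td> of every row in the "info" table.
cells = root.findall(".//table[@class='info']//td[2]")

# Rough analogue of td[2]/text(): only text sitting directly in the cell.
# The cell that holds a link has no direct text, so it drops out.
direct_text = [cell.text for cell in cells if cell.text]

# Rough analogue of td[2]//text(): text of the cell and of everything
# nested inside it, joined into one string per cell.
all_text = ["".join(cell.itertext()).strip() for cell in cells]

print(direct_text)  # ['2011', '3', 'Renovated']
print(all_text)     # ['2011', '3', 'Premier', 'Renovated']
```

With a full XPath 1.0 engine the expression above can be used as written. Note that //text() returns one result per text node, whereas this sketch joins the nodes into one string per cell.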

Author by

Erba Aitbayev

IT specialist.

Updated on December 01, 2020

Comments

  • Erba Aitbayev, over 3 years ago

    I have this html table:

    <table class="info">
    <tbody>
        <tr><td class="name">Year</td><td>2011</td></tr>
        <tr><td class="name">Storey</td><td>3</td></tr>
        <tr><td class="name">Title</td><td><a href="http://gov.kz/premera/">Premier</a></td></tr>
        <tr><td class="name">Condition</td><td>Renovated</td></tr>
    </tbody>
    </table>
    

    In this table, each row contains two cells enclosed in <td> tags. The first cell names the kind of data, for example the year the house was built. The second cell holds the value itself, here 2011.

    I am trying to extract the contents of the second cell in each row (here: 2011, 3, Premier, Renovated).

    I use this XPath expression:

    //table[@class="info"]//td[2]/text()
    

    Received output (wrong):

    2011
    3
    Renovated
    

    Desired output:

    2011
    3
    Premier
    Renovated
    

    As you can see, the second <td> in the third row contains a link instead of plain text, so the value from that row is missed and the desired string "Premier" is not returned. Some cells contain links and others contain plain text. Is there a way to extract the data from the second cell in both cases?