Find the index of a tag with certain text in BeautifulSoup/Python
Solution 1
If your table has a fixed layout, it is better to use row and column indexes. Try this:
rows = soup.find("table").find("tbody").find_all("tr")
# rows[1] is the first data row; cell 2 is the "Year Built" column
print(rows[1].find_all("td")[2].get_text())
Alternatively, if you just want to find the index of the tag containing "Year Built":
from bs4 import BeautifulSoup

soup = BeautifulSoup(myhtml, "html.parser")
td_list = soup.find_all('td')
ind = None
for i, elem in enumerate(td_list):
    if elem.text == 'Year Built':
        ind = i  # index of the first matching <td>
        break
print(td_list[ind].text)
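The loop above can also be condensed with list.index, and the index can then be used to reach the value in the next row. This is only a minimal sketch: it assumes the sample table from the question is held in a string (named myhtml here), that the cell text matches exactly once whitespace is stripped, and the helper name find_td_index is hypothetical rather than part of bs4.

```python
from bs4 import BeautifulSoup

# Sample table from the question; `myhtml` is an assumed variable name.
myhtml = """
<table> <tbody>
<tr> <td>Building</td> <td>Type</td> <td>Year Built</td> <td>Sq. Ft.</td> </tr>
<tr> <td>R01</td> <td>DWELL</td> <td>1972</td> <td>1166</td> </tr>
</tbody> </table>
"""


def find_td_index(td_list, text):
    """Return the index of the first <td> whose stripped text equals `text`.

    Hypothetical helper (not part of bs4); raises ValueError if nothing matches.
    """
    texts = [td.get_text(strip=True) for td in td_list]
    return texts.index(text)


soup = BeautifulSoup(myhtml, "html.parser")
td_list = soup.find_all("td")

ind = find_td_index(td_list, "Year Built")

# Count the cells in the header row instead of hard-coding the offset of 4.
n_cols = len(soup.find("tr").find_all("td"))
year_built = td_list[ind + n_cols].get_text(strip=True)

print(ind, year_built)
```

Counting the header cells rather than adding a literal 4 keeps the offset correct if a column is ever added, though it still assumes every row has the same number of cells.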
Solution 2
Convert the table to a dict keyed by column heading and look up the value:
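from bs4 import BeautifulSoup

soup = BeautifulSoup(myhtml, "html.parser")
# Calling a tag like row("td") is shorthand for row.find_all("td")
table_data = [[cell.text for cell in row("td")] for row in soup("tr")]
# Avoid naming the variable `dict`, which would shadow the built-in
row_dict = dict(zip(table_data[0], table_data[1]))
print(row_dict['Year Built'])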
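If the table ever holds more than one data row, the same idea extends to one dict per row. The following is a sketch under the assumption that the first tr holds the headings and every later tr has the same number of cells; the second data row (R02) is invented purely for illustration and does not come from the question.

```python
from bs4 import BeautifulSoup

# Sample markup; the R02 row is made up for illustration only.
myhtml = """
<table><tbody>
<tr><td>Building</td><td>Type</td><td>Year Built</td><td>Sq. Ft.</td></tr>
<tr><td>R01</td><td>DWELL</td><td>1972</td><td>1166</td></tr>
<tr><td>R02</td><td>GARAGE</td><td>1985</td><td>400</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(myhtml, "html.parser")

# One list of cell texts per <tr>, with surrounding whitespace stripped.
table_data = [
    [cell.get_text(strip=True) for cell in row.find_all("td")]
    for row in soup.find_all("tr")
]

# First row is treated as the headings, the rest as data rows.
headers, data_rows = table_data[0], table_data[1:]
records = [dict(zip(headers, row)) for row in data_rows]

years = [record["Year Built"] for record in records]
print(years)
```

Each entry in records can then be queried by heading, for example records[0]["Year Built"], without tracking any positional index.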
Admin
Updated on June 04, 2022
I have a simple 4x2 HTML table that contains information about a property. I'm trying to extract the value 1972, which is under the column heading "Year Built". If I find all the td tags, how do I extract the index of the tag that contains the text "Year Built"? Once I find that index, I can just add 4 to get to the tag that contains the value 1972.
Here is the HTML:
<table>
  <tbody>
    <tr>
      <td>Building</td>
      <td>Type</td>
      <td>Year Built</td>
      <td>Sq. Ft.</td>
    </tr>
    <tr>
      <td>R01</td>
      <td>DWELL</td>
      <td>1972</td>
      <td>1166</td>
    </tr>
  </tbody>
</table>
For example, I know that if my input is index 2 and my output is the text of that tag, "Year Built", I can just do this:
from bs4 import BeautifulSoup
soup = BeautifulSoup(myhtml)
td_list = soup.find_all('td')
print(td_list[2].text)
But how do I use an input of the text "Year Built" to get an output of index 2?