beautifulsoup: find_all on bs4.element.ResultSet object or list?
The ResultSet class is a subclass of list, not of Tag, and it is the Tag class that defines the find* methods. That is why you cannot call find_all() on a ResultSet directly. Looping through the results of find_all() is the most common approach:
th_all = soup.find_all('th')
result = []
for th in th_all:
    result.extend(th.find_all(text='A'))
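For reference, the same loop can be written as a nested list comprehension. This is only a sketch that reuses the HTML from the question; the name matches is made up for illustration, and it uses string=, the newer spelling of the text= argument (assuming bs4 4.4.0 or later).

```python
from bs4 import BeautifulSoup

html = """
<table>
  <thead>
    <tr class="myClass">
      <th>A</th> <th>B</th> <th>C</th> <th>D</th>
    </tr>
  </thead>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Call find_all() on each Tag in the ResultSet and flatten the matches
# into one plain list. string= is the newer name for text=.
matches = [
    node
    for th in soup.find_all("th")
    for node in th.find_all(string="A")
]
print(matches)
```

This still loops under the hood; it just keeps the flattening in one expression.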
CSS selectors can often solve this in one go, but not everything you can do with find_all() is possible with the select() method. For instance, standard CSS selectors have no "text" search. But if, for example, you had to find all b elements inside th elements, you could do:
soup.select("th b")
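A small runnable sketch of the descendant selector, assuming a made-up table where one header cell wraps its text in a b element (the HTML and the names bold_in_headers and texts are hypothetical, not from the question):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <b> inside a <th>, one <b> inside a <td>.
html = "<table><tr><th><b>A</b></th><th>B</th><td><b>C</b></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# "th b" matches every <b> that is a descendant of a <th>,
# so the <b> inside the <td> is not selected.
bold_in_headers = soup.select("th b")
texts = [b.get_text() for b in bold_in_headers]
print(texts)
```

select() returns a list of Tag objects, so each item can be searched further with find_all() or select() if needed.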
YJZ
Updated on July 31, 2022
Comments
YJZ almost 2 years
Hi, I apply find_all on a BeautifulSoup object and get back something that is a bs4.element.ResultSet object or a list. I want to run find_all again on that result, but that's not allowed on a bs4.element.ResultSet object. I can loop through each element of the bs4.element.ResultSet object and call find_all on it. But can I avoid looping and just convert it back to a BeautifulSoup object? Please see the code for details. Thanks.
from bs4 import BeautifulSoup

html_1 = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th>
            <th>B</th>
            <th>C</th>
            <th>D</th>
        </tr>
    </thead>
</table>
"""

soup = BeautifulSoup(html_1, 'html.parser')
type(soup)  # bs4.BeautifulSoup

# do find_all on the BeautifulSoup object
th_all = soup.find_all('th')

# the result is of type bs4.element.ResultSet, or list after slicing
type(th_all)       # bs4.element.ResultSet
type(th_all[0:1])  # list

# now I want to further do find_all
# th_all.find_all(text='A')  # does not work: raises AttributeError

# can I avoid the need for this loop?
for th in th_all:
    th.find_all(text='A')  # works