beautifulsoup: find_all on bs4.element.ResultSet object or list?

58,982

ResultSet is a subclass of list, not of Tag, and it is the Tag class that defines the find* methods. That is why you cannot call find_all() on a ResultSet directly. Looping over the results of find_all() is the most common approach:

th_all = soup.find_all('th')
result = []
for th in th_all:
    result.extend(th.find_all(text='A'))
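If you want the same loop in a more compact form, a nested list comprehension flattens the per-element searches into one list. This is only a sketch: the sample HTML is made up for illustration, and it uses string=, the name that newer Beautiful Soup releases (4.4.0 and later) give to the text= argument.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = "<table><tr><th>A</th><th>B</th><th>A</th></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which can be iterated like any list.
th_all = soup.find_all("th")

# Search inside each th and flatten all hits into a single list.
matches = [hit for th in th_all for hit in th.find_all(string="A")]
print(len(matches))
```

The loop is still there, just written inline; the ResultSet itself is never searched.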

CSS selectors can often do this in one call, although not everything you can do with find_all() is possible with the select() method. For instance, there is no "text" search available in bs4 CSS selectors. But if, for example, you had to find all b elements inside th elements, you could do:

soup.select("th b")
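A minimal, self-contained sketch of the selector approach follows. The markup is invented for illustration (the question's table has no b elements), and it assumes the built-in html.parser backend.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one b inside a th, one b inside a td.
html = "<table><tr><th><b>A</b></th><th>B</th><td><b>C</b></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# The descendant selector "th b" matches b elements nested anywhere inside a th.
bold_in_headers = soup.select("th b")

# Collect the text of each matched element.
texts = [b.get_text() for b in bold_in_headers]
print(texts)
```

Here the nesting is expressed in the selector itself, so no explicit loop over the th elements is needed.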
Author by

YJZ

Updated on July 31, 2022

Comments

  • YJZ, almost 2 years ago

    Hi, I call find_all on a BeautifulSoup object and get back a result that is a bs4.element.ResultSet object (or a list).

    I want to call find_all again on that result, but that is not allowed on a bs4.element.ResultSet object. I can loop through each element of the ResultSet and call find_all on it. But can I avoid the loop and just convert the result back to a BeautifulSoup object?

    See code for details please. Thanks

    from bs4 import BeautifulSoup

    html_1 = """
    <table>
        <thead>
            <tr class="myClass">
                <th>A</th>
                <th>B</th>
                <th>C</th>
                <th>D</th>
            </tr>
        </thead>
    </table>
    """
    soup = BeautifulSoup(html_1, 'html.parser')
    
    type(soup) #bs4.BeautifulSoup
    
    # do find_all on beautifulsoup object
    th_all = soup.find_all('th')
    
    # the result is of type bs4.element.ResultSet or similarly list
    type(th_all) #bs4.element.ResultSet
    type(th_all[0:1]) #list
    
    # now I want to run find_all again on the result
    th_all.find_all(text='A')  # does not work: ResultSet has no find_all (raises AttributeError)
    
    # can I avoid the need for this loop?
    for th in th_all:
        th.find_all(text='A')  # works