How to use findAll on nested HTML tags with BeautifulSoup
Solution 1
If I understood your question correctly, the following Python code should work. It iterates over all tables with class="theclass" and then finds the links inside each one.
>>> foo = """<a href="www.example.com/"></a>
... <table class="theclass">
... <tr><td>
... <a href="www.example.com/two">two</a>
... </td></tr>
... <tr><td>
... <a href ="www.example.com/three">three</a>
... <span>blabla<span>
... </td></td>
... </table>
... """
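>>> from bs4 import BeautifulSoup  # BeautifulSoup 4; BeautifulSoup 3 used "import BeautifulSoup"
>>> soup = BeautifulSoup(foo, "html.parser")
>>> links = []
>>> for table in soup.find_all('table', {'class': 'theclass'}):
...     # collect the <a> tags of every matching table, not just the last one
...     links.extend(table.find_all('a'))
...
>>> print(links)
[<a href="www.example.com/two">two</a>, <a href="www.example.com/three">three</a>]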
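As an alternative to the explicit loop, BeautifulSoup 4 also accepts CSS selectors through select(). The sketch below assumes the bs4 package is installed (pip install beautifulsoup4) and uses Python's built-in "html.parser". The selector "table.theclass a" matches every <a> nested anywhere inside a table whose class list contains theclass, so the first link, which sits outside the table, is skipped.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, kept as-is (including its malformed closing tags).
html = """<a href="www.example.com/"></a>
<table class="theclass">
<tr><td>
<a href="www.example.com/two">two</a>
</td></tr>
<tr><td>
<a href ="www.example.com/three">three</a>
<span>blabla<span>
</td></td>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# "table.theclass a" = every <a> that is a descendant of a <table> with class "theclass".
links = soup.select("table.theclass a")

# Pull out just the href values of the matched links.
hrefs = [a["href"] for a in links]
print(hrefs)
```

Here the loop over find_all() and the selector return the same two tags; select() is mostly a matter of taste when the nesting is easy to express in CSS.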
Solution 2
infoText is a list, because findAll() returns every match. You should iterate over it.
>>> for info in infoText:
...     print(info.tr.td.a)
...
<a href="www.example.com/two">two</a>
Each info is a <table> element, which you can navigate into. If you only expect one table with class "theclass" in your document, soup.find("table", {"class": "theclass"}) returns that table directly, without a loop.
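A minimal sketch of the find() route, assuming bs4 is installed and using a cleaned-up copy of the sample markup (the stray </td> and unclosed <span> are removed). It also shows the difference between dotted access, which follows only the first match at each level, and find_all(), which returns every match.

```python
from bs4 import BeautifulSoup

# Cleaned-up version of the sample markup from the question.
html = """<a href="www.example.com/"></a>
<table class="theclass">
<tr><td><a href="www.example.com/two">two</a></td></tr>
<tr><td><a href="www.example.com/three">three</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag (or None if nothing matches), not a list.
table = soup.find("table", {"class": "theclass"})

# Dotted access such as table.tr.td.a only follows the FIRST match at each level...
first_link = table.tr.td.a

# ...whereas find_all("a") searches every descendant of the table.
all_links = table.find_all("a")

print(first_link["href"])
print([a["href"] for a in all_links])
```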
Comments
-
Julio Diaz almost 2 years
Given

<a href="www.example.com/"></a>
<table class="theclass">
<tr><td>
<a href="www.example.com/two">two</a>
</td></tr>
<tr><td>
<a href ="www.example.com/three">three</a>
<span>blabla<span>
</td></td>
</table>
How can I scrape only the <a> tags that are inside the table with class="theclass"? I tried using
soup = util.mysoupopen(theexample)
infoText = soup.findAll("table", {"class": "theclass"})
but I did not know how to narrow the search further. I also tried turning the result of findAll() into an array and looking for patterns in where the needle showed up, but I couldn't find a consistent pattern. Thanks.
-
Julio Diaz about 13 years
I got this error, and I have no idea why:
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    print info.tr.td.a
  File "/nfs/home/j/d/jdiaz/cs171/BeautifulSoup.py", line 402, in __getattr__
    raise AttributeError, "'%s' object has no attribute '%s'" % (self.__class__.__name__, attr)
AttributeError: 'NavigableString' object has no attribute 'tr'
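An error like this means the loop variable was a text node rather than a tag. One plausible cause, which is an assumption since the full test.py is not shown, is looping over a single Tag (for example the result of find()) instead of the list returned by findAll(). Iterating over a Tag yields its direct children, and the newlines between tags are NavigableString objects, which have no .tr attribute. BeautifulSoup 3 iterates a Tag's contents the same way. The sketch below is written for bs4 with a cleaned-up sample; it reproduces the error and shows two ways around it.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Cleaned-up sample markup with a single row.
html = """<table class="theclass">
<tr><td><a href="www.example.com/two">two</a></td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: the failing loop ran over a single Tag (e.g. the result of find()),
# not over the list returned by findAll()/find_all().
table = soup.find("table", {"class": "theclass"})

# Iterating over a Tag yields its direct children, including the "\n" text
# nodes around <tr>; those are NavigableStrings, not Tags.
child_types = [type(child).__name__ for child in table]
print(child_types)

# Reproduce the error: the first child is "\n", which has no .tr attribute.
error_message = ""
for info in table:
    try:
        info.tr
    except AttributeError as err:
        error_message = str(err)
        break
print(error_message)

# Fix A: loop over the list from find_all(); every item is a <table> Tag.
for info in soup.find_all("table", {"class": "theclass"}):
    print(info.tr.td.a["href"])

# Fix B: when looping over a Tag's children, skip the text nodes.
rows = [child for child in table if isinstance(child, Tag)]
text_nodes = [child for child in table if isinstance(child, NavigableString)]
print(len(rows), len(text_nodes))
```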