Python beautiful soup select text
Solution 1
If you just want to check whether it was found or not, you could use a simple boolean flag, as follows:
foo = []
found = False
for tag in content:
    # The cell text has surrounding spaces and other words, so test for a substring
    if 'Example' in tag.text:
        found = True
        foo.append('Example')
        break
if not found:
    foo.append('Not Example')
If I understand what you want, this may be a simple approach, though alecxe's solution looks amazing.
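The same flag logic can also be written with Python's for/else clause, where the else branch runs only if the loop finishes without hitting break. This is a minimal sketch, assuming the cell texts have already been pulled into a list of strings; the helper name label_texts and the sample data are made up for illustration. It uses a substring test, since each cell holds more than the word itself.

```python
def label_texts(texts, needle="Example"):
    """Return [needle] if any text contains it, else ['Not Example']."""
    foo = []
    for text in texts:
        if needle in text:
            foo.append(needle)
            break
    else:
        # Runs only when the loop was not ended by break
        foo.append("Not Example")
    return foo


# Hypothetical stand-in for [tag.text for tag in content]
sample = [" Example BLAB BLAB BLAB ", " BLAB BLAB BLAB "]
print(label_texts(sample))
print(label_texts([" BLAB BLAB BLAB "]))
```

This drops the separate found variable, at the cost of a construct that some readers find less obvious than an explicit flag.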
Solution 2
You can use find_all() to find all td elements with class='style8' and use a list comprehension to construct the foo list:
from bs4 import BeautifulSoup

html = """<html>
<body>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>"""

# Name the parser explicitly; "html.parser" ships with Python
soup = BeautifulSoup(html, "html.parser")

# One entry per matching cell: "Example" if the cell text contains it
foo = ["Example" if "Example" in node.text else "Not Present"
       for node in soup.find_all('td', {'class': 'style8'})]
print(foo)
prints:
['Example', 'Not Present', 'Not Present', 'Not Present']
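If a single answer for the whole page is wanted rather than one entry per cell, as the question describes, the same membership test can be folded with any(). This is a minimal sketch, assuming a shortened version of the HTML above, the built-in html.parser, and the CSS selector td.style8; the variable names cells and found are chosen for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the HTML in the question
html = """<html><body>
<td class="style8"> Example BLAB BLAB BLAB </td>
<td class="style8"> BLAB BLAB BLAB </td>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.select("td.style8")

# True as soon as one cell's text contains the word
found = any("Example" in cell.get_text() for cell in cells)
foo = ["Example" if found else "Not Present"]
print(foo)
```

any() stops at the first match, so it behaves like the loop with break while keeping the result to one list entry.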
Ciaran
Updated on June 04, 2022
The following is an example of the HTML code I want to parse:
<html>
<body>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>
I am using Beautiful Soup to parse the HTML code by selecting style8 as follows (where html holds the result of my HTTP request):
html = result.read()
soup = BeautifulSoup(html)
content = soup.select('.style8')
In this example, the content variable holds a list of 4 Tags. I want to check the text of each item in the list (each style8 element) to see whether it contains Example, and if so append that to a variable. If the loop goes through the entire list and Example does not occur anywhere in it, it should append Not Present to the variable instead.

I have got the following so far:
foo = []
for i, tag in enumerate(content):
    if content[i].text == 'Example':
        foo.append('Example')
        break
    else:
        continue
This will only append Example to foo if it occurs; however, it will not append Not Present if it does not occur anywhere in the list.

Any method of doing so is appreciated, or a better way of searching the entire results to check whether a string is present would be great.