Iterate through elements in an HTML tree using BeautifulSoup in Python, and produce output that maintains the relative position of each element?
To find all <div> elements whose class attribute is in a given list:
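#!/usr/bin/env python
from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

with open('input.xml', 'rb') as file:
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(file, 'html.parser')

# Passing a list to class_ matches any of the classes; results are in document order.
elements = soup.find_all("div", class_="header name quantity".split())
print("\n".join("{} {}".format(el['class'], el.get_text(strip=True)) for el in elements))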
Output
['header'] content
['name'] content
['quantity'] content
['name'] content
['quantity'] content
['header'] content2
['name'] content2
['quantity'] content2
['name'] content2
['quantity'] content2
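Because the matches come back in document order, the flat list can be folded into per-header groups in a single pass. This is a minimal sketch, assuming the (class, text) pairs have already been extracted as above; `group_by_header` and the sample `rows` are hypothetical names used only for illustration:

```python
def group_by_header(rows):
    """Fold (css_class, text) pairs, given in document order, into header groups."""
    groups = []
    for css_class, text in rows:
        if css_class == "header":
            # A header starts a new group; later rows attach to it.
            groups.append({"header": text, "items": []})
        elif groups:
            # name/quantity rows belong to the most recent header.
            # Rows that appear before any header are dropped in this sketch.
            groups[-1]["items"].append((css_class, text))
    return groups


# Hypothetical sample mirroring the output above (shortened).
rows = [
    ("header", "content"), ("name", "content"), ("quantity", "content"),
    ("header", "content2"), ("name", "content2"), ("quantity", "content2"),
]
groups = group_by_header(rows)
for group in groups:
    print(group["header"], group["items"])
```

Each group keeps its items in the order they appeared, so the relative position of every element survives the grouping.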
There are also other methods that allow you to search and traverse HTML elements.
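One such alternative is a CSS selector passed to `select()`. This is a minimal sketch, assuming markup shaped like the question's; the `html` sample is a shortened, hypothetical stand-in for the real page:

```python
from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

# Hypothetical sample markup, shortened from the question.
html = """
<div class="header">content</div>
<div class="name">content</div>
<div class="quantity">content</div>
<div class="header">content2</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A comma-separated selector list matches a <div> carrying any of the three classes.
elements = soup.select("div.header, div.name, div.quantity")

# "class" is returned as a list of names; take the first for display.
classes = [el["class"][0] for el in elements]
print(classes)
```

The matches are returned in the order the elements appear in the document, not grouped by selector, so relative positions are preserved here as well.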
Author: Christian
Updated on July 05, 2022
I have this code that does what I need, using Jsoup in Java:
Elements htmlTree = doc.body().select("*");
Elements menuElements = new Elements();
for (Element element : htmlTree) {
    if (element.hasClass("header"))
        menuElements.add(element);
    if (element.hasClass("name"))
        menuElements.add(element);
    if (element.hasClass("quantity"))
        menuElements.add(element);
}
I want to do the same thing but in Python using BeautifulSoup. An example tree of the HTML I'm trying to scrape follows:
<div class="header"> content </div>
<div class="name"> content </div>
<div class="quantity"> content </div>
<div class="name"> content </div>
<div class="quantity"> content </div>
<div class="header"> content2 </div>
<div class="name"> content2 </div>
<div class="quantity"> content2 </div>
<div class="name"> content2 </div>
<div class="quantity"> content2 </div>
etc.
Basically, I want the output to preserve the relative position of each element. How would I go about doing that using Python and BeautifulSoup?
EDIT:
This is the Python code I have (it's very naive), but maybe it can help:
output = []
for e in soup:
    if e["class"] == "pickmenucolmenucat":
        output.append(e)
    if e["class"] == "pickmenucoldispname":
        output.append(e)
    if e["class"] == "pickmenucolportions":
        output.append(e)
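The loop above is unlikely to work as written: iterating over `soup` visits only its direct children, and BeautifulSoup returns the `class` attribute as a list of names rather than a single string, so the `==` comparisons never match. Here is a sketch of a closer port of the Jsoup loop, assuming the class names from the snippet above; the `html` sample and the `wanted` set are made up for illustration:

```python
from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

# Made-up sample using the class names from the snippet above.
html = """
<div class="pickmenucolmenucat">Entrees</div>
<div class="pickmenucoldispname">Soup</div>
<div class="unrelated">skip me</div>
<div class="pickmenucolportions">1 cup</div>
"""
wanted = {"pickmenucolmenucat", "pickmenucoldispname", "pickmenucolportions"}

soup = BeautifulSoup(html, "html.parser")
output = []
# find_all(True) walks every tag in document order, like Jsoup's select("*").
for e in soup.find_all(True):
    # "class" is a list of names; .get() avoids a KeyError on tags without one.
    if wanted.intersection(e.get("class", [])):
        output.append(e)

texts = [e.get_text(strip=True) for e in output]
print(texts)
```

Unlike the three separate `if` checks in the Java version, the set intersection adds an element only once even if it carries more than one of the wanted classes.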