Python 3, beautiful soup, get next tag

python python-3.x beautifulsoup

12,946

I want to get all the href links in this document that come directly after a div tag with the class "product-list-item".

To find the first <a href> element in the <div>:

links = []
for div in soup.find_all('div', 'product-list-item'):
    a = div.find('a', href=True)  # find <a> anywhere in <div>
    if a is not None:
        links.append(a['href'])

This assumes that the link is inside the <div>. Any elements in the <div> that come before the first <a href> are ignored.
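For anyone who wants to try this end to end, here is a minimal self-contained sketch. The markup is modelled on the fragment in the question; the second and third URLs, the <span> and the link texts are made up for illustration, and the "html.parser" backend is my choice rather than something the question specifies.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question. Only example_1 comes from the
# question; the other URLs, the <span> and the link texts are invented.
html = """
<div class="product-list-item  margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1" data-style-id="sp_2866">one</a>
</div>
<div class="product-list-item  margin-bottom">
<span>badge</span>
<a href="http://www.urlexample.com/example_2">two</a>
</div>
<div class="other"><a href="http://www.urlexample.com/ignored">skip</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
for div in soup.find_all("div", "product-list-item"):
    a = div.find("a", href=True)  # first <a href> anywhere inside this <div>
    if a is not None:
        links.append(a["href"])

print(links)
```

The second <div> shows the "ignored" part: the <span> before the link does not stop the link from being found.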

If you'd like, you can be stricter about it, e.g., taking the link only if it is the first child of the <div>:

a = div.contents[0] if div.contents else None  # the very first child, even if it is not a Tag
if a is not None and a.name == 'a' and a.has_attr('href'):
    links.append(a['href'])

Or if <a> is not inside <div>:

a = div.find_next('a', href=True)  # find <a> that appears after <div>
if a is not None:
    links.append(a['href'])

There are many ways to search and navigate in BeautifulSoup.

If you search with lxml.html instead, you can also use XPath and CSS expressions, if you are familiar with them.
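As a rough sketch of the lxml.html route, assuming lxml is installed, an XPath query along these lines should pick the first <a href> child of each matching <div>. The sample document and the name xpath_links are illustrative, and only the XPath form is shown.

```python
import lxml.html

# Illustrative document; only the first <div> mirrors the question's fragment.
html = """<html><body>
<div class="product-list-item  margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1" data-style-id="sp_2866">one</a>
</div>
<div class="other"><a href="http://www.urlexample.com/ignored">skip</a></div>
</body></html>"""

tree = lxml.html.fromstring(html)

# The concat/normalize-space idiom matches "product-list-item" as a whole
# class token, so a class such as "product-list-item-large" would not match.
xpath_links = tree.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " product-list-item ")]'
    '/a[@href][1]/@href'
)

print(xpath_links)
```

Note that this selects direct children only, so it is closer to the stricter variants above than to div.find, which searches all descendants.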

Author by

user136036

Updated on June 04, 2022

Comments

user136036 almost 2 years
I have the following HTML fragment, which repeats several times with different href links:
```
<div class="product-list-item  margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1" data-style-id="sp_2866">
```
Now I want to get all the href links in this document that come directly after a div tag with the class "product-list-item". I am pretty new to BeautifulSoup, and nothing I came up with worked.

Thanks for your ideas.

EDIT: It does not really have to be BeautifulSoup; if it can be done with regex and the Python HTML parser, that is also OK.
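For the standard-library route mentioned in this edit, here is a minimal sketch using html.parser.HTMLParser and no regex. The class name ProductLinkParser, the sample markup and the extra URLs are illustrative assumptions, not part of the original post.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collect the href of the first <a> opened after each matching <div>."""

    def __init__(self):
        super().__init__()
        self.links = []
        # True after a matching <div> opens, until its first <a href> is seen
        # or another <div> opens.
        self._expect_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self._expect_link = "product-list-item" in classes
        elif tag == "a" and self._expect_link and attrs.get("href"):
            self.links.append(attrs["href"])
            self._expect_link = False


# Illustrative markup; only example_1 comes from the question.
html = """
<div class="product-list-item  margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1" data-style-id="sp_2866">one</a>
</div>
<div class="other"><a href="http://www.urlexample.com/ignored">skip</a></div>
<div class="product-list-item  margin-bottom">
<a href="http://www.urlexample.com/example_2">two</a>
</div>
"""

parser = ProductLinkParser()
parser.feed(html)
parser.close()
print(parser.links)
```

This sketch does not track closing tags, so it takes the first link that follows a matching <div> whether or not the link is nested inside it, and any other <div> that opens in between resets the search.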

EDIT2: What I tried (I'm pretty new to Python, so what I did might be totally stupid from an advanced viewpoint):
```
import bs4

# htmlsource holds the page's HTML as a string
soup = bs4.BeautifulSoup(htmlsource, "html.parser")
x = soup.find_all("div")
for i in range(len(x)):
    if x[i].get("class") and "product-list-item" in x[i].get("class"):
        print(x[i].get("class"))
```
This prints the class list of every "product-list-item" div, but then I tried something like:
```
print(x[i].get("class").next_element)
```
I thought next_element or next_sibling would give me the next tag, but this just raises AttributeError: 'list' object has no attribute 'next_element'. So I tried with only the first list element:
```
print(x[i][0].get("class").next_element)
```
That led to this error: return self.attrs[key] KeyError: 0. I also tried .find_all("href") and .get("href"), but these all lead to the same errors.

EDIT3: OK, it seems I found out how to solve it. Now I do:
```
x = soup.find_all("div")

for i in range(len(x)):    
    if x[i].get("class") and "product-list-item" in x[i].get("class"):
        print(x[i].next_element.next_element.get("href"))
```
This can also be shortened by passing another argument to the find_all function:
```
x = soup.find_all("div", "product-list-item")
for i in x:
    print(i.next_element.next_element.get("href"))
```
greetings