BS4: Getting text in tag
Solution 1
One option would be to get the first element from the contents of the a element:
>>> from bs4 import BeautifulSoup
>>> data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
>>> soup = BeautifulSoup(data, 'html.parser')
>>> print(soup.find('a').contents[0])
s.r.o.,
Another one would be to find the small tag and get the previous sibling:
>>> print(soup.find('small').previous_sibling)
s.r.o.,
There are also all sorts of alternative, crazier options:
>>> print(next(soup.find('a').descendants))
s.r.o.,
>>> print(next(iter(soup.find('a'))))
s.r.o.,
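All of the options above rely on the wanted text being the first child of the a element. As a minimal sketch of a more general variant, assuming you want every text node that sits directly inside the tag and nothing from nested tags, find_all(string=True, recursive=False) can be used. The helper name own_text is hypothetical, chosen only for illustration, and the string= argument assumes a reasonably recent bs4 release (older ones spell it text=).

```python
from bs4 import BeautifulSoup

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, "html.parser")


def own_text(tag):
    """Join the text nodes that are direct children of `tag` (hypothetical helper).

    Text inside nested tags such as <small> is skipped because
    recursive=False limits the search to direct children.
    """
    pieces = tag.find_all(string=True, recursive=False)
    # Strip only the outer whitespace; spaces and commas inside the text survive.
    return "".join(pieces).strip()


result = own_text(soup.find("a"))
print(result)
```

Because nothing is split on spaces or commas, text that itself contains them is returned unchanged apart from the outer whitespace.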
Solution 2
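Use .children (the built-in next() works on both Python 2.6+ and Python 3, whereas the iterator's .next() method exists only on Python 2):
next(soup.find('a').children)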
s.r.o.,
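Taking the first child only works when that child is a text node. A slightly more defensive sketch, assuming you want the first text node among the direct children and None when there is none, could look like this. The helper name first_text_child is hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, "html.parser")


def first_text_child(tag):
    """Return the first direct text child of `tag`, stripped, or None (hypothetical helper)."""
    for child in tag.children:
        # Direct children are either Tag objects or NavigableString text nodes.
        if isinstance(child, NavigableString):
            return str(child).strip()
    return None


first = first_text_child(soup.find("a"))
print(first)
```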
Author: Milano
Updated on July 29, 2022
Comments
-
Milano almost 2 years
I'm using Beautiful Soup. There is a tag like this:
<li><a href="example"> s.r.o., <small>small</small></a></li>
I want to get the text within the anchor <a> tag only, without any from the <small> tag in the output; i.e. "s.r.o., ". I tried find('li').text[0] but it does not work. Is there a command in BS4 which can do that?
-
Milano over 9 years
Thanks, but as far as I know, split() without an argument uses ' ' as the separator, which is useful in this very case, but there are cases where the text itself contains spaces and commas, so it won't work. Or am I wrong?
-
Padraic Cunningham over 9 years
You are right. I will have a look when I get back on my comp in a bit.