BS4: Getting text in tag


Solution 1

One option would be to get the first element from the contents of the a element:

>>> from bs4 import BeautifulSoup
>>> data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
>>> soup = BeautifulSoup(data, 'html.parser')
>>> print(soup.find('a').contents[0])
 s.r.o., 

Another one would be to find the small tag and get the previous sibling:

>>> print(soup.find('small').previous_sibling)
 s.r.o., 
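The previous_sibling approach only gives the wanted text when a text node sits directly before the tag. A minimal sketch of a guarded version, assuming a hypothetical helper named text_before (not part of bs4), could look like this:

```python
from bs4 import BeautifulSoup, NavigableString


def text_before(tag):
    """Hypothetical helper: return the text node directly before `tag`, or None."""
    sibling = tag.previous_sibling if tag is not None else None
    # previous_sibling may be another Tag or None, so only accept text nodes.
    return sibling if isinstance(sibling, NavigableString) else None


data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, 'html.parser')
print(repr(text_before(soup.find('small'))))
```

If the small tag is missing, or another tag precedes it, the helper returns None instead of raising or returning a Tag.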

There are also some more unusual alternatives:

>>> print(next(soup.find('a').descendants))
 s.r.o., 
>>> print(next(iter(soup.find('a'))))
 s.r.o., 
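All of the options above assume the wanted text is the first child of the a element. If the text could also appear after a nested tag, one possible approach is to keep only the strings that are direct children. This is a sketch, and own_text is a hypothetical helper name, not a bs4 function:

```python
from bs4 import BeautifulSoup, NavigableString


def own_text(tag):
    """Hypothetical helper: join only the text nodes that are direct children of `tag`."""
    return ''.join(
        child for child in tag.contents
        if isinstance(child, NavigableString)
    )


data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, 'html.parser')
print(repr(own_text(soup.find('a'))))  # ' s.r.o., '
```

Text inside nested tags such as small is skipped because those nodes are Tag objects, not NavigableString objects.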

Solution 2

Use .children, an iterator over the tag's direct children; its first item is the text node:

>>> print(next(soup.find('a').children))
 s.r.o., 
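Calling next() on an empty iterator raises StopIteration, so for tags that may have no children it can help to pass a default. A minimal sketch, with first_child as a hypothetical helper name:

```python
from bs4 import BeautifulSoup


def first_child(tag, default=None):
    """Hypothetical helper: first direct child of `tag`, or `default` if it has none."""
    # next() with a default avoids StopIteration on an empty tag.
    return next(iter(tag.children), default)


data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, 'html.parser')
print(repr(first_child(soup.find('a'))))
```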

Milano
Author by

Milano

Updated on July 29, 2022

Comments

  • Milano
    Milano almost 2 years

    I'm using beautiful soup. There is a tag like this:

    <li><a href="example"> s.r.o., <small>small</small></a></li>

    I want to get the text within the anchor <a> tag only, without any from the <small> tag in the output; i.e. " s.r.o., "

    I tried find('li').text[0] but it does not work.

    Is there a command in BS4 which can do that?

  • Milano
    Milano over 9 years
    Thanks, but as far as I know, split() without an argument uses whitespace as the separator, which is useful in this particular case, but there are cases where the text itself contains spaces and commas, so it won't work. Or am I wrong?
  • Padraic Cunningham
    Padraic Cunningham over 9 years
    You are right. I will have a look when I get back to my computer in a bit.
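On the attempt mentioned in the question, find('li').text[0]: .text returns a single string built from every descendant text node, so indexing it with [0] yields only its first character. A short sketch to illustrate, using the markup from the question:

```python
from bs4 import BeautifulSoup

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, 'html.parser')

li_text = soup.find('li').text  # all descendant strings, concatenated
first_char = li_text[0]         # indexing a str gives one character

print(repr(li_text))     # ' s.r.o., small'
print(repr(first_char))  # ' '
```

This is why the text of the small tag shows up in .text, and why the index does not select the first text node.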