How to remove all a href tags from text

Solution 1

Use del a['href'] instead, just like you would on a plain dictionary:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    del a['href']

gives you:

>>> print str(soup)
<p>Hello <a>Google</a></p>
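
The snippet above targets BeautifulSoup 3 on Python 2. A rough Python 3 equivalent, assuming the bs4 package (beautifulsoup4) is installed, might look like the sketch below. The helper name strip_hrefs is made up for illustration, and the built-in "html.parser" backend is just one possible parser choice.

```python
from bs4 import BeautifulSoup


def strip_hrefs(html):
    """Return html with the href attribute removed from every <a> tag.

    Hypothetical helper, not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    # href=True matches only anchors that actually carry an href attribute
    for a in soup.find_all("a", href=True):
        del a["href"]
    return str(soup)


print(strip_hrefs('<p>Hello <a href="http://google.com">Google</a></p>'))
# <p>Hello <a>Google</a></p>
```

Filtering with href=True leaves anchors without an href (for example, named anchors) untouched.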

UPDATE:

If you want to get rid of the <a> tags altogether, you can use the .replaceWithChildren() method:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    a.replaceWithChildren()

gives you:

>>> print str(soup)
<p>Hello Google</p>
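
In bs4 the same operation is spelled unwrap(). A minimal Python 3 sketch, assuming bs4 is installed (unwrap_links is a hypothetical helper name):

```python
from bs4 import BeautifulSoup


def unwrap_links(html):
    """Return html with every <a> tag removed but its contents kept.

    Hypothetical helper, not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        # unwrap() replaces the tag with whatever was inside it
        a.unwrap()
    return str(soup)


print(unwrap_links('<p>Hello <a href="http://google.com">Google</a></p>'))
# <p>Hello Google</p>
```

Because unwrap() keeps the children, any markup nested inside the link (such as a <b> tag) survives.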

And what you asked for in the comments (padding the text content of the tag with spaces) can be achieved with:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    del a['href']
    a.setString(' %s ' % a.text)

gives you:

>>> print str(soup)
<p>Hello <a> Google </a></p>
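
With bs4 on Python 3, assigning to the tag's .string attribute plays the role of setString(). A sketch under that assumption (pad_link_text is a hypothetical helper name):

```python
from bs4 import BeautifulSoup


def pad_link_text(html):
    """Drop href from each <a> and pad its text with one space on each side.

    Hypothetical helper, not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        # pop() with a default avoids an error when an anchor has no href
        a.attrs.pop("href", None)
        # Assigning .string replaces the tag's contents with plain text
        a.string = f" {a.get_text()} "
    return str(soup)


print(pad_link_text('<p>Hello <a href="http://google.com">Google</a></p>'))
# <p>Hello <a> Google </a></p>
```

Note that assigning .string flattens the link to plain text, so any tags nested inside the <a> are lost.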

Solution 2

You can use bleach:

pip install bleach

Then use it like this:

import bleach
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<a href = "somesite.com">hello world</a>')
# tags=[] allows no tags; strip=True removes disallowed tags instead of escaping them
clean = bleach.clean(str(soup), tags=[], strip=True)

This results in:

>>> print clean
hello world

See the bleach documentation for details.
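
If adding a dependency is not an option, a similar result can be sketched with only the standard library's html.parser module. This is a different technique from bleach, not a drop-in replacement: it does no sanitizing, it only drops <a> tags while re-emitting everything else. The names AnchorStripper and strip_anchors are hypothetical, and comments and doctype declarations are silently dropped in this sketch.

```python
from html.parser import HTMLParser


class AnchorStripper(HTMLParser):
    """Re-emit the parsed HTML, skipping <a> start and end tags."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_anchors(html):
    """Return html with all <a> tags removed and their text kept."""
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_anchors('<p>Hello <a href="http://google.com">Google</a></p>'))
# <p>Hello Google</p>
```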

Author: user2784753

Updated on June 18, 2022

Comments

  • user2784753 almost 2 years

    I have a script to replace a word in an "a href" tag. However, I want to remove the a href entirely, so that you have the word Google without a link.

    from BeautifulSoup import BeautifulSoup
    
    soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
    for a in soup.findAll('a'):
        a['href'] = a['href'].replace("google", "mysite")
    result = str(soup)
    

    Also, can you find all the words placed inside an a href and put a " " before and after them? I'm not sure how to. I guess this is done before the replacing.

  • user2784753 over 10 years
    Thanks, but will the Google I see be a link or normal text? Also, how can I place a space before the Google or any word in an a href? Thanks.
  • Erik Kaplun over 10 years
    1) I don't know off the top of my head how browsers render <a> tags without href; why not just try and see for yourself instead of having me do the checking? 2) I'm not sure what you're asking.
  • user2784753 over 10 years
    Yup, checked: no links. Once I get all the <a> tags, I want to place a " " right after the <a> and before the </a>. So <a href="something">Hello</a> should become <a> Hello </a>.
  • Erik Kaplun over 10 years
    Updated my answer accordingly.
  • PatrickT about 4 years
    AFAIK, find_all is now preferred over findAll (the latter available for backward compatibility).