How to remove all a href tags from text
Solution 1
Use del a['href'] instead, just like you would on a plain dictionary:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    del a['href']
gives you:
>>> print str(soup)
<p>Hello <a>Google</a></p>
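The snippet above targets the old BeautifulSoup 3 package on Python 2. A minimal sketch of the same idea follows, assuming Beautiful Soup 4 (the bs4 package) on Python 3 with the bundled html.parser backend. The function name strip_hrefs is made up for illustration and is not part of the original answer.

```python
from bs4 import BeautifulSoup


def strip_hrefs(html):
    """Return html with the href attribute removed from every <a> tag."""
    # Assumption: "html.parser" (bundled with Python) is used; other
    # parsers may serialize the result slightly differently.
    soup = BeautifulSoup(html, "html.parser")
    # href=True matches only <a> tags that actually carry an href attribute.
    for a in soup.find_all("a", href=True):
        del a["href"]
    return str(soup)


print(strip_hrefs('<p>Hello <a href="http://google.com">Google</a></p>'))
# prints: <p>Hello <a>Google</a></p>
```

Other attributes on the tag (class, title and so on) are left untouched; only href is dropped.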
UPDATE:
If you want to get rid of the <a> tags altogether, you can use the .replaceWithChildren() method:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    a.replaceWithChildren()
gives you:
>>> print str(soup)
<p>Hello Google</p>
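In Beautiful Soup 4 the same operation is spelled unwrap(). The sketch below assumes bs4 on Python 3 with the bundled html.parser backend. The helper name unwrap_links is hypothetical and only serves the example.

```python
from bs4 import BeautifulSoup


def unwrap_links(html):
    """Return html with every <a> tag replaced by its own contents."""
    # Assumption: the parser bundled with Python is good enough here.
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like result, so modifying the tree
    # while looping over it is safe.
    for a in soup.find_all("a"):
        a.unwrap()  # drop the tag itself, keep its children in place
    return str(soup)


print(unwrap_links('<p>Hello <a href="http://google.com">Google</a></p>'))
# prints: <p>Hello Google</p>
```

Because unwrap() keeps the children, markup nested inside a link (for example a <b> tag) survives; only the <a> wrapper disappears.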
And what you requested in the comment (wrapping the text content of the tag with spaces) can be achieved with:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    del a['href']
    a.setString(' %s ' % a.text)
gives you:
>>> print str(soup)
<p>Hello <a> Google </a></p>
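For reference, a rough Beautiful Soup 4 equivalent of the space-padding variant, assuming bs4 on Python 3 with the bundled html.parser backend. The name pad_link_text is invented for this sketch. Assigning to .string stands in for the BeautifulSoup 3 setString() call used above.

```python
from bs4 import BeautifulSoup


def pad_link_text(html):
    """Drop href from each <a> and wrap its text in single spaces."""
    # Assumption: the parser bundled with Python is used.
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        # pop() with a default does nothing if the tag has no href.
        a.attrs.pop("href", None)
        # Assigning to .string replaces the tag's contents with plain text.
        a.string = " %s " % a.get_text()
    return str(soup)


print(pad_link_text('<p>Hello <a href="http://google.com">Google</a></p>'))
# prints: <p>Hello <a> Google </a></p>
```

Note that this flattens any markup nested inside the link to plain text, since the tag's contents are replaced by a single string.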
Solution 2
You can use bleach
pip install bleach
then use it like this...
import bleach
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<a href = "somesite.com">hello world</a>')
clean = bleach.clean(unicode(soup), tags=[], strip=True)
This results in...
>>> clean
u'hello world'
See the bleach documentation for details.
Author: user2784753
Updated on June 18, 2022
Comments
-
user2784753 almost 2 years
I have a script to replace a word in an "a href" tag. However, I want to remove the a href entirely, so that you have the word Google without a link.
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    a['href'] = a['href'].replace("google", "mysite")
result = str(soup)
Also, can you find all the words placed in an a href and put a " " before and after them? I'm not sure how to. I guess this is done before the replacing.
-
user2784753 over 10 years
Thanks, but will the Google I see be a link or normal text? Also, how can I place a space before the Google or any word in an a href? Thanks
-
Erik Kaplun over 10 years
1) I don't know off the top of my head how browsers render <a> tags without href. Why not just try and see for yourself instead of having me do the checking? 2) I'm not sure what you're asking.
-
user2784753 over 10 years
Yup, checked: no links. Once I get all the <a> tags, I want to place a " " right after the <a> and before the </a>. So <a href="something">Hello</a> should become <a> Hello </a>
-
Erik Kaplun over 10 years
Updated my answer accordingly.
-
PatrickT about 4 years
AFAIK, find_all is now preferred over findAll (the latter is available for backward compatibility).