Can I remove script tags with BeautifulSoup?
Solution 1
from bs4 import BeautifulSoup
soup = BeautifulSoup('<script>a</script>baba<script>b</script>', 'html.parser')
for s in soup.select('script'):
    s.extract()
print(soup)
baba
Solution 2
Updated answer for future reference:
The correct answer is decompose().
There are other ways to do this, but decompose() works in place.
Example usage:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>This is a slimy text and <i> I am slimer</i></p>', 'html.parser')
soup.i.decompose()  # remove the first <i> tag and its contents in place
print(str(soup))
# prints: <p>This is a slimy text and </p>
Pretty useful for getting rid of detritus like <script>, <img> and so forth.
Solution 3
As stated in the official documentation, you can use the extract method to remove every subtree that matches the search.
from bs4 import BeautifulSoup
a = BeautifulSoup("<html><body><script>aaa</script></body></html>", "html.parser")
# find_all is the bs4 name for the older findAll
for x in a.find_all('script'):
    x.extract()
Author: Sam
Updated on July 08, 2022
Comments
-
Sam, over 1 year: Can <script> tags and all of their contents be removed from HTML with BeautifulSoup, or do I have to use regular expressions or something else?
-
Ila, about 11 years: What's the best way to chain on additional tags to be removed? Right now it works if I repeat the command one after another, with [s.extract() for s in soup('script')] then [s.extract() for s in soup('iframe')] and so on, but not if I chain them like so: [s.extract() for s in soup('iframe', 'script')].
-
Fábio Diniz, about 11 years: @Ali You would have to use [s.extract() for s in soup(['iframe', 'script'])]. Note that to use multiple tags, the parameter must be a list.
-
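The list form suggested in the comment above can be sketched as follows. This is only an illustration: the sample markup and the variable name html are made up, and 'html.parser' is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = '<div><script>a</script>baba<iframe src="x"></iframe><p>keep</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# Calling soup(...) is shorthand for soup.find_all(...); a list of names
# matches any of those tags in a single search.
for tag in soup(['script', 'iframe']):
    tag.extract()

result = str(soup)
print(result)
```

A plain for loop is used here instead of a list comprehension because the return values of extract() are not needed.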
user2883071, over 8 years: @FábioDiniz How would I extract something like '<script class="blah">a</script>baba<script id="blahhhh">b</script>'? Is it the same?
-
QuangDT, over 8 years: To get the final string with the elements removed in code, call str(soup).
-
imrek, over 7 years: The soup object becomes useless after this operation; no tags are found anymore.
-
Mike, almost 7 years: The difference between decompose and extract is that the latter returns the thing that was removed, whereas the former just destroys it. So this is the more precise answer to the question, but the other methods do work.
-
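A small sketch of the difference described above, assuming bs4 with 'html.parser'; the markup and variable names are invented for the example.

```python
from bs4 import BeautifulSoup

# extract() detaches the tag from the tree and hands it back.
soup = BeautifulSoup('<p>keep<script>a</script></p>', 'html.parser')
removed = soup.script.extract()
removed_name = removed.name          # the detached tag is still usable
removed_content = removed.string     # its contents came along with it
after_extract = str(soup)

# decompose() removes the tag and its contents and returns nothing.
soup2 = BeautifulSoup('<p>keep<script>b</script></p>', 'html.parser')
returned = soup2.script.decompose()
after_decompose = str(soup2)
```

Either way the tree no longer contains the tag; extract() is the one to reach for when the removed piece is needed afterwards.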
Menachem Hornbacher, almost 7 years: Sorry for my ignorance, can you please explain what putting the code in a list does?
-
Jacquelyn.Marquardt, almost 7 years: @FábioDiniz What if I wanted to do the opposite? Remove ALL tags except for the <img> tag? Thanks
-
Roland Pihlakas, over 6 years: Decompose does not remove the content of script tags, it only removes the tags.
-
Abhishek Dujari, over 6 years: I agree with both your comments, which is why I said correct answer as per the OP, which was to remove contents. Often used for cleaning HTML of unneeded tags and formatting.
-
jarcobi889, over 6 years: Actually, according to the documentation: "Tag.decompose() removes a tag from the tree, then completely destroys it and its contents:" crummy.com/software/BeautifulSoup/bs4/doc/#decompose
-
Cybersupernova, almost 6 years: It works but will fail if there is no <i> in the HTML. When you are not sure about the HTML structure then extract is better.
-
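For the case raised above, where the tag may be missing, one hedged sketch is to look the tag up first and only remove it if something was found. The failure comes from soup.i being None when there is no match, so the check is needed whichever removal method is then called. The sample markup is made up.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that contains no <i> tag at all.
soup = BeautifulSoup('<p>No italics here</p>', 'html.parser')

tag = soup.find('i')      # returns None when nothing matches
if tag is not None:
    tag.decompose()       # only reached when an <i> exists

result = str(soup)
```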
Abhishek Dujari, almost 6 years: If you are not sure about the HTML you can't use strict mode, and yes, then falling back to extract might be the only way.
-
jarcobi889, almost 6 years: @Vangel Apologies, I think I forgot to add a mention in my comment: I believe I was responding to Roland Pihlakas with that comment.
-
SivolcC, almost 4 years: This is outdated, BeautifulSoup seems to format the string to HTML now: <html><head></head><body><p>baba</p></body></html>
-
0range, almost 4 years: Taking into account that we may have several i tags and want to remove all of them, we can (analogously to @FábioDiniz's extract example above) do [s.decompose() for s in soup('i')]. decompose() by itself only removes the first occurrence.
-
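The point about multiple occurrences can be illustrated with a short loop. This is a sketch with invented sample markup, assuming bs4 and 'html.parser'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <i> tags.
soup = BeautifulSoup('<p>This is <i>one</i> and <i>two</i> text</p>', 'html.parser')

# find_all returns every match up front, so each one can be destroyed in turn.
for tag in soup.find_all('i'):
    tag.decompose()

remaining = soup.find_all('i')
result = str(soup)
```

The loop also does nothing when there are no matches, so it is safe on HTML whose structure is not known in advance.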
Sundeep Pidugu, over 3 years: I assigned the element (the original variable) to a new variable and then applied the remove operation to the new variable, but it affects the original variable as well. How can this be fixed? What is the right approach?
-
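One possible approach to the aliasing problem described above is to copy the tag before modifying it, since plain assignment only creates a second name for the same object. The sketch below assumes a bs4 version in which copy.copy() works on a Tag; the markup and variable names are made up.

```python
import copy
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><script>a</script>baba</div>', 'html.parser')
original = soup.div

# copy.copy() builds a separate tag that is not attached to the original tree.
clone = copy.copy(original)
clone.script.decompose()

clone_html = str(clone)
original_html = str(original)
```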
Sundeep Pidugu, over 3 years: @Orange I am also trying to do the same, do you have a solution for that? (To remove multiple occurrences of the tag.)
-
mulaixi, about 3 years: @FábioDiniz Is there a way to remove a tag with a specific class? I don't want to remove all tags with the same name, but just one tag with a specific class.
-
mulaixi, about 3 years: Is there a way to remove a tag with a specific class? I don't want to remove all tags with the same name, but just one tag block with a specific class.
-
Edvard Rejthar, about 3 years: All you have to do is select the specific elements to call extract on: [x.extract() for x in a.select('span.className')]
-
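Selecting by class before removing, as suggested above, might look like this. The markup is the string from the earlier comment in this thread; the selector 'script.blah' and the use of 'html.parser' are assumptions for the example.

```python
from bs4 import BeautifulSoup

html = '<script class="blah">a</script>baba<script id="blahhhh">b</script>'
soup = BeautifulSoup(html, 'html.parser')

# The CSS selector matches only <script> tags carrying class "blah".
for tag in soup.select('script.blah'):
    tag.decompose()

result = str(soup)
print(result)
```

The second script tag has no matching class, so it stays in the tree.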
Raj, about 2 years: @SundeepPidugu To remove a tag that occurs multiple times you can use [soup.i.decompose() for tag in soup.find_all('i')]