Can I remove script tags with BeautifulSoup?

118,403

Solution 1

from bs4 import BeautifulSoup

soup = BeautifulSoup('<script>a</script>baba<script>b</script>', 'html.parser')
# Detach every <script> tag, together with its contents, from the tree.
for s in soup.select('script'):
    s.extract()
print(soup)  # prints: baba

Solution 2

Updated answer for future reference: the correct answer is decompose(). There are other ways to do this, but decompose() works in place and destroys the tag along with its contents.

Example usage:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>This is a slimy text and <i> I am slimer</i></p>', 'html.parser')
# Remove the first <i> tag and everything inside it.
soup.i.decompose()
print(soup)
# prints: <p>This is a slimy text and </p>

Pretty useful to get rid of detritus like <script>, <img> and so forth.
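
To make the "detritus" case concrete, here is a minimal sketch that removes several kinds of tags in one pass with decompose(). It assumes bs4 with the built-in html.parser; the sample HTML string is made up for illustration. Looping over find_all() is also safe when nothing matches, whereas soup.i.decompose() raises an AttributeError if there is no <i> tag, because soup.i is None in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = "<p>Keep this<script>track()</script><img src='x.png'> and this</p>"
soup = BeautifulSoup(html, "html.parser")

# A list of names matches any of them; the loop body simply
# never runs if the document contains none of these tags.
for tag in soup.find_all(["script", "img"]):
    tag.decompose()

print(soup)  # prints: <p>Keep this and this</p>
```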

Solution 3

As stated in the official documentation, you can use the extract() method to remove every subtree that matches the search.

from bs4 import BeautifulSoup

a = BeautifulSoup("<html><body><script>aaa</script></body></html>", "html.parser")
# extract() detaches each matching tag (and its subtree) from the document.
for x in a.find_all('script'):
    x.extract()
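
For comparison, a small sketch of how extract() and decompose() differ, assuming bs4 with html.parser and a made-up document: extract() hands back the detached tag so it can still be inspected or reused, while decompose() returns nothing and destroys the tag.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<html><body><script>aaa</script><p>text</p></body></html>",
    "html.parser",
)

# extract() removes the tag from the tree and returns it.
removed = soup.script.extract()
print(removed)  # prints: <script>aaa</script>

# decompose() removes the tag from the tree and returns None.
result = soup.p.decompose()
print(result)   # prints: None

print(soup)     # prints: <html><body></body></html>
```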

Author: Sam

Updated on July 08, 2022

Comments

  • Sam
    Sam over 1 year

    Can <script> tags and all of their contents be removed from HTML with BeautifulSoup, or do I have to use Regular Expressions or something else?

  • Ila
    Ila about 11 years
    What's the best way to chain additional tags to be removed? Right now it works if I repeat the command for each tag, with [s.extract() for s in soup('script')] then [s.extract() for s in soup('iframe')] and so on, but not if I chain them like so: [s.extract() for s in soup('iframe', 'script')].
  • Fábio Diniz
    Fábio Diniz about 11 years
    @Ali You would have to use [s.extract() for s in soup(['iframe', 'script'])]. Note that to match multiple tags, the argument must be a list.
  • user2883071
    user2883071 over 8 years
    @FábioDiniz How would I extract something like: '<script class="blah">a</script>baba<script id="blahhhh">b</script>'? Is it the same?
  • QuangDT
    QuangDT over 8 years
    To get the final string with the elements removed in code, call str(soup)
  • imrek
    imrek over 7 years
    The soup object becomes useless after this operation, no tags are found anymore.
  • Mike
    Mike almost 7 years
    The difference between decompose and extract is that the latter returns the thing that was removed, whereas the former just destroys it. So this is the more precise answer to the question, but the other methods do work.
  • Menachem Hornbacher
    Menachem Hornbacher almost 7 years
    Sorry for my ignorance, but can you please explain what putting the code in a list does?
  • Jacquelyn.Marquardt
    Jacquelyn.Marquardt almost 7 years
    @FábioDiniz What if I wanted to do the opposite and remove ALL tags except for the <img> tag? Thanks
  • Roland Pihlakas
    Roland Pihlakas over 6 years
    Decompose does not remove the content of script tags, it only removes the tags.
  • Abhishek Dujari
    Abhishek Dujari over 6 years
    I agree with both your comments. Which is why I said correct answer as per OP which was to remove contents. Often used for cleaning HTML of unneeded tags and formatting.
  • jarcobi889
    jarcobi889 over 6 years
    Actually, according to the documentation: "Tag.decompose() removes a tag from the tree, then completely destroys it and its contents:" crummy.com/software/BeautifulSoup/bs4/doc/#decompose
  • Cybersupernova
    Cybersupernova almost 6 years
    It works but will fail if there is no <i> in the HTML. When you are not sure about the HTML structure then extract is better
  • Abhishek Dujari
    Abhishek Dujari almost 6 years
    If you are not sure about the HTML you can't use strict mode and yes then falling back to extract might be the only way.
  • jarcobi889
    jarcobi889 almost 6 years
    @Vangel Apologies, I think I forgot to add a mention in my comment: I believe I was responding to Roland Pihlakas with that comment.
  • SivolcC
    SivolcC almost 4 years
    This is outdated. BeautifulSoup seems to wrap the string in a full HTML document now: <html><head></head><body><p>baba</p></body></html>
  • 0range
    0range almost 4 years
    Taking into account that we may have several i tags and want to remove all of them, we can (analogously to @FábioDiniz extract example above) do [s.decompose() for s in soup('i')]. decompose() by itself only removes the first occurrence.
  • Sundeep Pidugu
    Sundeep Pidugu over 3 years
    I assigned the element tag (the original variable) to a new variable and then applied the remove operation to the new variable, but it affects the original variable as well. How can this be fixed? What is the right approach?
  • Sundeep Pidugu
    Sundeep Pidugu over 3 years
    @Orange I am also trying to do the same. Do you have a solution for that (removing multiple occurrences of the tag)?
  • mulaixi
    mulaixi about 3 years
    @FábioDiniz is there way to remove a tag with a specific class? I don't want to remove all tags with same name, but just one tag with a specific class
  • mulaixi
    mulaixi about 3 years
    Is there way to remove a tag with a specific class? I don't want to remove all tags with same name, but just one tag block with a specific class.
  • Edvard Rejthar
    Edvard Rejthar about 3 years
    All you have to do is to select specific elements to call extract to. [x.extract() for x in a.select('span.className')]
  • Raj
    Raj about 2 years
    @SundeepPidugu To remove every occurrence of a tag, you can use [tag.decompose() for tag in soup.find_all('i')]
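
Two questions from the comments, removing only tags with a specific class and keeping the original untouched, can be sketched together. This assumes a bs4 version whose select() accepts comma-separated CSS selectors and uses html.parser; the markup and the class name "ad" are hypothetical. Assigning a tag to a new variable only creates a second reference to the same object, so the sketch uses copy.copy() to get an independent, detached copy first.

```python
import copy

from bs4 import BeautifulSoup

# Hypothetical markup loosely based on the example in the comments.
html = (
    '<div><script class="blah">a</script>baba'
    '<script id="keep">b</script><span class="ad">x</span></div>'
)
soup = BeautifulSoup(html, "html.parser")

original = soup.div
clone = copy.copy(original)  # independent copy, detached from the tree

# Remove only <script class="blah"> and <span class="ad"> from the copy.
for tag in clone.select("script.blah, span.ad"):
    tag.decompose()

print(clone)     # prints: <div>baba<script id="keep">b</script></div>
print(original)  # the original <div> still contains all three child tags
```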