Get contents of div by id with BeautifulSoup


Solution 1

Join the string form of each element in the div tag's .contents list:

from bs4 import BeautifulSoup

data = """
<div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
</div>
"""

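soup = BeautifulSoup(data, 'html.parser')  # name the parser explicitly to avoid a warning
div = soup.find('div', id='theDiv')
# Convert every child node (tag or text) to a string and concatenate them.
print(''.join(map(str, div.contents)))  # print() form works on Python 2 and 3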

Prints:

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
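
To see why the join works, it can help to look at what .contents actually holds. The following is a minimal sketch, assuming bs4 is installed and the built-in 'html.parser' is used; the variable names html and inner_html are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="theDiv")

# .contents is a plain list of the div's direct children: Tag objects for
# the <p> elements and NavigableString objects for the whitespace between them.
for child in div.contents:
    print(type(child).__name__, repr(str(child)))

# Joining the string form of each child gives the markup inside the div,
# without the surrounding <div> tag itself.
inner_html = "".join(str(child) for child in div.contents)
print(inner_html)
```

Because the whitespace between tags is preserved as text nodes, the joined string keeps the original line breaks and indentation; call .strip() on it if that matters.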

Solution 2

Since version 4.0.1, tags have a decode_contents() method that returns the markup inside the tag as a string:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
... <div id='theDiv'>
... <p>div content</p>
... <p>div stuff</p>
... <p>div thing</p>
... </div>
... """, 'html.parser')
>>> print(soup.div.decode_contents())

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>

For more details, see this answer: https://stackoverflow.com/a/18602241/237105
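
When the div might not exist on the page, find() returns None and calling a method on the result raises an AttributeError. The sketch below wraps decode_contents() in a small helper that handles that case; the function name inner_html_by_id and the sample page string are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup


def inner_html_by_id(html, element_id):
    """Return the markup inside the element with the given id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    if element is None:
        # No element with that id: report it instead of raising.
        return None
    return element.decode_contents()


page = "<div id='theDiv'><p>div content</p><p>div stuff</p></div>"
print(inner_html_by_id(page, "theDiv"))
print(inner_html_by_id(page, "missing"))
```

Returning None keeps the caller in charge of what a missing div means; raising a custom exception would be an equally reasonable choice.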

Author: user8028

Updated on July 10, 2020

Comments

  • user8028, almost 4 years

    I am using Python 2.7.6, urllib2, and BeautifulSoup to extract HTML from a website and store it in a variable.

    How can I show just the HTML contents of a div with a given id using BeautifulSoup?

    <div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
    </div>

    would be

    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>