How to obtain title attribute using python and beautifulsoup?

Solution 1

To get an attribute of an element, you can treat an element as a dictionary (reference):

soup.find('tag_name')['attribute_name']

And, in your case:

for tr in soup.find_all('tr'):
    for td in tr.find_all('td'):
        print(td.get('title', 'No title attribute'))

Note that I've used the .get() method so that the loop does not fail on td elements that have no title attribute.
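As a minimal, self-contained sketch of the difference between the two access styles (the HTML snippet and variable names here are made up for illustration, and the built-in "html.parser" backend is assumed):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one cell with a title, one without.
html = """
<table>
  <tr>
    <td title="I want this title" role="gridcell"><a href="#">TEXT</a></td>
    <td role="gridcell"><a href="#">TEXT</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# Dictionary-style access works when the attribute exists ...
first_title = cells[0]["title"]

# ... but raises KeyError when it does not.
try:
    cells[1]["title"]
    missing_raises = False
except KeyError:
    missing_raises = True

# .get() returns the supplied default instead of raising.
titles = [td.get("title", "No title attribute") for td in cells]

# Passing title=True restricts the search to cells that carry the attribute.
only_titled = [td["title"] for td in soup.find_all("td", title=True)]

print(first_title)
print(missing_raises)
print(titles)
print(only_titled)
```

If only the cells that actually have a title matter, filtering with title=True avoids the default value altogether.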

Solution 2

The lxml library is often useful too, because it lets you identify HTML structures with XPath expressions, which can make for more compact code.

In this case, the XPath expression //td[@title] selects all td elements that have a title attribute. The for-loop therefore does not need to check for the attribute, because the expression has already done so.

>>> from io import StringIO
>>> from lxml import etree
>>> HTML = StringIO('''\
... <td title="title 1" role="gridcell"><a onclick="open" href="#">TEXT</a></td>
... <td role="gridcell"><a onclick="open" href="#">TEXT</a></td>
... <td title="title 2" role="gridcell"><a onclick="open" href="#">TEXT</a></td>
... <td title="title 3" role="gridcell"><a onclick="open" href="#">TEXT</a></td>''')
>>> parser = etree.HTMLParser()
>>> tree = etree.parse(HTML, parser)
>>> tds = tree.xpath('//td[@title]')
>>> tds
[<Element td at 0x7a0888>, <Element td at 0x7a0d08>, <Element td at 0x7ae588>]
>>> for item in tds:
...     item.attrib['title']
...     
'title 1'
'title 2'
'title 3'
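The same idea can be sketched as a standalone script. This version is only an illustration on made-up markup: it uses the XPath expression //td/@title, which selects the attribute values themselves rather than the td elements, so no loop over elements is needed.

```python
from lxml import etree

# Hypothetical sample markup: the middle cell has no title attribute.
HTML = """<table><tr>
<td title="title 1" role="gridcell"><a href="#">TEXT</a></td>
<td role="gridcell"><a href="#">TEXT</a></td>
<td title="title 2" role="gridcell"><a href="#">TEXT</a></td>
</tr></table>"""

# Parse the string with lxml's lenient HTML parser; this returns the root element.
root = etree.fromstring(HTML, etree.HTMLParser())

# Selecting @title returns the attribute values as strings;
# cells without a title simply contribute nothing.
titles = [str(value) for value in root.xpath('//td/@title')]

print(titles)
```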
Author by

vham

Updated on June 04, 2022

Comments

  • vham almost 2 years

    Assume the following:

    <td title="I want this title" role="gridcell"><a onclick="open" href="#">TEXT</a></td>
    

    Now, I've successfully found the table and its individual rows using:

    for rows in soup.find_all(['tr']):
        for cells in rows.find_all(['td']):
            print(cells)
    

    By printing cells I can see that I've found the correct rows, but I'm not sure how to take the title attribute and save it as a string. I've attempted to use temp = soup.find('td')['title'], but I get errors when doing this, so evidently I'm doing something wrong.

    Any suggestions would be much appreciated!