AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'

python xml python-requests

14,023

You're trying to convert a str to bytes, and then store those bytes in a dictionary. The problem is that the object you're doing this to is an xml.etree.ElementTree.Element, not a str.

You probably meant to get the text from within or around that element, and then encode() that. The docs suggests using the itertext() method:

''.join(child.itertext())

This will evaluate to a str, which you can then encode().

Note that the text and tail attributes might not contain text (emphasis added):

Their values are usually strings but may be any application-specific object.

If you want to use those attributes, you'll have to handle None or non-string values:

head = '' if child.text is None else str(child.text)
tail = '' if child.text is None else str(child.text)
# Do something with head and tail...

Even this is not really enough. If text or tail contain bytes objects of some unexpected (or plain wrong) encoding, this will raise a UnicodeEncodeError.

Strings versus Bytes

I suggest leaving the text as a str, and not encoding it at all. Encoding text to a bytes object is intended as the last step before writing it to a binary file, a network socket, or some other hardware.

For more on the difference between bytes and characters, see Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?" (36 minute video from PyCon US 2012). He covers both Python 2 and 3.

Example Output

Using the child.itertext() method, and not encoding the strings, I got a reasonable-looking list-of-dictionaries from topStories():

[
  ...,
  {'description': 'Ayushmann Khurrana says his five-year Bollywood journey has '
                  'been “a fun ride”; adds success is a lousy teacher while '
                  'failure is “your friend, philosopher and guide”.',
    'guid': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
    'link': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
    'media': 'http://www.hindustantimes.com/rf/image_size_630x354/HT/p2/2017/06/26/Pictures/actor-ayushman-khurana_24f064ae-5a5d-11e7-9d38-39c470df081e.JPG',
    'pubDate': 'Mon, 26 Jun 2017 10:50:26 GMT ',
    'title': "I am a hardcore realist, and that's why I&thinsp;feel my journey "
             'has been a joyride: Ayushmann...'},
]

14,023

Author by

ani

Updated on June 23, 2022

Comments

ani almost 2 years

I'm trying to make a desktop notifier, and for that I'm scraping news from a site. When I run the program, I get the following error.

news[child.tag] = child.encode('utf8')
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'

How do I resolve it? I'm completely new to this. I tried searching for solutions, but none of them worked for me.

Here is my code:

import requests
import xml.etree.ElementTree as ET


# url of news rss feed
RSS_FEED_URL = "http://www.hindustantimes.com/rss/topnews/rssfeed.xml"


def loadRSS():
    '''
    utility function to load RSS feed
    '''
    # create HTTP request response object
    resp = requests.get(RSS_FEED_URL)
    # return response content
    return resp.content


def parseXML(rss):
    '''
    utility function to parse XML format rss feed
    '''
    # create element tree root object
    root = ET.fromstring(rss)
    # create empty list for news items
    newsitems = []
    # iterate news items
    for item in root.findall('./channel/item'):
        news = {}
        # iterate child elements of item
        for child in item:
            # special checking for namespace object content:media
            if child.tag == '{http://search.yahoo.com/mrss/}content':
                news['media'] = child.attrib['url']
            else:
                news[child.tag] = child.encode('utf8')
        newsitems.append(news)
    # return news items list
    return newsitems


def topStories():
    '''
    main function to generate and return news items
    '''
    # load rss feed
    rss = loadRSS()
    # parse XML
    newsitems = parseXML(rss)
    return newsitems

ani almost 7 years

when i write just child.text then i get the following error in my notification program message.append(signature=signature, *args) TypeError: Expected a string or unicode object
Kevin J. Chase almost 7 years

@ani: I don't see message.append in your code anywhere. Anyway, like I highlighted in my answer, text and tail can contain any object, so don't assume they're text in any form, let alone Unicode strings. (If you did get str strings and then used the encode() method to convert them to bytes, see my update about leaving them as strings.)
ani almost 7 years

No i'm using message.append in another program
ani almost 7 years

when i'm using str(child.text) i'm getting the following error UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 136: ordinal not in range(128)
Kevin J. Chase almost 7 years

@ani: Remember the bit that said "may be any application-specific object"? That includes "text" in unspecified (and possibly incorrect) encodings. Are you sure you can't get what you want from itertext()? I added an example of the output I got from it.
ani almost 7 years

Sorry,i did not see itertext().It's working now.Thank you so much