AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'

You're trying to convert a str to bytes, and then store those bytes in a dictionary. The problem is that the object you're doing this to is an xml.etree.ElementTree.Element, not a str.

You probably meant to get the text from within or around that element, and then encode() that. The docs suggest using the itertext() method:

''.join(child.itertext())

This will evaluate to a str, which you can then encode().
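As a small illustration, here is a sketch assuming Python 3 and a made-up element with nested markup (the XML string is hypothetical, not taken from the feed):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample element; not taken from the actual RSS feed.
child = ET.fromstring('<title>Breaking: <b>big</b> news</title>')

# An Element has no encode() method, but its joined text content is a str.
text = ''.join(child.itertext())
print(type(text).__name__, repr(text))

# Only now is there a str to encode, if bytes are really needed.
data = text.encode('utf8')
```

Note that itertext() also picks up text inside nested elements (the b element here), which child.text alone would miss.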

Note that the text and tail attributes might not contain text (emphasis added):

Their values are usually strings but may be any application-specific object.

If you want to use those attributes, you'll have to handle None or non-string values:

head = '' if child.text is None else str(child.text)
tail = '' if child.tail is None else str(child.tail)
# Do something with head and tail...

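Even this is not really enough. If text or tail contains non-ASCII text, or bytes in some unexpected (or plain wrong) encoding, the str() call can fail or produce garbage. On Python 2, for example, str() on a Unicode string containing non-ASCII characters raises a UnicodeEncodeError.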
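One way to be a bit more defensive is sketched below, assuming Python 3. The to_text() helper and its default encoding are hypothetical, not part of ElementTree, and the sample XML is made up for illustration:

```python
import xml.etree.ElementTree as ET


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: coerce a text/tail value to a str."""
    if value is None:
        return ''
    if isinstance(value, bytes):
        # The encoding is an assumption; undecodable bytes become U+FFFD
        # instead of raising an exception.
        return value.decode(encoding, errors='replace')
    return str(value)


# Made-up sample: <title> has both text and a tail, <empty/> has neither.
item = ET.fromstring('<item><title>Hello</title> trailing<empty/></item>')
title, empty = item[0], item[1]

head = to_text(title.text)
tail = to_text(title.tail)
nothing = to_text(empty.text)
```

Replacing undecodable bytes hides the underlying encoding problem, so this is only reasonable when a slightly damaged string is better than a crash.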

Strings versus Bytes

I suggest leaving the text as a str, and not encoding it at all. Encoding text to a bytes object is intended as the last step before writing it to a binary file, a network socket, or some other hardware.
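A minimal sketch of that idea, assuming Python 3: the text stays a str while the program works with it, and is encoded only when it is written to a binary file. The sample string and the temporary file path are made up for illustration.

```python
import os
import tempfile

# Text stays a str while the program works with it.
quote = 'She said \u201chello\u201d'
shouted = quote.upper()

# Encode only at the boundary, here when writing to a binary file.
path = os.path.join(tempfile.mkdtemp(), 'quote.bin')  # hypothetical path
with open(path, 'wb') as f:
    f.write(shouted.encode('utf-8'))

# Going the other way, decode bytes to str as soon as they come in.
with open(path, 'rb') as f:
    round_trip = f.read().decode('utf-8')
```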

For more on the difference between bytes and characters, see Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?" (36 minute video from PyCon US 2012). He covers both Python 2 and 3.

Example Output

Using the child.itertext() method, and not encoding the strings, I got a reasonable-looking list-of-dictionaries from topStories():

[
  ...,
  {'description': 'Ayushmann Khurrana says his five-year Bollywood journey has '
                  'been “a fun ride”; adds success is a lousy teacher while '
                  'failure is “your friend, philosopher and guide”.',
    'guid': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
    'link': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
    'media': 'http://www.hindustantimes.com/rf/image_size_630x354/HT/p2/2017/06/26/Pictures/actor-ayushman-khurana_24f064ae-5a5d-11e7-9d38-39c470df081e.JPG',
    'pubDate': 'Mon, 26 Jun 2017 10:50:26 GMT ',
    'title': "I am a hardcore realist, and that's why I feel my journey "
             'has been a joyride: Ayushmann...'},
]
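For reference, here is a sketch of how the question's parseXML function might look with that change, assuming Python 3. It is renamed parse_xml here, and SAMPLE_RSS is a made-up stand-in for the downloaded feed so that the example runs without network access:

```python
import xml.etree.ElementTree as ET

MEDIA_CONTENT = '{http://search.yahoo.com/mrss/}content'

# Made-up stand-in for the downloaded feed, so the sketch runs offline.
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example title</title>
      <link>http://example.com/story.html</link>
      <media:content url="http://example.com/image.jpg"/>
    </item>
  </channel>
</rss>"""


def parse_xml(rss):
    """Return one dict of str values per <item> in the feed."""
    root = ET.fromstring(rss)
    newsitems = []
    for item in root.findall('./channel/item'):
        news = {}
        for child in item:
            if child.tag == MEDIA_CONTENT:
                # The media URL lives in an attribute, not in the text.
                news['media'] = child.attrib['url']
            else:
                # Join all text inside the element; the result is a str.
                news[child.tag] = ''.join(child.itertext())
        newsitems.append(news)
    return newsitems


print(parse_xml(SAMPLE_RSS))
```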
Author by

ani

Updated on June 23, 2022

Comments

  • ani
    ani almost 2 years

    I'm trying to make a desktop notifier, and for that I'm scraping news from a site. When I run the program, I get the following error.

    news[child.tag] = child.encode('utf8')
    AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'
    

    How do I resolve it? I'm completely new to this. I tried searching for solutions, but none of them worked for me.

    Here is my code:

    import requests
    import xml.etree.ElementTree as ET
    
    
    # url of news rss feed
    RSS_FEED_URL = "http://www.hindustantimes.com/rss/topnews/rssfeed.xml"
    
    
    def loadRSS():
        '''
        utility function to load RSS feed
        '''
        # create HTTP request response object
        resp = requests.get(RSS_FEED_URL)
        # return response content
        return resp.content
    
    
    def parseXML(rss):
        '''
        utility function to parse XML format rss feed
        '''
        # create element tree root object
        root = ET.fromstring(rss)
        # create empty list for news items
        newsitems = []
        # iterate news items
        for item in root.findall('./channel/item'):
            news = {}
            # iterate child elements of item
            for child in item:
                # special checking for namespace object content:media
                if child.tag == '{http://search.yahoo.com/mrss/}content':
                    news['media'] = child.attrib['url']
                else:
                    news[child.tag] = child.encode('utf8')
            newsitems.append(news)
        # return news items list
        return newsitems
    
    
    def topStories():
        '''
        main function to generate and return news items
        '''
        # load rss feed
        rss = loadRSS()
        # parse XML
        newsitems = parseXML(rss)
        return newsitems
    
  • ani
    ani almost 7 years
    When I use just child.text, I get the following error in my notification program: message.append(signature=signature, *args) TypeError: Expected a string or unicode object
  • Kevin J. Chase
    Kevin J. Chase almost 7 years
    @ani: I don't see message.append in your code anywhere. Anyway, like I highlighted in my answer, text and tail can contain any object, so don't assume they're text in any form, let alone Unicode strings. (If you did get str strings and then used the encode() method to convert them to bytes, see my update about leaving them as strings.)
  • ani
    ani almost 7 years
    No, I'm using message.append in another program.
  • ani
    ani almost 7 years
    When I use str(child.text), I get the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 136: ordinal not in range(128)
  • Kevin J. Chase
    Kevin J. Chase almost 7 years
    @ani: Remember the bit that said "may be any application-specific object"? That includes "text" in unspecified (and possibly incorrect) encodings. Are you sure you can't get what you want from itertext()? I added an example of the output I got from it.
  • ani
    ani almost 7 years
    Sorry, I did not see itertext(). It's working now. Thank you so much.