AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'
You're trying to convert a str to bytes, and then store those bytes in a dictionary.
The problem is that the object you're doing this to is an xml.etree.ElementTree.Element, not a str.
You probably meant to get the text from within or around that element, and then encode() that.
The docs suggest using the itertext() method:
''.join(child.itertext())
This will evaluate to a str, which you can then encode().
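As a minimal sketch of that approach (the sample element below is made up for illustration and is not taken from the feed), joining itertext() yields a str, and encode() is applied only if a bytes object is really needed:

```python
import xml.etree.ElementTree as ET

# Hypothetical element standing in for one child of an RSS <item>.
child = ET.fromstring('<title>Top <b>news</b> today</title>')

# itertext() yields every piece of text inside the element, in document order.
text = ''.join(child.itertext())

# Encode only if a bytes object is actually required.
data = text.encode('utf8')

print(repr(text))
print(repr(data))
```

Note that itertext() also picks up text inside nested elements, which child.text alone would miss.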
Note that the text and tail attributes might not contain text (emphasis added):
Their values are usually strings but may be any application-specific object.
If you want to use those attributes, you'll have to handle None or non-string values:
head = '' if child.text is None else str(child.text)
tail = '' if child.tail is None else str(child.tail)
# Do something with head and tail...
Even this is not really enough.
If text or tail contain bytes objects of some unexpected (or plain wrong) encoding, this will raise a UnicodeEncodeError.
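One way to be more defensive is sketched below. It assumes Python 3, and it assumes that any bytes values are meant to be UTF-8; the helper name to_text is hypothetical and not part of ElementTree.

```python
def to_text(value, encoding='utf8'):
    """Return a str for a text/tail value that may be None, bytes, or any other object."""
    if value is None:
        return ''
    if isinstance(value, bytes):
        # Assumption: the bytes are UTF-8. Undecodable bytes become U+FFFD
        # instead of raising an exception.
        return value.decode(encoding, errors='replace')
    return str(value)


print(repr(to_text(None)))
print(repr(to_text(b'caf\xc3\xa9')))
print(repr(to_text(42)))
```

With this, head = to_text(child.text) and tail = to_text(child.tail) never raise on None or on badly encoded bytes, at the cost of silently replacing bytes that cannot be decoded.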
Strings versus Bytes
I suggest leaving the text as a str, and not encoding it at all.
Encoding text to a bytes object is intended as the last step before writing it to a binary file, a network socket, or some other hardware.
For more on the difference between bytes and characters, see Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?" (36 minute video from PyCon US 2012). He covers both Python 2 and 3.
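A small sketch of that "encode last" idea follows. The variable names are illustrative, and io.BytesIO is used here only as a stand-in for a binary file or socket:

```python
import io

# The text stays a str while the program works with it.
description = 'been “a fun ride”'
summary = description.upper()  # ordinary str operation, no encoding involved

# Encode only at the boundary, when the data leaves the program.
sink = io.BytesIO()
sink.write(summary.encode('utf8'))

print(type(summary).__name__, type(sink.getvalue()).__name__)
```

Everything before the write() call deals in characters; only the final step deals in bytes.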
Example Output
Using the child.itertext() method, and not encoding the strings, I got a reasonable-looking list-of-dictionaries from topStories():
[
 ...,
 {'description': 'Ayushmann Khurrana says his five-year Bollywood journey has '
                 'been “a fun ride”; adds success is a lousy teacher while '
                 'failure is “your friend, philosopher and guide”.',
  'guid': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
  'link': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
  'media': 'http://www.hindustantimes.com/rf/image_size_630x354/HT/p2/2017/06/26/Pictures/actor-ayushman-khurana_24f064ae-5a5d-11e7-9d38-39c470df081e.JPG',
  'pubDate': 'Mon, 26 Jun 2017 10:50:26 GMT ',
  'title': "I am a hardcore realist, and that's why I feel my journey "
           'has been a joyride: Ayushmann...'},
]
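For reference, here is a sketch of how parseXML() from the question might look with itertext() in place of encode(). The SAMPLE_RSS feed is invented so the example runs without network access, and MEDIA_TAG is just a name introduced here for the namespaced tag that the question's code checks:

```python
import xml.etree.ElementTree as ET

# Namespaced tag that the question's code treats specially.
MEDIA_TAG = '{http://search.yahoo.com/mrss/}content'

# Hypothetical, made-up feed used only so the sketch runs offline.
SAMPLE_RSS = b'''<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example title</title>
      <link>http://example.com/story.html</link>
      <media:content url="http://example.com/picture.jpg"/>
    </item>
  </channel>
</rss>'''


def parseXML(rss):
    """Parse an RSS feed into a list of dictionaries, keeping all text as str."""
    root = ET.fromstring(rss)
    newsitems = []
    for item in root.findall('./channel/item'):
        news = {}
        for child in item:
            if child.tag == MEDIA_TAG:
                news['media'] = child.attrib['url']
            else:
                # Join the element's text instead of calling encode() on the Element.
                news[child.tag] = ''.join(child.itertext())
        newsitems.append(news)
    return newsitems


newsitems = parseXML(SAMPLE_RSS)
print(newsitems)
```

The only change from the question's loop is the else branch; the rest is kept as close to the original as possible.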
ani
Updated on June 23, 2022
Comments
-
ani almost 2 years
I'm trying to make a desktop notifier, and for that I'm scraping news from a site. When I run the program, I get the following error.
news[child.tag] = child.encode('utf8')
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'
How do I resolve it? I'm completely new to this. I tried searching for solutions, but none of them worked for me.
Here is my code:
import requests
import xml.etree.ElementTree as ET

# url of news rss feed
RSS_FEED_URL = "http://www.hindustantimes.com/rss/topnews/rssfeed.xml"

def loadRSS():
    '''
    utility function to load RSS feed
    '''
    # create HTTP request response object
    resp = requests.get(RSS_FEED_URL)
    # return response content
    return resp.content

def parseXML(rss):
    '''
    utility function to parse XML format rss feed
    '''
    # create element tree root object
    root = ET.fromstring(rss)
    # create empty list for news items
    newsitems = []
    # iterate news items
    for item in root.findall('./channel/item'):
        news = {}
        # iterate child elements of item
        for child in item:
            # special checking for namespace object content:media
            if child.tag == '{http://search.yahoo.com/mrss/}content':
                news['media'] = child.attrib['url']
            else:
                news[child.tag] = child.encode('utf8')
        newsitems.append(news)
    # return news items list
    return newsitems

def topStories():
    '''
    main function to generate and return news items
    '''
    # load rss feed
    rss = loadRSS()
    # parse XML
    newsitems = parseXML(rss)
    return newsitems
-
ani almost 7 years: When I write just child.text, I get the following error in my notification program:
message.append(signature=signature, *args)
TypeError: Expected a string or unicode object
-
Kevin J. Chase almost 7 years: @ani: I don't see message.append in your code anywhere. Anyway, like I highlighted in my answer, text and tail can contain any object, so don't assume they're text in any form, let alone Unicode strings. (If you did get str strings and then used the encode() method to convert them to bytes, see my update about leaving them as strings.)
-
ani almost 7 years: No, I'm using message.append in another program.
-
ani almost 7 years: When I'm using str(child.text), I'm getting the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 136: ordinal not in range(128)
-
Kevin J. Chase almost 7 years: @ani: Remember the bit that said "may be any application-specific object"? That includes "text" in unspecified (and possibly incorrect) encodings. Are you sure you can't get what you want from itertext()? I added an example of the output I got from it.
-
ani almost 7 years: Sorry, I did not see itertext(). It's working now. Thank you so much.