Google App Engine: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 48: ordinal not in range(128)


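Python 2 is most likely decoding a byte string (str) into unicode implicitly, using the default ascii codec, and failing on byte 0xe2. That byte is not ASCII; it is the first byte of a multi-byte UTF-8 character such as a curly quote or a dash. urllib2.urlopen(url).read() returns raw bytes, the regexes therefore return byte strings, and Jinja2 renders templates as unicode. As a result, the first non-ASCII title in a feed triggers the failing implicit decode, which is why only some queries break. Decode the bytes explicitly with the feed's encoding right after reading them, before running the regexes: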

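content = urllib2.urlopen(url).read()   # raw bytes from the feed
content = content.decode('utf-8')       # now a unicode string, safe for Jinja2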
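A minimal Python 3 sketch of the failure and the fix. The question runs on Python 2, where the ascii decode happens implicitly; in Python 3 it has to be requested explicitly to reproduce it. The feed fragment is invented for illustration. Byte 0xe2 is the first byte of the UTF-8 encoding of characters such as the right single quotation mark U+2019.

```python
import re

# Hypothetical feed fragment; U+2019 (curly apostrophe) encodes to b'\xe2\x80\x99'.
raw = u"<title>What\u2019s new?</title>".encode("utf-8")

# Decoding with ascii fails at the first non-ASCII byte, which is what
# Python 2 does implicitly when byte strings are mixed with unicode text.
failing_pos = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    failing_pos = err.start  # index of the offending byte

# Decoding explicitly with the real encoding yields text that is safe to render.
content = raw.decode("utf-8")
titles = re.findall(r"<title>(.*?)</title>", content)
print(titles)  # ['What’s new?']
```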
Author: Manas Chaturvedi, Software Engineer 2 at Haptik

Updated on August 08, 2022


    I'm working on a small Google App Engine application that uses the Quora RSS feed. There is a form, and based on the user's input, the application outputs a list of links related to that input. It works fine for one-word queries and for most two-word queries when the words are separated by a '-'. However, for three-word queries and some two-word queries, I get the following error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 48: ordinal not in range(128)

    Here's my Python code:

    import os
    import webapp2
    import jinja2
    from google.appengine.ext import db
    import urllib2
    import re
    
    template_dir = os.path.join(os.path.dirname(__file__), 'templates')
    jinja_env = jinja2.Environment(loader = jinja2.FileSystemLoader(template_dir), autoescape=True)
    
    class Handler(webapp2.RequestHandler):
        def write(self, *a, **kw):
            self.response.out.write(*a, **kw)
        def render_str(self, template, **params):
            t = jinja_env.get_template(template)
            return t.render(params)
        def render(self, template, **kw):
            self.write(self.render_str(template, **kw))
    
    class MainPage(Handler):
        def get(self):
            self.render("formrss.html")
        def post(self):
            x = self.request.get("rssquery")
            url = "http://www.quora.com/" + x + "/rss"
            content = urllib2.urlopen(url).read()
            allTitles =  re.compile('<title>(.*?)</title>')
            allLinks = re.compile('<link>(.*?)</link>')
            list = re.findall(allTitles,content)
            linklist = re.findall(allLinks,content)
            self.render("frontrss.html", list = list, linklist = linklist)
    
    
    
    app = webapp2.WSGIApplication([('/', MainPage)], debug=True)
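The post() handler above runs its regexes on undecoded bytes. Below is a minimal sketch of that parsing step as a standalone function that decodes first. The name extract_titles_and_links and the UTF-8 fallback are assumptions for illustration, not part of the original code. It uses only the re module and bytes/str methods, so it should also run on Python 2.7. On App Engine's Python 2 runtime, the charset argument could come from urllib2.urlopen(url).info().getparam('charset'), which is None when the server declares no charset.

```python
import re

TITLE_RE = re.compile(r"<title>(.*?)</title>")
LINK_RE = re.compile(r"<link>(.*?)</link>")


def extract_titles_and_links(raw, charset=None):
    """Decode raw feed bytes first, then run the regexes on text.

    Hypothetical helper. `charset` is whatever encoding the HTTP response
    declared (may be None); falling back to UTF-8 is an assumption about the feed.
    """
    # "replace" substitutes U+FFFD for undecodable bytes instead of raising.
    text = raw.decode(charset or "utf-8", "replace")
    return TITLE_RE.findall(text), LINK_RE.findall(text)


# Usage with a made-up feed fragment (the em dash is non-ASCII):
sample = (u"<rss><channel><title>Quora</title><link>http://www.quora.com</link>"
          u"<item><title>Why \u2014 and how?</title>"
          u"<link>http://www.quora.com/q1</link></item></channel></rss>").encode("utf-8")
titles, links = extract_titles_and_links(sample)
print(titles)  # ['Quora', 'Why — and how?']
```

Using "replace" trades a crash for a visible U+FFFD placeholder; switch to the default strict mode if you would rather see encoding mismatches fail loudly.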
    

    Here's the HTML template:

    <h1>Quora Live Feed</h1><br><br><br>
    
    {% extends "rssbase.html" %}
    
    {% block content %}
        {% for e in range(1, 19) %}
            {{ (list[e]) }} <br>
            <a href="{{ linklist[e] }}">{{ linklist[e] }}</a>
            <br><br>
        {% endfor %}
    {% endblock %}
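The template indexes two parallel lists with a hard-coded range(1, 19). Starting at 1 skips the channel's own title and link, and the loop assumes the feed has at least 18 more items. One hedged alternative is to pair titles and links in the handler. In the sketch below, pair_items is a hypothetical helper, and treating index 0 as the channel entry is an assumption about the feed's layout.

```python
def pair_items(titles, links, limit=18):
    """Pair each item title with its link, skipping the channel entry.

    Hypothetical helper: assumes index 0 of both lists is the feed's own
    <title>/<link>, as the template's range(1, 19) implies, and caps the
    result at `limit` items to match that range.
    """
    # zip() stops at the shorter list instead of indexing past its end.
    return list(zip(titles[1:], links[1:]))[:limit]


titles = ["Quora", "First question", "Second question"]
links = ["http://www.quora.com", "http://www.quora.com/a", "http://www.quora.com/b"]
print(pair_items(titles, links))
# [('First question', 'http://www.quora.com/a'), ('Second question', 'http://www.quora.com/b')]
```

The handler could then pass something like items=pair_items(list, linklist) to render(), and the template could loop with {% for title, link in items %}, since Jinja2 for loops support tuple unpacking. Pairing by position still assumes every item has exactly one title and one link.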