How to convert a UTF-8 string into an HTML string in Python 2.5 so that accents display correctly?

Solution 1

First, make sure value is of type unicode and not a byte string (str):

value.encode('ascii', 'xmlcharrefreplace')

This should give you HTML numeric character references, for example &#224; for à.

See also: Python Unicode documentation.

>>> value = u"roulement \u00e0 billes"
>>> print value
roulement à billes
>>> print value.encode('ascii', 'xmlcharrefreplace')
roulement &#224; billes
>>>
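
If value arrives as UTF-8 bytes (for instance, read from a CSV file) rather than as a unicode object, it has to be decoded before the encode call shown above. A minimal sketch, assuming the input really is valid UTF-8; the helper name to_html_entities is hypothetical and not part of any library:

```python
# Hypothetical helper, not a standard-library function.
# This resolves to str on Python 2 and to bytes on Python 3.
BYTE_STRING_TYPE = type(u"".encode("ascii"))


def to_html_entities(value, encoding="utf-8"):
    """Return value as ASCII bytes, with non-ASCII characters as &#NNN; references."""
    if isinstance(value, BYTE_STRING_TYPE):
        # Assumption: byte strings are encoded with `encoding` (UTF-8 by default).
        value = value.decode(encoding)
    return value.encode("ascii", "xmlcharrefreplace")


raw = u"roulement \u00e0 billes".encode("utf-8")  # simulates bytes read from a file
result = to_html_entities(raw)
```

Here result should hold the ASCII byte string roulement &#224; billes. This only replaces non-ASCII characters; <, > and & are left untouched.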

Solution 2

To embed unicode string literals in your code:

a) Make sure your source file is in UTF-8 (and add the # -*- coding line), then use the literals directly:

u'Zażółć gęślą jaźń'

b) Escape them in unicode literals:

u"roulement \u00e0 billes"

In both cases you need to use the unicode type, not str type, so prefix your literals with u.

>>> type("kos")
<type 'str'>
>>> type(u"kos")
<type 'unicode'>
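
To illustrate that options a) and b) produce the same unicode object, here is a small sketch. It assumes the source file is actually saved as UTF-8; the variable names are only illustrative:

```python
# -*- coding: utf-8 -*-
direct = u"roulement à billes"        # option a): literal typed directly
escaped = u"roulement \u00e0 billes"  # option b): \u escape sequence

same = (direct == escaped)

# A unicode object counts characters; its UTF-8 encoding counts bytes,
# and the accented letter takes two bytes in UTF-8.
utf8_bytes = escaped.encode("utf-8")
lengths = (len(escaped), len(utf8_bytes))
```

The two literals compare equal, and the encoded form is one byte longer than the character count because à occupies two bytes in UTF-8.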

How to convert any such string into HTML entities, such as value="roulement &agrave; billes", so that it displays correctly as roulement à billes in a browser.

You shouldn't need to do this, except for the characters that interfere with HTML itself, like <, > and a couple more (such as &).
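
For the characters that do clash with HTML markup, the standard library's xml.sax.saxutils.escape replaces &, < and >. A minimal sketch with a made-up sample string; an extra mapping can be passed if quotes also need escaping, for example inside attribute values:

```python
from xml.sax.saxutils import escape

# Sample text, invented for this example.
text = u'5 < 6 & "roulement \u00e0 billes" > 0'

# Escapes &, < and > only; accented characters are left as they are.
escaped_body = escape(text)

# Additionally escape double quotes, e.g. for use inside value="...".
escaped_attr = escape(text, {u'"': u"&quot;"})
```

The accented character passes through unchanged, which is fine as long as the page itself is served as UTF-8.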

Just encode your HTML file as UTF-8 and make sure that the browser picks up the encoding (setting it in the response Content-Type header is fine; you can also add <meta charset="UTF-8"> or <meta http-equiv="content-type" content="text/html; charset=UTF-8"> inside <head>). Browsers should then handle the regional characters without trouble.
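
As a rough illustration of this approach, the page can be built as a unicode string and encoded to UTF-8 once, right before it is sent. The names render_page, PAGE_TEMPLATE and CONTENT_TYPE, as well as the markup, are assumptions made for this example; how the bytes and the header are handed to the response depends on the web framework in use:

```python
# Value to set as the Content-Type response header.
CONTENT_TYPE = "text/html; charset=UTF-8"

PAGE_TEMPLATE = (
    u"<!DOCTYPE html>\n"
    u"<html>\n"
    u"<head>\n"
    u'<meta charset="UTF-8">\n'
    u"</head>\n"
    u"<body>%s</body>\n"
    u"</html>\n"
)


def render_page(body_text):
    """Return the whole page as UTF-8 bytes.

    body_text must be a unicode object whose <, > and & are already escaped.
    """
    return (PAGE_TEMPLATE % body_text).encode("utf-8")


page_bytes = render_page(u"roulement \u00e0 billes")
```

With this setup no entity conversion is needed for accented letters: the à is sent as the two UTF-8 bytes 0xC3 0xA0, and the declared charset tells the browser how to decode them.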

Author: user1459604

Updated on July 15, 2022

Comments

  • user1459604
    user1459604 almost 2 years

    My UTF-8 string, coming from a database (a CSV file encoded in UTF-8), is displayed like this in a browser by my main.py code: value = "roulement \u00e0 billes"

    => How can I convert any such string into HTML entities, such as value="roulement &agrave; billes", so that it displays correctly as roulement à billes in a browser?

    I tried to add:

     # -*- coding: utf-8 -*-
    

    on the first line of my file, and also:

     self.response.headers['Content-Type'] = 'text/html;charset=UTF-8'
    

    but it doesn't change anything.

    => So maybe another way is to translate it into HTML entities? How can I do that?

    Thank you.

  • user1459604
    user1459604 almost 12 years
    Thanks a lot! I added to my code: value = unicode(value); value = value.encode('ascii', 'xmlcharrefreplace') and it works perfectly. Many thanks! Philippe
  • user1459604
    user1459604 almost 12 years
    Thanks for your help! Philippe
  • StefanE
    StefanE almost 12 years
    Glad it helped. Don't forget to mark the answer as accepted; it will help others with similar problems if it's marked as solved!