How to convert a UTF-8 string into an HTML string in Python 2.5 so that accents display correctly?
Solution 1
First, make sure value is of type unicode, not str (a byte string). Then
value.encode('ascii', 'xmlcharrefreplace')
should give you the HTML entities (as numeric character references):
>>> value = u"roulement \u00e0 billes"
>>> print value
roulement à billes
>>> print value.encode('ascii', 'xmlcharrefreplace')
roulement &#224; billes
>>>
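The same idea can be wrapped in a small helper. This is only a sketch: the name to_html_entities is hypothetical, and the trailing .decode('ascii') is an addition (not part of the answer above) so that the result is a text string under both Python 2 and Python 3.

```python
def to_html_entities(text):
    """Return `text` with every non-ASCII character replaced by a
    numeric character reference such as &#224;.

    `text` is assumed to already be a unicode string (a u"..." literal
    or the result of .decode()); to_html_entities is a hypothetical name.
    """
    # 'xmlcharrefreplace' turns each character ASCII cannot represent into &#NNN;
    ascii_bytes = text.encode('ascii', 'xmlcharrefreplace')
    # The result is pure ASCII, so decoding it back to text cannot fail.
    return ascii_bytes.decode('ascii')


print(to_html_entities(u"roulement \u00e0 billes"))  # roulement &#224; billes
```

Note that this only replaces non-ASCII characters; characters such as < or & are left untouched and still need HTML escaping if they can occur in the data.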
Solution 2
To embed unicode string literals in your code:
a) Make sure your source file is saved as UTF-8 (and add the # -*- coding: utf-8 -*- line at the top), then use the literals directly:
u'Zażółć gęślą jaźń'
b) Escape them in unicode literals:
u"roulement \u00e0 billes"
In both cases you need to use the unicode type, not the str type, so prefix your literals with u.
>>> type("kos")
<type 'str'>
>>> type(u"kos")
<type 'unicode'>
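Text read from a file or a database usually arrives as bytes rather than as unicode, so it has to be decoded first. A minimal sketch, assuming the data is UTF-8 encoded as in the question; ensure_unicode is a hypothetical helper name, and type(u"") is used only so that the check works on both Python 2 and Python 3.

```python
def ensure_unicode(value, encoding='utf-8'):
    """Return `value` as a unicode string, decoding it if it is bytes."""
    text_type = type(u"")          # unicode on Python 2, str on Python 3
    if isinstance(value, text_type):
        return value               # already text, nothing to do
    return value.decode(encoding)  # raises UnicodeDecodeError on invalid input


# Stand-in for bytes read from a UTF-8 encoded CSV file
raw = u"roulement \u00e0 billes".encode('utf-8')
text = ensure_unicode(raw)
print(text == u"roulement \u00e0 billes")  # True
```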
how to convert any of such string into HTML entities, such as value="roulement &agrave; billes" in order to display correctly as roulement à billes with a browser.
You shouldn't need to do this, except for the characters that interfere with HTML itself, like <, > and & (plus quotes inside attribute values).
Just encode your HTML file as UTF-8 and make sure that the browser picks the encoding up: setting it in the response Content-Type header is fine, and you can also drop in <meta charset="UTF-8"> or <meta http-equiv="content-type" content="text/html; charset=UTF-8"> inside <head>. Browsers should then handle the regional characters without trouble.
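A sketch of this approach: escape only the characters that are special in HTML, keep the accented characters as they are, and encode the whole page as UTF-8. The helper names escape_html and render_page are hypothetical, and the page template is only an illustration.

```python
def escape_html(text):
    """Escape the characters that have a special meaning in HTML."""
    # '&' must be replaced first so the entities added below are not escaped again
    return (text.replace(u"&", u"&amp;")
                .replace(u"<", u"&lt;")
                .replace(u">", u"&gt;")
                .replace(u'"', u"&quot;"))


def render_page(value):
    """Build a small HTML page around `value` and return it as UTF-8 bytes."""
    page = (u'<html><head><meta charset="UTF-8"></head>'
            u'<body>%s</body></html>') % escape_html(value)
    return page.encode('utf-8')


# The accented character is kept as is; only < and > are escaped.
body = render_page(u"roulement \u00e0 billes <SKF>")
```

The bytes returned by render_page would then be sent with the Content-Type header set to text/html; charset=UTF-8, so that the browser decodes them as UTF-8.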
user1459604
Updated on July 15, 2022
Comments
-
user1459604 almost 2 years
My UTF-8 string, coming from a database (a CSV file encoded in UTF-8), is displayed like this in a browser with my main.py code:
value ="roulement \u00e0 billes"
=> How can I convert any such string into HTML entities, such as value="roulement &agrave; billes", so that it displays correctly as
roulement à billes
in a browser? I tried to add:
# -*- coding: utf-8 -*-
on the first line of my file, and also:
self.response.headers['Content-Type'] = 'text/html;charset=UTF-8'
but it doesn't change anything.
=> So maybe another way is to translate it into HTML entities? How?
Thank you.
-
user1459604 almost 12 years
Thanks a lot! I added to my code:
value = unicode(value)
value = value.encode('ascii','xmlcharrefreplace')
and it works perfectly. Many thanks! Philippe
-
user1459604 almost 12 years
Thanks for your help! Philippe
-
StefanE almost 12 years
Glad it helped. Don't forget to check the answer checkbox; it will help others with similar problems if it's marked as solved!