python jsonify dictionary in utf-8
Solution 1
Use the standard-library json module instead, and set the ensure_ascii keyword parameter to False when encoding, or do the same with flask.json.dumps():
>>> data = u'\u10e2\u10d4\u10e1\u10e2'
>>> import json
>>> json.dumps(data)
'"\\u10e2\\u10d4\\u10e1\\u10e2"'
>>> json.dumps(data, ensure_ascii=False)
u'"\u10e2\u10d4\u10e1\u10e2"'
>>> print json.dumps(data, ensure_ascii=False)
"ტესტ"
>>> json.dumps(data, ensure_ascii=False).encode('utf8')
'"\xe1\x83\xa2\xe1\x83\x94\xe1\x83\xa1\xe1\x83\xa2"'
Note that you still need to explicitly encode the result to UTF-8, because the dumps() function returns a unicode object in that case.
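The session above is Python 2. As a minimal sketch of the same steps on Python 3 (where the u prefix is optional and dumps() returns a str), assuming only the standard-library json module; the variable names escaped, literal and payload are illustrative:

```python
import json

data = "\u10e2\u10d4\u10e1\u10e2"  # the same four Georgian letters as above

# Default: ASCII-only output, non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)
# ensure_ascii=False keeps the characters as literals in the result.
literal = json.dumps(data, ensure_ascii=False)

print(escaped)  # "\u10e2\u10d4\u10e1\u10e2"
print(literal)  # "ტესტ"

# dumps() returns text; encode it explicitly to get UTF-8 bytes to send.
payload = literal.encode("utf-8")

# Both forms decode back to the same value.
assert json.loads(escaped) == json.loads(payload.decode("utf-8")) == data
```

The final assertion shows that a JSON decoder sees no difference between the escaped and the literal form; only the bytes sent differ.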
You can make this the default (and use jsonify() again) by setting JSON_AS_ASCII to False in your Flask app config.
WARNING: do not include untrusted data in JSON that is not ASCII-safe and then interpolate it into an HTML template or use it in a JSONP API, as you can cause syntax errors or open a cross-site scripting vulnerability this way. That's because JSON is not a strict subset of Javascript, and when ASCII-safe encoding is disabled the U+2028 and U+2029 separators will not be escaped to \u2028 and \u2029 sequences.
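One way to keep readable UTF-8 output while avoiding the separator problem is to escape just those two characters after encoding. The following is a rough sketch, not part of Flask or the json module; dumps_js_safe is a hypothetical helper name. It only deals with U+2028 and U+2029, so embedding the result in HTML still needs proper HTML escaping on top.

```python
import json


def dumps_js_safe(obj):
    """Hypothetical helper: encode without ASCII escaping, except for the
    two line separators that can break JavaScript interpolation."""
    text = json.dumps(obj, ensure_ascii=False)
    # In json.dumps() output these characters can only appear inside string
    # values or keys, so replacing them with their escapes keeps valid JSON.
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


sample = {"result": ["ტესტ", "line\u2028separator"]}
encoded = dumps_js_safe(sample)

# The Georgian text stays literal, the separator is escaped, and the
# document still decodes to the original data.
assert "ტესტ" in encoded
assert "\u2028" not in encoded
assert json.loads(encoded) == sample
```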
Solution 2
Use the following config to add UTF-8 support:
app.config['JSON_AS_ASCII'] = False
Solution 3
If you still want to use Flask's json and ensure the UTF-8 encoding, then you can do something like this:
from flask import Flask, json, Response

app = Flask(__name__)

@app.route("/")
def hello():
    my_list = []
    my_list.append(u'ტესტ')
    data = {"result": my_list}
    # ensure_ascii=False keeps the non-ASCII characters as literals
    json_string = json.dumps(data, ensure_ascii=False)
    # Create a Response object to set the content type and the encoding
    response = Response(json_string, content_type="application/json; charset=utf-8")
    return response
#I hope this helps
Beka Tomashvili
Updated on July 09, 2022
Comments
-
Beka Tomashvili almost 2 years
I want to get JSON data in UTF-8.
I have a list
my_list = []
and then append many unicode values to it like this
my_list.append(u'ტესტ')
return jsonify(result=my_list)
and I get
{ "result": [ "\u10e2\u10d4\u10e1\u10e2", "\u10e2\u10dd\u10db\u10d0\u10e8\u10d5\u10d8\u10da\u10d8" ] }
-
Martijn Pieters about 11 years
That's correct. Your data was encoded to JSON, with the unicode codepoints encoded to \uabcd escape sequences. What is the problem exactly? Because the encoding for the backslash characters, the u characters, etc. is the same in UTF8 and in ASCII, it may look confusing but it is legal JSON, and UTF8.
-
Martijn Pieters about 11 years
I also edited your question; you have a list, not a dict.
-
Beka Tomashvili about 11 years
I want to get the JSON with the same values in UTF-8, without the unicode escape sequences.
-
Martijn Pieters about 11 years
You may want to make that explicit in your question, and do add why you need that. To a compliant JSON decoder there is no difference; by using escapes you avoid all sorts of problems, from proxies that cannot handle wide character data very well to missing character encoding information leading to decoding errors.
-
-
oyilmaztekin over 7 years
You saved me from having to replace my jsonify function with json.dumps(). Thanks...
-
QtRoS about 7 years
Yeah, best solution if you also want to preserve the indentation which jsonify adds but json.dumps does not.
-
phil294 about 7 years
The output is totally different in py3.5
-
Martijn Pieters about 7 years
@Blauhirn: That's because there are material differences between Python 2 and Python 3. The OP here is using Python 2 (evident from the u string prefixes).
-
Martijn Pieters about 7 years
@Blauhirn: in Python 2, string objects with non-ASCII characters are echoed with escapes, to ensure that the value remains ASCII safe. In Python 3 the rules have been relaxed and printable Unicode characters are now left in as literals. Thus you'll see '"ტესტ"' even without using print(). Use print(ascii(....)) if you want to see the Python 2 style.
-
Martijn Pieters about 6 years
Warning: JSON is not a Javascript subset, and disabling ASCII-safe encoding opens the door for issues with U+2028 and U+2029 separators in the data to break Javascript interpolation or JSONP APIs. This could leave you open to a cross-site scripting issue, depending on how the error is handled.
-
Kirill Malakhov over 5 years
The best solution.
-
mrblue over 3 years
The best solution ever, after spending more than 1 hour trying to find a way to handle non-ASCII characters. Thanks a lot.