python jsonify dictionary in utf-8


Solution 1

Use the standard-library json module instead, and set the ensure_ascii keyword parameter to False when encoding, or do the same with flask.json.dumps():

>>> data = u'\u10e2\u10d4\u10e1\u10e2'
>>> import json
>>> json.dumps(data)
'"\\u10e2\\u10d4\\u10e1\\u10e2"'
>>> json.dumps(data, ensure_ascii=False)
u'"\u10e2\u10d4\u10e1\u10e2"'
>>> print json.dumps(data, ensure_ascii=False)
"ტესტ"
>>> json.dumps(data, ensure_ascii=False).encode('utf8')
'"\xe1\x83\xa2\xe1\x83\x94\xe1\x83\xa1\xe1\x83\xa2"'

Note that you still need to explicitly encode the result to UTF-8, because with ensure_ascii=False the dumps() function returns a unicode object (in Python 2) rather than a byte string.
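For readers on Python 3 (the session above is Python 2), here is a minimal sketch of the same idea; the variable names are only illustrative. In Python 3 json.dumps() returns str, so the explicit .encode("utf-8") is what produces bytes, and both forms decode to the same value:

```python
import json

data = "\u10e2\u10d4\u10e1\u10e2"  # the same four Georgian letters as above

escaped = json.dumps(data)                      # ASCII-only output with \uXXXX escapes
literal = json.dumps(data, ensure_ascii=False)  # keeps the characters as literals

# In Python 3 dumps() returns str; encode explicitly to get UTF-8 bytes.
utf8_bytes = literal.encode("utf-8")

# A compliant decoder treats both encodings as the same value.
assert json.loads(escaped) == json.loads(utf8_bytes.decode("utf-8")) == data
```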

You can make this the default (and use jsonify() again) by setting JSON_AS_ASCII to False in your Flask app config.

WARNING: if the JSON contains untrusted data and is not ASCII-safe, do not interpolate it into an HTML template or serve it from a JSONP API, as you can cause syntax errors or open a cross-site scripting vulnerability this way. That's because JSON was not a strict subset of JavaScript before ECMAScript 2019 (older engines reject raw U+2028 and U+2029 inside string literals), and when ASCII-safe encoding is disabled the U+2028 and U+2029 separators are not escaped to \u2028 and \u2029 sequences.
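One possible mitigation, sketched in Python 3 with only the standard library, is to keep ensure_ascii=False but re-escape just those two separators afterwards. The helper name dumps_js_safe is hypothetical, not part of json or Flask:

```python
import json

def dumps_js_safe(obj):
    """Hypothetical helper: ensure_ascii=False output with U+2028/U+2029 re-escaped."""
    text = json.dumps(obj, ensure_ascii=False)
    # In json.dumps() output these characters can only occur inside strings,
    # so a plain replace turns them back into \uXXXX escapes.
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")

payload = {"result": ["ტესტ", "line\u2028break"]}
encoded = dumps_js_safe(payload)
```

This only covers the two separators; embedding the result in HTML still needs its own escaping (for example of the < character).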

Solution 2

Use the following config to add UTF-8 support:

app.config['JSON_AS_ASCII'] = False
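In newer Flask releases this config key no longer works: JSON_AS_ASCII was deprecated in Flask 2.2 and removed in 2.3 in favour of an attribute on the app's JSON provider. A sketch of the equivalent setting, assuming Flask 2.2 or later on Python 3 (the route and its data are illustrative):

```python
from flask import Flask, jsonify

app = Flask(__name__)
# Flask 2.2+: configure the default JSON provider instead of JSON_AS_ASCII.
app.json.ensure_ascii = False

@app.route("/")
def hello():
    # With ensure_ascii disabled, jsonify() emits the characters as literals.
    return jsonify(result=["ტესტ"])
```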

Solution 3

If you still want to use Flask's json module and ensure UTF-8 encoding, you can do something like this:

from flask import Flask, Response, json

app = Flask(__name__)

@app.route("/")
def hello():
    my_list = []
    my_list.append(u'ტესტ')
    data = {"result": my_list}
    # ensure_ascii=False keeps non-ASCII characters as literals instead of \uXXXX escapes
    json_string = json.dumps(data, ensure_ascii=False)
    # Create a Response object to set the content type and the encoding
    response = Response(json_string, content_type="application/json; charset=utf-8")
    return response



Author: Beka Tomashvili


Updated on July 09, 2022

Comments

  • Beka Tomashvili
    Beka Tomashvili almost 2 years

    I want to get JSON data in UTF-8.

    I have a list my_list = []

    and then I append many unicode values to the list, like this:

    my_list.append(u'ტესტ')
    
    return jsonify(result=my_list)
    

    and I get

    {
    "result": [
    "\u10e2\u10d4\u10e1\u10e2",
    "\u10e2\u10dd\u10db\u10d0\u10e8\u10d5\u10d8\u10da\u10d8"
    ]
    }
    
    • Martijn Pieters
      Martijn Pieters about 11 years
      That's correct. Your data was encoded to JSON, with the Unicode code points encoded as \uabcd escape sequences. What is the problem exactly? Because the backslash, the u, and the other characters in those escapes are encoded identically in UTF-8 and in ASCII, the output may look confusing, but it is legal JSON and valid UTF-8.
    • Martijn Pieters
      Martijn Pieters about 11 years
      I also edited your question; you have a list, not a dict.
    • Beka Tomashvili
      Beka Tomashvili about 11 years
      I want the JSON to contain the same values as UTF-8 text, without the \uXXXX escapes
    • Martijn Pieters
      Martijn Pieters about 11 years
      You may want to make that explicit in your question, and do add why you need that. To a compliant JSON decoder there is no difference; by using escapes you avoid all sorts of problems, from proxies that cannot handle wide character data very well to missing character encoding information leading to decoding errors.
  • oyilmaztekin
    oyilmaztekin over 7 years
    You saved me from having to replace my jsonify function with json.dumps(). Thanks...
  • QtRoS
    QtRoS about 7 years
    Yeah, best solution if you also want to preserve the indentation that jsonify adds but json.dumps does not.
  • phil294
    phil294 about 7 years
    the output is totally different in py3.5
  • Martijn Pieters
    Martijn Pieters about 7 years
    @Blauhirn: That's because there are material differences between Python 2 and Python 3. The OP here is using Python 2 (evident by the u string prefixes).
  • Martijn Pieters
    Martijn Pieters about 7 years
    @Blauhirn: in Python 2, string objects with non-ASCII characters are echoed with escapes, to ensure that the value remains ASCII-safe. In Python 3 the rules have been relaxed and printable Unicode characters are now left in as literals. Thus you'll see '"ტესტ"' even without using print(). Use print(ascii(....)) if you want to see the Python 2 style.
  • Martijn Pieters
    Martijn Pieters about 6 years
    Warning: JSON is not a Javascript subset, and disabling ASCII-safe encoding opens the door for issues with U+2028 and U+2029 separators in the data to break Javascript interpolation or JSONP APIs. This could leave you open to a cross-site scripting issue, depending on how the error is handled.
  • Kirill Malakhov
    Kirill Malakhov over 5 years
    The best solution.
  • mrblue
    mrblue over 3 years
    The best solution ever, after spending more than an hour trying to find a way to handle non-ASCII characters. Thanks a lot.