Let JSON object accept bytes or let urlopen output strings
Solution 1
HTTP sends bytes. If the resource in question is text, the character encoding is normally specified, either by the Content-Type HTTP header or by another mechanism (an RFC, HTML meta http-equiv, ...).
urllib should know how to decode the bytes to a string, but it's too naïve; it's a horribly underpowered and un-Pythonic library.
Dive Into Python 3 provides an overview of the situation.
Your "work-around" is fine—although it feels wrong, it's the correct way to do it.
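A minimal sketch of honoring the declared encoding, assuming Python 3, where the `headers` attribute of a `urlopen` response is an `email.message.Message` subclass and so provides `get_content_charset()`. The helper name `decode_body` and the UTF-8 fallback are assumptions made for illustration, not part of the answer above.

```python
import json
from email.message import Message


def decode_body(body: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode a response body using the charset from Content-Type, if any."""
    # get_content_charset() returns None when no charset parameter is present,
    # in which case we fall back to the (assumed) default.
    charset = headers.get_content_charset() or default
    return body.decode(charset)


# Stand-in for response.headers; with urllib you would pass response.headers
# and response.read() instead of these hand-built values.
headers = Message()
headers["Content-Type"] = "application/json; charset=utf-8"

obj = json.loads(decode_body(b'{"name": "caf\xc3\xa9"}', headers))
```

For JSON specifically, UTF-8 is the expected encoding, so the fallback matters more than the header lookup; the lookup is more useful for other text resources.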
Solution 2
Python’s wonderful standard library to the rescue…
import codecs
import json
reader = codecs.getreader("utf-8")
obj = json.load(reader(response))
Works with both py2 and py3.
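To try this without a network connection, here is a small self-contained sketch in which an `io.BytesIO` object stands in for the binary file-like object returned by `urlopen`. The sample payload is made up for illustration.

```python
import codecs
import io
import json

# io.BytesIO stands in for the binary file-like object returned by urlopen().
response = io.BytesIO('{"city": "Helsinki", "temp": -3}'.encode("utf-8"))

# Wrap the byte stream in a StreamReader that decodes UTF-8 as it is read.
reader = codecs.getreader("utf-8")
obj = json.load(reader(response))
```

Note that this relies on `json.load` (which reads from a file-like object), not `json.loads` (which expects the data itself).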
Solution 3
I have come to the opinion that the question is the best answer :)
import json
from urllib.request import urlopen
response = urlopen("http://site.com/api/foo/bar").read().decode('utf8')
obj = json.loads(response)
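On Python 3.6 and later, `json.loads` also accepts `bytes` or `bytearray` input (expected to be UTF-8, UTF-16 or UTF-32), so the explicit `decode` step can be dropped on those versions. A minimal sketch, with a literal `bytes` value standing in for what `response.read()` would return:

```python
import io
import json

# Stand-in for the bytes that response.read() would return.
raw = b'{"ok": true, "items": [1, 2, 3]}'

# Python 3.6+: bytes are accepted directly, no decode() needed.
obj = json.loads(raw)

# json.load() calls .read() on its argument and passes the result on,
# so a binary file-like object works as well on Python 3.6+.
obj_from_stream = json.load(io.BytesIO(raw))
```

On older Python 3 versions the explicit `.decode('utf8')` shown above is still required.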
Solution 4
For anyone else trying to solve this using the requests library:
import json
import requests
r = requests.get('http://localhost/index.json')
r.raise_for_status()
# works for Python2 and Python3
json.loads(r.content.decode('utf-8'))
Solution 5
This one works for me: I used the requests library with json(). Check out the docs in Requests: HTTP for Humans.
import requests
url = 'here goes your url'
obj = requests.get(url).json()
Peter Smit
Currently working as Doctoral Student in the Speech Group of the Department of Signal Processing and Acoustics of the Aalto University School of Electrical Engineering (formerly TKK / Helsinki University of Technology) in Helsinki, Finland.
Updated on September 26, 2020
Comments
-
Peter Smit over 3 years
With Python 3 I am requesting a JSON document from a URL.
response = urllib.request.urlopen(request)
The response object is a file-like object with read and readline methods. Normally a JSON object can be created with a file opened in text mode.
obj = json.load(fp)
What I would like to do is:
obj = json.load(response)
This however does not work as urlopen returns a file object in binary mode.
A workaround is of course:
str_response = response.read().decode('utf-8')
obj = json.loads(str_response)
but this feels bad...
Is there a better way that I can transform a bytes file object to a string file object? Or am I missing any parameters for either urlopen or json.load to give an encoding?
-
Bob Yoplait about 7 years
I think you have a typo there, "readall" should be "read"?
-
CaptainNemo over 6 years
@BobYoplait I agree.
-
-
ThatAintWorking about 10 years
This may be the "correct" way to do it but if there was one thing I could undo about Python 3 it would be this bytes/strings crap. You would think the built-in library functions would at least know how to deal with other built-in library functions. Part of the reason we use Python is the simple intuitive syntax. This change breaks that all over the place.
-
offby1 over 9 years
Check out the "requests" library -- it handles this sort of thing for you automagically.
-
jbg over 9 years
This isn’t a case of the built-in library functions needing to “know how” to deal with other functions. JSON is defined as a UTF-8 representation of objects, so it can’t magically decode bytes that it doesn’t know the encoding of. I do agree that urlopen ought to be able to decode the bytes itself since it knows the encoding. Anyway, I’ve posted the Python standard library solution as an answer; you can do streaming decoding of bytes using the codecs module.
-
Aaron Lelevier almost 9 years
I got this error when trying this answer in Python 3.4.3, not sure why. The error was TypeError: the JSON object must be str, not 'StreamReader'
-
sleepycal over 8 years
@AronYsidoro Did you possibly use json.loads() instead of json.load()?
-
Phil Frost about 8 years
For bonus points, use the encoding specified in the response, instead of assuming utf-8: response.headers.get_content_charset(). Returns None if there is no encoding, and doesn't exist on Python 2.
-
jbg about 8 years
@PhilFrost That’s slick. In practice it might pay to be careful with that; JSON is always UTF-8, UTF-16 or UTF-32 by definition (and is overwhelmingly likely to be UTF-8), so if another encoding is returned by the web server, it’s possibly a misconfiguration of the web server software rather than genuinely non-standard JSON.
-
jfs about 8 years
@jbg: JSON itself is a text format; it knows nothing about character encodings and bytes. Nothing stops you storing it on disk using any character encoding you like. Though RFCs for the application/json media type say: "JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32." i.e., a web server must use only these encodings. Also, there is no charset parameter defined for application/json and the recent RFCs specify no way to detect the encoding. That makes utf-8 the only choice.
-
jfs about 8 years
@PhilFrost it exists on Python 2 as response.headers.getparam('charset'), see A good way to get the charset/encoding of an HTTP response in Python. Though as I said in the previous comment: It doesn't help with json.
-
Harper Koo over 7 years
When I used it in Python 3.5, the error was "AttributeError: 'bytes' object has no attribute 'read'"
-
jbg over 7 years
@harperkoo: Did you possibly pass a bytes object as the response variable instead of a file-like object? If you already have a bytes object and just want to decode it, you can simply call the decode(encoding) method on it.
-
jbg over 7 years
This functionality is built-in to requests: you can simply do r.json()
-
sfblackl about 7 years
I got to this page because I was having an issue with Flask unit tests - thanks for posting the single line call.
-
Blairg23 almost 7 years
To clarify, if you use @jbg's method, you don't need to do json.loads. All you have to do is r.json() and you've got your JSON object loaded into a dict already.
-
EvertW almost 7 years
@ThatAintWorking: I would disagree. While it is a pain in the neck to explicitly have to manage the difference between bytes and strings, it is a much greater pain to have the language make some implicit conversion for you. Implicit bytes <-> string conversions are a source of many bugs, and Python 3 is very helpful in pointing out the pitfalls. But I agree the library has room for improvement in this area.
-
ThatAintWorking almost 7 years
@EvertW the failure, in my opinion, is forcing strings to be Unicode in the first place.
-
EvertW almost 7 years
@ThatAintWorking: No, strings must be Unicode, if you want software that can be used in other places than the UK or USA. For decades we have suffered under the myopic worldview of the ASCII committee. Python 3 finally got it right. Might have something to do with Python originating in Europe...
-
andilabs about 6 years
*** AttributeError: 'Response' object has no attribute 'readable'
-
andilabs about 6 years
*** AttributeError: 'bytes' object has no attribute 'readable'
-
andilabs about 6 years
*** UnicodeEncodeError: 'ascii' codec can't encode characters in position 264-265: ordinal not in range(128)
-
Collin Anderson about 6 years
Are you using urllib or requests? This is for urllib. If you have a bytes object, just use json.loads(bytes_obj.decode()).
-
BMDan almost 5 years
@jfs @jbg @phil-frost RFC 8259 says, "Note: No "charset" parameter is defined for this registration. Adding one really has no effect on compliant recipients." Whether it is therefore better to trust, to ignore, or to trust-but-heuristically-evaluate-and-then-work-around a charset that a server nonetheless elected to send is likely a problem of the deepest sort of bikeshedding variety.
-
jfs almost 5 years
@BMDan follow the link in my comment above that literally says: "no charset parameter defined..."
-
Baldrickk over 4 years
This is the best way. Really readable, and anyone who is doing something like this should have requests.