Python Response Decoding
Ask Python (this session is Python 2):
>>> import urllib
>>> r = urllib.urlopen("http://google.com")
>>> a = r.read()
>>> type(a)
<type 'str'>
>>> help(a.decode)
Help on built-in function decode:
decode(...)
S.decode([encoding[,errors]]) -> object
Decodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registered with codecs.register_error that is
able to handle UnicodeDecodeErrors.
>>> b = a.decode('utf8')
>>> type(b)
<type 'unicode'>
>>>
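So, it appears that read() returns a str, which in Python 2 is a byte string. .decode('utf8') then decodes those bytes from UTF-8 into Python's internal unicode format. In Python 3, where the call is urllib.request.urlopen, read() returns bytes and decode() returns a str, but the direction is the same: from encoded bytes to text.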
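To make the direction concrete, here is a minimal Python 3 sketch. It assumes the page is served as UTF-8 and uses a hard-coded byte string as a stand-in for response.read(), so it runs without a network connection. The names raw, html, bad, and path are made up for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for response.read(): raw UTF-8 bytes off the wire.
raw = "<p>café ☕</p>".encode("utf-8")

# decode() goes FROM bytes in the named encoding TO a text string.
html = raw.decode("utf-8")

# encode() goes the other way, from text back to bytes.
assert html.encode("utf-8") == raw

# The errors argument controls what happens with invalid input.
bad = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8; strict mode would raise
replaced = bad.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD
ignored = bad.decode("utf-8", errors="ignore")    # bad byte is dropped

# To store the text as UTF-8, open the file in text mode with an explicit
# encoding and do a regular write; Python encodes on the way out.
path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

# Reading the file back in binary mode shows the same UTF-8 bytes.
with open(path, "rb") as f:
    on_disk = f.read()
```

So for the file question: with decoded text, a regular write to a file opened with encoding="utf-8" should be enough. Alternatively, if the server's bytes are already UTF-8, they can be written unchanged to a file opened in "wb" mode, skipping the decode step entirely.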
darksky
Updated on June 17, 2022
Comments
-
darksky almost 2 years
For the following lines that use urllib:
    # some request object exists
    response = urllib.request.urlopen(request)
    html = response.read().decode("utf8")
What format of string does read() return? I've been trying to figure that out from Python's documentation, but it does not mention it at all. Why is there a decode? Does decode decode an object to UTF-8 or from UTF-8? From what format to what format does it decode? The decode documentation also mentions nothing about that. Is it that Python's documentation is that terrible, or is it that I don't understand some standard convention?
I want to store that HTML in a UTF-8 file. Would I just do a regular write, or do I need to "encode" back into something and write that?
Note: I know urllib is deprecated, but I cannot switch to urllib2 right now
-
darksky about 11 years
Thanks for down votes without a comment...?