What is the right way to compress and decompress UTF-8 data using zlib?
Your JSON data is not UTF-8 encoded. The encoding parameter to the json.dumps() function tells it how to interpret Python byte strings in message (i.e. the input), not how to encode the resulting output. It doesn't encode the output at all because you used ensure_ascii=False.
Encode the data before compression:
ssc = zlib.compress(ss.encode('utf8'))
When decompressing again, there is no need to decode from UTF-8; the json.loads() function assumes UTF-8 if the input is a bytestring.
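A minimal round-trip sketch of the steps above, written for Python 3 (an assumption; the question targets Python 2, where the encoding keyword to json.dumps() still exists). The sample message is made up for illustration.

```python
import json
import zlib

# Hypothetical sample message containing non-ASCII characters.
message = {"name": "Señor Muñoz", "city": "Köln"}

# Serialize to text; ensure_ascii=False keeps 'ñ' and 'ö' as-is
# instead of writing \uXXXX escapes.
ss = json.dumps(message, ensure_ascii=False, sort_keys=True)

# zlib works on bytes, so encode to UTF-8 explicitly before compressing.
ssc = zlib.compress(ss.encode("utf8"))

# Decompress back to UTF-8 bytes, decode to text, then parse the JSON.
restored = json.loads(zlib.decompress(ssc).decode("utf8"))
```

The explicit .decode("utf8") is kept here for portability; Python 3.6 and later also accept UTF-8 bytes directly in json.loads().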
A small addition to Martijn's answer: an Enthought blog post shows a nifty one-liner that spares you from importing zlib in your own code.
Safely compressing a string (including your JSON dump) would look like this:
ssc = ss.encode('utf-8').encode('zlib_codec')
Decompressing and decoding back to the original Unicode string would be:
ss = ssc.decode('zlib_codec').decode('utf-8')
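The string-method form above is Python 2 only. On Python 3, str.encode() and bytes.decode() reject bytes-to-bytes codecs such as zlib_codec, so a rough equivalent (a sketch, assuming Python 3.4 or later) goes through codecs.encode() and codecs.decode(). The sample text is invented for illustration.

```python
import codecs

ss = "mañana, José"  # hypothetical sample text

# Encode text to UTF-8 bytes first, then compress via the zlib codec.
ssc = codecs.encode(ss.encode("utf-8"), "zlib_codec")

# Decompress via the zlib codec, then decode the UTF-8 bytes back to text.
restored = codecs.decode(ssc, "zlib_codec").decode("utf-8")
```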
Hope this helps.
I have a very long JSON message that contains characters that go beyond the ASCII table. I convert it into a string as follows:
messStr = json.dumps(message, encoding='utf-8', ensure_ascii=False, sort_keys=True)
I need to store this string using a service that restricts its size to X bytes. I want to split the JSON string into pieces of length X and store them separately. I ran into some issues doing this (described here) so I want to compress the string slices to work around those issues. I tried to do this:
ss = mStr[start:fin]     # get piece of length X
ssc = zlib.compress(ss)  # compress it
When I do that, I get the following error from zlib.compress:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 225: ordinal not in range(128)
What is the right way to compress a UTF-8 string and what is then the right way to decompress it?
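On the splitting step described above: slicing the text by characters does not bound the stored size, since a non-ASCII character takes more than one byte in UTF-8. A minimal sketch, assuming Python 3 and a hypothetical limit X, that slices the encoded bytes instead:

```python
X = 4  # hypothetical size limit in bytes
text = "ñandú"  # 5 characters, but 7 bytes once encoded as UTF-8

data = text.encode("utf-8")

# Slice the bytes, not the characters, so every piece is at most X bytes.
chunks = [data[i:i + X] for i in range(0, len(data), X)]

# A chunk boundary may fall inside a multi-byte character, so join all
# the bytes again before decoding.
restored = b"".join(chunks).decode("utf-8")
```

The same slicing applies unchanged to the bytes returned by zlib.compress(), which is one way to compress the whole message once and split afterwards.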
Anshu Dwibhashi (over 7 years ago): This is what worked for me, rather than the other answer. Thanks for the epic solution! +1
Slawomir (almost 4 years ago): The above only works in Python 3.x, since the zlib package (finally) takes a byte array as input, not a string. In Python 2.7 this won't work because zlib.compress takes a string and uses the ascii codec to turn the input into a byte array, hence the OP's error message.
Martijn Pieters (almost 4 years ago): @Debriter yes, the problem in the question is unique to Python 2.
Lynx-Lab (over 3 years ago): @nurettin this code worked on Python 2, as it was when the question was asked. From your error message, it seems like you are using Python 3.
Jason R Stevens CFA (over 1 year ago): I like this answer for avoiding the separate zlib import. I do suspect this penalizes code readability, as the direct use of the zlib module is front and center, whereas zlib_codec in the above is merely part of a chain. Thanks for the great answer!