MemoryError using json.dumps()
Solution 1
You can simply replace
f.write(json.dumps(mytab,default=dthandler,indent=4))
by
json.dump(mytab, f, default=dthandler, indent=4)
This should "stream" the data into the file.
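For example, a minimal sketch of the whole write (mytab, dthandler, and filepath come from the question; the datetime handler shown here is only an assumption about what dthandler does):
import json
import datetime

def dthandler(obj):
    # assumed handler: serialise datetime objects as ISO 8601 strings
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError("%r is not JSON serializable" % obj)

with open(filepath, 'w') as f:
    # json.dump() writes to the file object as it encodes,
    # instead of building one huge string first
    json.dump(mytab, f, default=dthandler, indent=4)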
Solution 2
The json module will build the entire JSON string in memory before writing, which is why the MemoryError occurs.
To get around this problem, use json.JSONEncoder().iterencode():
with open(filepath, 'w') as f:
    for chunk in json.JSONEncoder().iterencode(object_to_encode):
        f.write(chunk)
Note, however, that this will generally take quite a while, since it writes many small chunks rather than everything at once.
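If the many small write() calls turn out to be the bottleneck, one possible workaround (a sketch, not part of the original answer; dump_in_batches and batch_size are made-up names) is to buffer the iterencode() chunks and write them in larger batches:
import json

def dump_in_batches(obj, f, batch_size=1024):
    # buffer iterencode() chunks and flush them in batches,
    # trading a little memory for far fewer write() calls
    buf = []
    for chunk in json.JSONEncoder().iterencode(obj):
        buf.append(chunk)
        if len(buf) >= batch_size:
            f.write(''.join(buf))
            buf = []
    if buf:
        f.write(''.join(buf))

with open(filepath, 'w') as f:
    dump_in_batches(object_to_encode, f)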
Special case:
I had a Python object which is a list of dicts, like so:
[
    { "prop": 1, "attr": 2 },
    { "prop": 3, "attr": 4 }
    # ...
]
I could json.dumps() individual objects, but dumping the whole list generated a MemoryError.
To speed up writing, I opened the file and wrote the JSON delimiters manually:
with open(filepath, 'w') as f:
    f.write('[')
    # write every element but the last, each followed by a ',' separator
    for obj in list_of_dicts[:-1]:
        json.dump(obj, f)
        f.write(',')
    # write the last element without a trailing comma, then close the array
    json.dump(list_of_dicts[-1], f)
    f.write(']')
You can probably get away with something like that if you know your JSON object structure beforehand. For general use, just use json.JSONEncoder().iterencode().
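If you prefer the manual-delimiter trick but want it to also cope with an empty list (and avoid copying the list with a slice), a variation along these lines should work (dump_list_of_dicts is a hypothetical helper, not from the original answer):
import json

def dump_list_of_dicts(list_of_dicts, filepath):
    # stream a top-level JSON array element by element,
    # writing the ',' separator before every element except the first
    with open(filepath, 'w') as f:
        f.write('[')
        for i, obj in enumerate(list_of_dicts):
            if i:
                f.write(',')
            json.dump(obj, f)
        f.write(']')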
salamey
Updated on June 07, 2022

Comments
-
salamey almost 2 years
I would like to know which of json.dump() or json.dumps() is the most efficient when it comes to encoding a large array to JSON format. Can you please show me an example of using json.dump()?
Actually, I am making a Python CGI script that gets a large amount of data from a MySQL database using the ORM SQLAlchemy, and after some user-triggered processing I store the final output in an array that I finally convert to JSON.
But when converting to JSON with:
print json.dumps({'success': True, 'data': data}) #data is my array
I get the following error:
Traceback (most recent call last):
  File "C:/script/cgi/translate_parameters.py", line 617, in <module>
    f.write(json.dumps(mytab,default=dthandler,indent=4))
  File "C:\Python27\lib\json\__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "C:\Python27\lib\json\encoder.py", line 209, in encode
    chunks = list(chunks)
MemoryError
So, my guess is to use json.dump() to convert the data in chunks. Any ideas on how to do this? Or other ideas besides using json.dump()?
-
Xavier Ho almost 9 years
This doesn't work: even assuming mytab in your example is a JSON-serialisable object, json.dump() doesn't know which object you're dumping. In addition, even trying that still produces MemoryError for large objects.
-
sebastian almost 9 years
Right... forgot mytab as argument. Fixed. Concerning memory, it might be worth trying other JSON libraries, hoping for a more memory-efficient implementation...
-
Xavier Ho almost 9 years
Found a workaround eventually, posting my answer :]
-
sebastian over 8 years
That is just what json.dump does: hg.python.org/cpython/file/v2.7.10/Lib/json/__init__.py#l183
-
Xavier Ho over 8 years
Yes, but iterencode() is too slow for large objects. It is best to divide the data into chunks your memory can handle, then pass each one to encode() at once.
-
jtlz2 over 2 years
@XavierHo Thank you so so much - a hidden gem!