MemoryError using json.dumps()


Solution 1

You can simply replace

f.write(json.dumps(mytab,default=dthandler,indent=4))

by

json.dump(mytab, f, default=dthandler, indent=4)

This should "stream" the data into the file.
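For context, here is a minimal self-contained sketch of Solution 1. The dthandler below is an assumption (a fallback handler for datetime values, as its name in the question suggests), and mytab is just placeholder data:

import json
import datetime

def dthandler(obj):
    # Assumed fallback: serialise datetime values as ISO-8601 strings.
    if isinstance(obj, (datetime.date, datetime.datetime)):
        return obj.isoformat()
    raise TypeError("%r is not JSON serializable" % obj)

mytab = [{"prop": 1, "when": datetime.datetime.now()}]  # placeholder data

with open("output.json", "w") as f:
    # json.dump() writes straight to the file object instead of building
    # one large string in memory first.
    json.dump(mytab, f, default=dthandler, indent=4)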

Solution 2

json.dumps() builds the entire JSON string in memory before it is written, which is why the MemoryError occurs.

To get around this problem, use json.JSONEncoder().iterencode():

with open(filepath, 'w') as f:
    for chunk in json.JSONEncoder().iterencode(object_to_encode):
        f.write(chunk)

Note, however, that this will generally take quite a while, since it writes many small chunks rather than everything at once.
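If the many small writes become the bottleneck, one possible mitigation (my own sketch, not part of the original answer; the function name and batch size are arbitrary) is to batch the chunks before writing:

import json

def write_json_streaming(obj, filepath, batch_size=1024):
    # Stream obj to disk with iterencode(), but join batch_size chunks
    # per write() call to cut down on I/O overhead.
    buffer = []
    with open(filepath, 'w') as f:
        for chunk in json.JSONEncoder().iterencode(obj):
            buffer.append(chunk)
            if len(buffer) >= batch_size:
                f.write(''.join(buffer))
                buffer = []
        if buffer:
            f.write(''.join(buffer))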


Special case:

I had a Python object which is a list of dicts. Like such:

[
    { "prop": 1, "attr": 2 },
    { "prop": 3, "attr": 4 }
    # ...
]

I could json.dumps() individual objects, but dumping the whole list produced a MemoryError. To speed up writing, I opened the file and wrote the JSON delimiters manually:

with open(filepath, 'w') as f:
    f.write('[')

    for obj in list_of_dicts[:-1]:
        json.dump(obj, f)
        f.write(',')

    json.dump(list_of_dicts[-1], f)
    f.write(']')

You can probably get away with something like that if you know your JSON object structure beforehand. For the general case, just use json.JSONEncoder().iterencode().
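A variant of the same manual-delimiter idea (my own sketch, not from the original answer) that avoids copying the list with the [:-1] slice and also tolerates an empty list:

import json

def dump_list_of_dicts(list_of_dicts, filepath):
    # Encode one element at a time and write the array delimiters by hand,
    # so only a single element is ever held as a JSON string in memory.
    with open(filepath, 'w') as f:
        f.write('[')
        for i, obj in enumerate(list_of_dicts):
            if i:
                f.write(',')
            json.dump(obj, f)
        f.write(']')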


Comments

  • salamey almost 2 years

    I would like to know which of json.dump() or json.dumps() is more efficient when it comes to encoding a large array to JSON format.

    Can you please show me an example of using json.dump()?

    Actually, I am making a Python CGI that gets a large amount of data from a MySQL database using the ORM SQLAlchemy, and after some user-triggered processing I store the final output in an array that I then convert to JSON.

    But when converting to JSON with:

     print json.dumps({'success': True, 'data': data}) #data is my array
    

    I get the following error:

    Traceback (most recent call last):
      File "C:/script/cgi/translate_parameters.py", line 617, in <module>
        f.write(json.dumps(mytab,default=dthandler,indent=4))
      File "C:\Python27\lib\json\__init__.py", line 250, in dumps
        sort_keys=sort_keys, **kw).encode(obj)
      File "C:\Python27\lib\json\encoder.py", line 209, in encode
        chunks = list(chunks)
    MemoryError
    

    So my guess is to use json.dump() to convert the data in chunks. Any ideas on how to do this?

    Or other ideas besides using json.dump()?

  • Xavier Ho almost 9 years
    This doesn't work: even assuming mytab in your example is a JSON-serialisable object, json.dump() doesn't know which object you're dumping. In addition, even trying that still produces a MemoryError for large objects.
  • sebastian almost 9 years
    Right, I forgot mytab as an argument. Fixed. Concerning memory, it might be worth trying other JSON libraries in the hope of a more memory-efficient implementation...
  • Xavier Ho almost 9 years
    Found a workaround eventually, posting my answer :]
  • sebastian over 8 years
  • Xavier Ho over 8 years
    Yes, but iterencode() is too slow if you try large objects. It is best to divide the data into chunks your memory can handle and pass each one to encode() at once; see the sketch after these comments.
  • jtlz2 over 2 years
    @XavierHo Thank you so so much - a hidden gem!
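
A sketch of the batching idea from Xavier Ho's comment above, for the list-of-dicts case (not from the original thread; the helper name and batch size are placeholders):

import json

def dump_in_batches(list_of_dicts, filepath, batch_size=10000):
    # Encode batch_size elements at a time with encode(), which the comment
    # suggests is faster than iterencode()'s many tiny chunks, while keeping
    # memory use bounded by one batch.
    encoder = json.JSONEncoder()
    with open(filepath, 'w') as f:
        f.write('[')
        for start in range(0, len(list_of_dicts), batch_size):
            batch = list_of_dicts[start:start + batch_size]
            if start:
                f.write(',')
            # encode() returns e.g. '[{...}, {...}]'; strip the outer brackets
            # so consecutive batches join into a single JSON array.
            f.write(encoder.encode(batch)[1:-1])
        f.write(']')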