Force garbage collection in Python to free memory

Solution 1

Fredrik Lundh explains:

If you create a large object and delete it again, Python has probably released the memory, but the memory allocators involved don’t necessarily return the memory to the operating system, so it may look as if the Python process uses a lot more virtual memory than it actually uses.

and Alex Martelli writes:

The only really reliable way to ensure that a large but temporary use of memory DOES return all resources to the system when it's done, is to have that use happen in a subprocess, which does the memory-hungry work then terminates.

So, you could use multiprocessing to spawn a subprocess, perform the memory-hogging calculation, and then ensure the memory is released when the subprocess terminates:

import multiprocessing as mp
import resource

def mem():
    # ru_maxrss is reported in kilobytes on Linux, so dividing by 1024 gives MB
    print('Memory usage         : % 2.2f MB' % round(
        resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0, 1)
    )

mem()

def memoryhog():
    # allocate roughly 200 MB worth of dicts holding string values
    print('...creating list of dicts...')
    n = 10**5
    l = []
    for i in xrange(n):
        a = 1000*'a'
        b = 1000*'b'
        l.append({'a': a, 'b': b})
    mem()

# run the memory-hungry work in a child process; when it exits,
# the operating system reclaims all of its memory
proc = mp.Process(target=memoryhog)
proc.start()
proc.join()

mem()

yields

Memory usage         :  5.80 MB
...creating list of dicts...
Memory usage         :  234.20 MB
Memory usage         :  5.90 MB
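
If the parent also needs a result from that calculation, a multiprocessing.Pool can run the work in a worker process and send a small, picklable summary back. The snippet below is only a sketch of that idea (the summarize function and what it builds are illustrative, not part of the code above):

import multiprocessing as mp

def summarize():
    # the memory-hungry work happens entirely inside the worker process
    l = [{'a': 1000 * 'a', 'b': 1000 * 'b'} for _ in xrange(10 ** 5)]
    # return only a small, picklable value to the parent
    return len(l)

if __name__ == '__main__':
    pool = mp.Pool(processes=1)
    result = pool.apply(summarize)   # blocks until the worker finishes
    pool.close()
    pool.join()                      # the worker exits; the OS reclaims its memory
    print('dicts built in worker: %d' % result)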

Solution 2

Multiprocessing combined with the Ray library, which uses shared memory to share multi-gigabyte data between processes, might also be useful here. It makes it easy to spawn a secondary process while still accessing the same objects quickly and easily from the parent process.
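
A minimal sketch of that approach is below. It assumes Ray is installed (pip install ray); note that current Ray releases target Python 3, so this would not drop straight into the Python 2.7 app from the question:

import ray

ray.init()  # starts Ray's worker processes and its shared-memory object store

@ray.remote
def memoryhog():
    # the memory-hungry work runs in a separate Ray worker process
    l = [{'a': 1000 * 'a', 'b': 1000 * 'b'} for _ in range(10 ** 5)]
    return len(l)  # a large result could instead stay in the object store

result = ray.get(memoryhog.remote())  # fetch the (small) result into the parent
print('dicts built in worker:', result)

ray.shutdown()  # stop the workers and release their memory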


Comments

  • ddofborg over 3 years

    I have a Python 2.7 app which uses lots of dict objects that mostly contain strings for keys and values.

    Sometimes those dicts and strings are not needed anymore and I would like to remove them from memory.

    I tried different things (del dict[key], del dict, etc.), but the app still uses the same amount of memory.

    Below is an example which I would expect to free the memory, but it doesn't :(

    import gc
    import resource
    
    def mem():
        print('Memory usage         : % 2.2f MB' % round(
            resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0/1024.0,1)
        )
    
    mem()
    
    print('...creating list of dicts...')
    n = 10000
    l = []
    for i in xrange(n):
        a = 1000*'a'
        b = 1000*'b'
        l.append({ 'a' : a, 'b' : b })
    
    mem()
    
    print('...deleting list items...')
    
    for i in xrange(n):
        l.pop(0)
    
    mem()
    
    print('GC collected objects : %d' % gc.collect())
    
    mem()
    

    Output:

    Memory usage         :  4.30 MB
    ...creating list of dicts...
    Memory usage         :  36.70 MB
    ...deleting list items...
    Memory usage         :  36.70 MB
    GC collected objects : 0
    Memory usage         :  36.70 MB
    

    I would expect some objects to be 'collected' here and some memory to be freed.

    Am I doing something wrong? Are there any other ways to delete unused objects, or at least to find out where the objects are unexpectedly still used?

  • ddofborg over 8 years
    I don't think that would be possible with our app (as a quick fix). Lots of objects are shared and need to be accessed in different parts of the app. But in your example, the second Mem usage... should yield more memory, don't you think?
  • Matt Anderson over 8 years
    Python memory allocation is complicated. It creates memory pools for allocating to objects, and cannot return a pool of memory to the OS unless the entire pool is empty and unfragmented. Also, many built-in types keep a "free list" of previously allocated items to reuse in the future for the same types, instead of creating new memory allocations.
  • unutbu over 8 years
    @ddofborg: My machine's getrusage reports the maximum resident set size in kilobytes, not bytes.
  • ddofborg over 8 years
    @MattAnderson Ok, I understand it is complicated. You are implying there is no way to release the memory to the OS after lots of data has been processed?
  • Matt Anderson over 8 years
    @ddofborg - after a deep dive or two into CPython 2.7 memory management, my team's production-level strategy is to have any processing with huge memory requirements happen in its own process and exit, and let the OS clean up after it when it is done. Try to keep the "command and control" process to a reasonable size (500 MB - 1 GB in our case). Whether or not memory can be returned to the OS depends on what objects are allocated and how fragmented memory gets. From reading the Python 3.x release notes, the situation may have improved in the 3-series interpreters.
  • DaveL17 over 3 years
    Since the question is tagged Python 2.7, I'll add that multiprocessing can create some problems (in Python 2.x) with payload objects that may not fork properly (like a multi-threaded class object). I'm sure I don't fully understand why, but I've seen situations where threaded applications have their locks forked "improperly" and the multiprocessing subprocess silently fails. This can cause the process to run indefinitely and create some very interesting problems. Surely someone smarter than me can explain it properly, but it pays to make sure the subprocess exits cleanly (see the sketch below).
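
    A defensive pattern for that (an illustrative sketch, not code from the answers above) is to join the child with a timeout, then check whether it is still alive and how it exited:

    import multiprocessing as mp

    def memoryhog():
        # stand-in for the real memory-hungry work
        l = [{'a': 1000 * 'a', 'b': 1000 * 'b'} for _ in xrange(10 ** 5)]

    if __name__ == '__main__':
        proc = mp.Process(target=memoryhog)
        proc.start()
        proc.join(timeout=60)        # don't wait forever on a hung child
        if proc.is_alive():
            proc.terminate()         # force it down if it never exits
            proc.join()
        print('child exit code: %r' % proc.exitcode)  # 0 means a clean exit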