When are objects garbage collected in python?

12,654

Solution 1

Here is an excerpt from the language reference:

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).

EDIT: Regarding postponing garbage collection: the gc module lets you interact with the garbage collector, disable it if you want to, change the collection frequency, and so on, though I have not used it myself. Also, before Python 3.4, cycles containing any objects with __del__ methods were not collected; since PEP 442 (Python 3.4) such cycles can be collected as well.
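As a rough illustration of the points above, the following sketch (CPython-specific, with a hypothetical `Node` class invented for the example) builds a reference cycle, shows that reference counting alone does not free it, and then frees it with a manual `gc.collect()`. The `gc.disable()`, `gc.collect()` and `gc.enable()` calls are the standard gc module functions.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self):
        self.partner = None


def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # a <-> b reference cycle
    return weakref.ref(a)        # a weak reference does not keep `a` alive


gc.disable()                         # switch off automatic cycle collection
probe = make_cycle()
alive_before = probe() is not None   # the cycle keeps both refcounts above zero
unreachable_found = gc.collect()     # run a full collection by hand
alive_after = probe() is not None    # the cycle has now been reclaimed
gc.enable()                          # turn automatic collection back on

print(alive_before, alive_after, unreachable_found)
```

On CPython the two flags should print as `True False`; the exact count returned by `gc.collect()` depends on the interpreter version.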

Solution 2

When are objects garbage collected in python?

There is a lot of detail in the source code for CPython: http://svn.python.org/view/python/trunk/Modules/gcmodule.c?revision=81029&view=markup

Any time a reference count drops to zero, the object is immediately removed.

    /* Python's cyclic gc should never see an incoming refcount
     * of 0: if something decref'ed to 0, it should have been
     * deallocated immediately at that time.
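A minimal sketch of this behaviour on CPython, assuming a hypothetical `Payload` class: the cycle collector is disabled so that only reference counting is at work, and `weakref.finalize` records the moment the object goes away.

```python
import gc
import weakref


class Payload:
    """Hypothetical placeholder class for the demonstration."""


events = []
gc.disable()  # rule out the cycle collector; only reference counting remains

obj = Payload()
weakref.finalize(obj, events.append, "deallocated")  # runs when obj is destroyed

alias = obj   # second reference to the same object
del obj       # refcount drops from 2 to 1: nothing happens yet
events.append("first name removed")
del alias     # refcount drops to 0: the object is destroyed right here
events.append("second name removed")

gc.enable()
print(events)
# CPython: ['first name removed', 'deallocated', 'second name removed']
```

Other implementations (PyPy, for example) do not use reference counting and may run the finalizer later, which is why the language reference warns against relying on this ordering.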

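In addition to the configurable thresholds, a full collection (a collection of the oldest generation) is only triggered when the number of objects awaiting their first full collection exceeds 25% of the number of objects that survived the last full collection.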

    In addition to the various configurable thresholds, we only trigger a
    full collection if the ratio
        long_lived_pending / long_lived_total
    is above a given value (hardwired to 25%).

When is the memory released?

I was only able to fish out this information.

    /* Clear all free lists
     * All free lists are cleared during the collection of the highest generation.
     * Allocated items in the free list may keep a pymalloc arena occupied.
     * Clearing the free lists may give back memory to the OS earlier.
     */

According to this, Python may keep your object in a free list for recycling even after its refcount drops to zero. I could not find exactly where free is called to give memory back to the operating system, but I imagine this happens whenever a collection runs and the object is not being kept in a free list.

Does the collection impact performance?

Any non-trivial garbage collector I have heard of requires both CPU and memory to operate. Therefore, yes, there is always an impact on performance. You'll have to experiment and get to know your garbage collector.

I have run into issues with programs that require real-time responsiveness, since garbage collectors give me no control over when they run or for how long. Some peculiar cases can also cause excessive memory use, one example being Python's habit of keeping free lists.
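In CPython the gc module does offer some control over when the cyclic collector runs. One possible approach, sketched here with a hypothetical `gc_paused` helper (not part of the standard library), is to suspend automatic collection around a latency-sensitive block and restore the previous state afterwards. Reference counting still frees non-cyclic objects inside the block, so this affects only the cycle detector.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Hypothetical helper: suspend automatic cycle collection inside the block."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:   # only re-enable if it was on to begin with
            gc.enable()


before = gc.isenabled()
with gc_paused():
    inside = gc.isenabled()                # False: no automatic collections here
    data = [[i] for i in range(10_000)]    # stand-in for latency-sensitive work
after = gc.isenabled()                     # same state as before the block

print(before, inside, after)
```

A manual `gc.collect()` at a convenient moment (for example, between requests or frames) can then pick up any cyclic garbage that accumulated while collection was paused.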

Solution 3

To expand on the previous answers with some more numbers and actionable information:

You can use gc.set_threshold(threshold0[, threshold1[, threshold2]]) to tune when automatic garbage collection kicks in:

The GC classifies objects into three generations depending on how many collection sweeps they have survived. New objects are placed in the youngest generation (generation 0). If an object survives a collection it is moved into the next older generation. Since generation 2 is the oldest generation, objects in that generation remain there after a collection. In order to decide when to run, the collector keeps track of the number object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts. Initially only generation 0 is examined. If generation 0 has been examined more than threshold1 times since generation 1 has been examined, then generation 1 is examined as well. With the third generation, things are a bit more complicated, see Collecting the oldest generation for more information.

While I could not find the default thresholds in the documentation, the implementation suggests the following defaults (CPython 3.9.1):

  • threshold0: 700
  • threshold1: 10
  • threshold2: 10

That is, by default, automatic garbage collection should kick in once the number of allocations minus the number of deallocations exceeds 700.
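The thresholds can also be read at runtime instead of being looked up in the source. The sketch below uses `gc.get_threshold()`, `gc.set_threshold()` and `gc.get_count()`; the new values (10000, 20, 20) are arbitrary illustrative numbers, not a recommendation.

```python
import gc

original = gc.get_threshold()     # a 3-tuple; the defaults vary by CPython version
counts = gc.get_count()           # current collection counters, one per generation

gc.set_threshold(10_000, 20, 20)  # make generation 0 collections much rarer
tuned = gc.get_threshold()

gc.set_threshold(*original)       # restore the interpreter's original settings
restored = gc.get_threshold()

print(original, counts, tuned, restored)
```

Raising threshold0 trades fewer collection pauses for cyclic garbage lingering longer, so it is worth measuring before and after on the actual workload.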


Author: Matt Alcock

I love anything involving data, python, visualisation and the web.

Updated on June 25, 2022

Comments

  • Matt Alcock
    Matt Alcock almost 2 years

    When are objects garbage collected in python? When is the memory released and does the collection impact performance? Can one opt out or tune the gc algorithm and if so how?

  • Matt Alcock
    Matt Alcock about 12 years
    Nice, although quite vague. Any idea how you omit or postpone gc?
  • erisco
    erisco about 12 years
    This is a different question, Matt Alcock, and the answer is available.
  • user1066101
    user1066101 about 12 years
    @MattAlcock: omit or postpone gc? (1) That's a separate question. And (2) why would you want to? If you don't want an object garbage collected, assign it to a variable.
  • Matt Alcock
    Matt Alcock about 12 years
    Often high performance systems will want to omit gc to guarantee consistent performance. Imagine a flight control system timing out for a bit whilst it did garbage collection?
  • Praveen Gollakota
    Praveen Gollakota about 12 years
    @MattAlcock Added details about gc module in the post.
  • user1066101
    user1066101 about 12 years
    @MattAlcock: Often high performance systems do not use any dynamic memory allocation of any kind. When I made radars and sonars, the data structures were strictly statically allocated. The very idea of using any dynamic memory allocation in a high performance system seems contradictory. Turning off garbage collection in Python is traditionally done by writing performance-critical code in C and calling that from Python.
  • Marlon Abeykoon
    Marlon Abeykoon almost 6 years
    Link is expired
  • leopold.talirz
    leopold.talirz over 3 years
    Updated permalink from GitHub github.com/python/cpython/blob/…