Why do tuples take less space in memory than lists?
Solution 1
I assume you're using 64-bit CPython (I got the same results on CPython 2.7 64-bit). There could be differences in other Python implementations or with a 32-bit Python.
Regardless of the implementation, lists are variable-sized while tuples are fixed-size.
So tuples can store the elements directly inside the struct; lists, on the other hand, need a layer of indirection (they store a pointer to the elements). This layer of indirection is a pointer, which on 64-bit systems is 64 bits, hence 8 bytes.
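A quick sanity check of that indirection cost (a sketch; the exact numbers vary between CPython versions, but on a 64-bit build the list should always be at least 16 bytes bigger):

```python
# Compare the raw object sizes of a tuple and a list with the same elements.
t = (1, 2, 3)
l = [1, 2, 3]

# The tuple stores its element pointers inline; the list stores a pointer to
# a separately allocated array plus an extra "allocated" size field.
diff = l.__sizeof__() - t.__sizeof__()
print(diff)  # at least 16 on 64-bit CPython
```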
But there's another thing that lists do: they over-allocate. Otherwise list.append would always be an O(n) operation - to make it amortized O(1) (much faster!) it over-allocates. But then the list has to keep track of both the allocated size and the filled size (tuples only need to store one size, because allocated and filled size are always identical). That means each list has to store another "size", which on 64-bit systems is a 64-bit integer, again 8 bytes.
So lists need at least 16 bytes more memory than tuples. Why "at least"? Because of the over-allocation: the list may allocate more space than it needs, and how much depends on how you created the list and its append/deletion history:
>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4) # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96
>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1) # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2) # no re-alloc
>>> l.append(3) # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4) # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72
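To see the over-allocation pattern more broadly, here is a small sketch that appends elements one at a time and prints the size only when it changes; the exact sizes and the points where re-allocation happens depend on the CPython version:

```python
l = []
last = l.__sizeof__()
print(0, last)

for i in range(20):
    l.append(i)
    size = l.__sizeof__()
    if size != last:  # a re-allocation (with over-allocation) happened
        print(len(l), size)
        last = size
```

Between the printed lines, append is "free": the list already has spare capacity, which is exactly what makes it amortized O(1).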
Images
I decided to create some images to accompany the explanation above. Maybe they are helpful.
This is how your example is (schematically) stored in memory. I highlighted the differences with red (free-hand) circles:
That's actually just an approximation, because int objects are also Python objects, and CPython even reuses small integers, so a more accurate (although less readable) representation of the objects in memory would be:
Useful links:
- tuple struct in the CPython repository for Python 2.7
- list struct in the CPython repository for Python 2.7
- int struct in the CPython repository for Python 2.7
Note that __sizeof__ doesn't really return the "correct" size! It only returns the size of the stored values. However, when you use sys.getsizeof the result is different:
>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72
There are 24 "extra" bytes. These are real, that's the garbage collector overhead that isn't accounted for in the __sizeof__
method. That's because you're generally not supposed to use magic methods directly - use the functions that know how to handle them, in this case: sys.getsizeof
(which actually adds the GC overhead to the value returned from __sizeof__
).
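You can verify that this difference is a fixed per-object GC header (24 bytes on CPython 2.7 64-bit; 16 bytes on modern 64-bit CPython 3):

```python
import sys

l = [1, 2, 3]
# sys.getsizeof = __sizeof__ + GC header, for GC-tracked objects like lists
overhead = sys.getsizeof(l) - l.__sizeof__()
print(overhead)

# The header is the same size for every tracked object, e.g. an empty list:
assert overhead == sys.getsizeof([]) - [].__sizeof__()
```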
Solution 2
I'll take a deeper dive into the CPython codebase so we can see how the sizes are actually calculated. In your specific example, no over-allocations have been performed, so I won't touch on that.
I'm going to use 64-bit values here, as you are.
The size of lists is calculated by the following function, list_sizeof:
static PyObject *
list_sizeof(PyListObject *self)
{
    Py_ssize_t res;

    res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
    return PyInt_FromSsize_t(res);
}
Here Py_TYPE(self) is a macro that grabs the ob_type of self (returning PyList_Type), while _PyObject_SIZE is another macro that grabs tp_basicsize from that type. tp_basicsize is calculated as sizeof(PyListObject), where PyListObject is the instance struct.
The PyListObject structure has three fields:
PyObject_VAR_HEAD # 24 bytes
PyObject **ob_item; # 8 bytes
Py_ssize_t allocated; # 8 bytes
These have comments (which I trimmed) explaining what they are; follow the link above to read them. PyObject_VAR_HEAD expands into three 8-byte fields (ob_refcount, ob_type and ob_size), so a 24-byte contribution.
So for now res is:
sizeof(PyListObject) + self->allocated * sizeof(void*)
or:
40 + self->allocated * sizeof(void*)
If the list instance has allocated elements, the second part calculates their contribution. self->allocated, as its name implies, holds the number of allocated elements.
Without any elements, the size of lists is calculated to be:
>>> [].__sizeof__()
40
i.e. the size of the instance struct.
tuple objects don't define a tuple_sizeof function. Instead, they use object_sizeof to calculate their size:
static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;
    res += self->ob_type->tp_basicsize;
    return PyInt_FromSsize_t(res);
}
This, as for lists, grabs the tp_basicsize and, if the object has a non-zero tp_itemsize (meaning it has variable-length instances), multiplies the number of items in the tuple (obtained via Py_SIZE) by tp_itemsize.
tp_basicsize again uses sizeof(PyTupleObject), where the PyTupleObject struct contains:
PyObject_VAR_HEAD # 24 bytes
PyObject *ob_item[1]; # 8 bytes
So, without any elements (that is, Py_SIZE returns 0), the size of empty tuples is equal to sizeof(PyTupleObject):
>>> ().__sizeof__()
24
Huh? Well, here's an oddity for which I haven't found an explanation: the tp_basicsize of tuples is actually calculated as follows:
sizeof(PyTupleObject) - sizeof(PyObject *)
Why an additional 8 bytes are removed from tp_basicsize is something I haven't been able to find out. (See MSeifert's comment for a possible explanation.)
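The same check works for tuples (again assuming a 64-bit build): the size is the empty-tuple basicsize plus 8 bytes per element:

```python
t = (1, 2, 3)

# tp_basicsize + Py_SIZE(t) * tp_itemsize
# = (empty-tuple size) + len(t) * 8 on a 64-bit build
assert t.__sizeof__() == ().__sizeof__() + len(t) * 8
```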
But this is basically the difference in your specific example. lists also keep around the number of allocated elements, which helps determine when to over-allocate again.
Now, when additional elements are added, lists do indeed perform this over-allocation in order to achieve O(1) appends. This results in greater sizes, as MSeifert's answer covers nicely.
Solution 3
MSeifert's answer covers it broadly; to keep it simple you can think of it this way:
tuple is immutable. Once set, you can't change it, so you know in advance how much memory you need to allocate for that object.
list is mutable. You can add items to it or remove items from it, so it has to know its current size, and it resizes as needed.
There are no free meals - these capabilities come with a cost. Hence the memory overhead for lists.
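A minimal demonstration of that difference in behavior:

```python
t = (1, 2, 3)
l = [1, 2, 3]

l[0] = 10      # fine: lists are mutable
l.append(4)    # fine: the list resizes as needed

try:
    t[0] = 10  # tuples don't support item assignment
except TypeError as e:
    print("tuple is immutable:", e)
```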
Solution 4
The size of a tuple is fixed up front: at initialization the interpreter allocates enough space for the contained data, and the tuple is immutable (it can't be modified). A list, on the other hand, is a mutable object, which implies dynamic memory allocation. To avoid allocating space on every append or modification (allocating enough space for the changed data and copying the old data over), the list allocates additional space for future runtime changes such as appends and modifications.
That pretty much sums it up.
JON
Updated on June 25, 2021
Comments
-
JON almost 3 years
A tuple takes less memory space in Python:
>>> a = (1,2,3)
>>> a.__sizeof__()
48
whereas a list takes more memory space:
>>> b = [1,2,3]
>>> b.__sizeof__()
64
What happens internally in Python's memory management?
-
Metareven over 6 yearsI am not sure how this works internally, but the list object at least has more functions, for instance append, which the tuple does not have. It therefore makes sense for the tuple, as a simpler type of object, to be smaller.
-
Amrit over 6 yearsI think it also depends on the machine... for me, when I check, a = (1,2,3) takes 72 and b = [1,2,3] takes 88.
-
Lee Daniel Crocker over 6 yearsPython tuples are immutable. Mutable objects have extra overhead to deal with runtime changes.
-
jjmontes over 6 years@Metareven the number of methods a type has does not impact the memory space its instances take. The method list and its code are stored on the type object; instances store only data and internal variables.
-
MSeifert over 6 yearsI believe the ob_item[1] is mostly a placeholder (so it makes sense it's subtracted from the basicsize). The tuple is allocated using PyObject_NewVar. I haven't figured out the details, so that's just an educated guess...
-
Dimitris Fasarakis Hilliard over 6 years@MSeifert Sorry for that, fixed :-). I really don't know; I remember finding it in the past at some point, but I never gave it too much attention. Maybe I'll just ask a question at some point in the future :-)
-
ikegami over 6 yearsRe "So lists need at least 16 bytes more memory than tuples.", Wouldn't that be 8? One size for tuples and two sizes for lists means one extra size for lists.
-
MSeifert over 6 yearsYes, a list has one extra "size" (8 bytes) but also stores a pointer (8 bytes) to the "array of PyObjects" instead of storing them in the struct directly (which is what the tuple does). 8+8=16.
-
vishes_shell over 6 yearsAnother useful link about list memory allocation: stackoverflow.com/questions/40018398/…
-
MSeifert over 6 years@vishes_shell That's not really related to the question, because the code in the question doesn't over-allocate at all. But yes, it's useful in case you want to know more about the amount of over-allocation when using list() or a list comprehension.
-
Pekka Klärck over 6 yearsThe amount of "wasted" memory is so small that it typically doesn't matter unless you have a huge amount of lists/tuples. If many of the containers are empty, the difference gets bigger, though. An empty tuple is a singleton (at least in CPython) so they take in practice no memory. A new empty list is always a new object that needs its own memory allocation.
-
user3349993 over 6 years@MSeifert, you mentioned that lists amortize the cost of append. Does that mean tuples do not do that, and hence tuples will be slower when you try to append elements to them?
-
MSeifert over 6 years@user3349993 Tuples are immutable, so you can't append to a tuple or remove an item from a tuple.