What's the difference between __iter__ and __getitem__?


Solution 1

Yes, this is an intended design. It is documented, well-tested, and relied upon by sequence types such as str.

The __getitem__ version is a legacy from before Python had modern iterators. The idea was that any sequence (something that is indexable and has a length) would be automatically iterable using the series s[0], s[1], s[2], ... until IndexError or StopIteration is raised.

In Python 2.7, for example, strings are iterable because of the __getitem__ method (the str type does not have an __iter__ method).

In contrast, the iterator protocol lets any class be iterable without necessarily being indexable (dicts and sets for example).
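As a small illustration of that point (a sketch added for clarity, not part of the original answer), a set can be iterated through __iter__ even though it has no __getitem__ and therefore cannot be indexed:

```python
# A set is iterable via __iter__ but is not indexable.
s = {10, 20, 30}

has_iter = hasattr(s, "__iter__")        # sets define __iter__
has_getitem = hasattr(s, "__getitem__")  # sets do not define __getitem__

items = sorted(s)  # iteration works; sorted() gives a stable order

try:
    s[0]
    indexable = True
except TypeError:
    # Indexing a set raises TypeError because there is no __getitem__.
    indexable = False

print(has_iter, has_getitem, items, indexable)
```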

Here is how to make an iterable class using the legacy style for sequences:

>>> class A:
        def __getitem__(self, index):
            if index >= 10:
                raise IndexError
            return index * 111

>>> list(A())
[0, 111, 222, 333, 444, 555, 666, 777, 888, 999]

Here is how to make an iterable using the __iter__ approach:

>>> class B:
        def __iter__(self):
            yield 10
            yield 20
            yield 30


>>> list(B())
[10, 20, 30]
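One practical difference between the two styles is worth sketching. The class names LegacySeq and ModernIter below are hypothetical stand-ins for A and B above. Both can be passed to list(), but collections.abc.Iterable only recognizes classes that define __iter__, so an isinstance check can miss the legacy style. Calling iter() is the reliable test.

```python
from collections.abc import Iterable


class LegacySeq:
    """Iterable only through the legacy __getitem__ protocol."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError
        return index * 111


class ModernIter:
    """Iterable through the __iter__ protocol."""

    def __iter__(self):
        yield 10
        yield 20


legacy_values = list(LegacySeq())
modern_values = list(ModernIter())

# The ABC check looks for __iter__ only, so it misses LegacySeq.
legacy_is_abc_iterable = isinstance(LegacySeq(), Iterable)
modern_is_abc_iterable = isinstance(ModernIter(), Iterable)

# iter() succeeds for both, which is the reliable way to test iterability.
iter(LegacySeq())
iter(ModernIter())

print(legacy_values, modern_values)
print(legacy_is_abc_iterable, modern_is_abc_iterable)
```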

For those who are interested in the details, the relevant code is in Objects/iterobject.c:

static PyObject *
iter_iternext(PyObject *iterator)
{
    seqiterobject *it;
    PyObject *seq;
    PyObject *result;

    assert(PySeqIter_Check(iterator));
    it = (seqiterobject *)iterator;
    seq = it->it_seq;
    if (seq == NULL)
        return NULL;

    result = PySequence_GetItem(seq, it->it_index);
    if (result != NULL) {
        it->it_index++;
        return result;
    }
    if (PyErr_ExceptionMatches(PyExc_IndexError) ||
        PyErr_ExceptionMatches(PyExc_StopIteration))
    {
        PyErr_Clear();
        Py_DECREF(seq);
        it->it_seq = NULL;
    }
    return NULL;
}

and in Objects/abstract.c:

int
PySequence_Check(PyObject *s)
{
    if (s == NULL)
        return 0;
    if (PyInstance_Check(s))
        return PyObject_HasAttrString(s, "__getitem__");
    if (PyDict_Check(s))
        return 0;
    return  s->ob_type->tp_as_sequence &&
        s->ob_type->tp_as_sequence->sq_item != NULL;
}
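For readers who prefer Python to C, the logic of iter_iternext above can be sketched roughly as a generator. This is only an approximation for illustration: the names legacy_iter and Tens are hypothetical, and the real iterator is a C object, not a generator.

```python
def legacy_iter(seq):
    """Rough Python sketch of the sequence-iterator fallback."""
    index = 0
    while True:
        try:
            item = seq[index]  # corresponds to PySequence_GetItem
        except (IndexError, StopIteration):
            # Either exception is treated as the normal end of iteration.
            return
        yield item
        index += 1


class Tens:
    """Hypothetical example class using only __getitem__."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError
        return index * 10


sketch_result = list(legacy_iter(Tens()))
builtin_result = list(Tens())
print(sketch_result, builtin_result)
```

Any other exception raised by __getitem__ is not caught by the sketch and propagates to the caller, which matches the C code returning NULL with the error still set.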

Solution 2

__iter__ is the preferred way to iterate through an iterable object. If it is not defined, the interpreter will try to simulate its behavior using __getitem__. See PEP 234 for details.
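The lookup order can be checked with a short sketch. The class Both is a hypothetical example that defines both methods; iteration uses __iter__ and never calls __getitem__, while plain indexing still uses __getitem__.

```python
class Both:
    """Hypothetical class defining both protocols."""

    def __iter__(self):
        return iter(["from __iter__"])

    def __getitem__(self, index):
        if index >= 1:
            raise IndexError
        return "from __getitem__"


iterated = list(Both())  # iteration goes through __iter__
indexed = Both()[0]      # indexing goes through __getitem__
print(iterated, indexed)
```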

Solution 3

To get the result you are expecting, __getitem__ needs to be backed by data of finite length and return its items in sequence, so that indexing past the end raises IndexError:

class foo:
    def __init__(self):
        self.data = [10, 11, 12]

    def __getitem__(self, arg):
        print('__getitem__ called with arg {}'.format(arg))
        # Indexing past the end of the list raises IndexError,
        # which ends the iteration.
        return self.data[arg]

bar = foo()
for i in bar:
    print('__getitem__ returned {}'.format(i))

Prints:

__getitem__ called with arg 0
__getitem__ returned 10
__getitem__ called with arg 1
__getitem__ returned 11
__getitem__ called with arg 2
__getitem__ returned 12
__getitem__ called with arg 3

Or you can signal the end of the 'sequence' by raising IndexError (although StopIteration works as well...):

class foo:
    def __getitem__(self, arg):
        print('__getitem__ called with arg {}'.format(arg))
        if arg > 3:
            raise IndexError
        return arg

bar = foo()
for i in bar:
    print('__getitem__ returned {}'.format(i))

Prints:

__getitem__ called with arg 0
__getitem__ returned 0
__getitem__ called with arg 1
__getitem__ returned 1
__getitem__ called with arg 2
__getitem__ returned 2
__getitem__ called with arg 3
__getitem__ returned 3
__getitem__ called with arg 4

The for loop is expecting either IndexError or StopIteration to signal the end of the sequence.
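A minimal sketch of that behavior in CPython, using hypothetical class names: raising StopIteration from __getitem__ ends the loop just like IndexError, while any other exception (KeyError here) is not swallowed and propagates out of the loop.

```python
class EndsWithStopIteration:
    def __getitem__(self, index):
        if index > 2:
            raise StopIteration  # treated as the end of the sequence
        return index


class EndsWithKeyError:
    def __getitem__(self, index):
        if index > 2:
            raise KeyError(index)  # not an end-of-sequence signal
        return index


stopped_cleanly = list(EndsWithStopIteration())

collected = []
try:
    for value in EndsWithKeyError():
        collected.append(value)
    key_error_propagated = False
except KeyError:
    key_error_propagated = True

print(stopped_cleanly, collected, key_error_propagated)
```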


Author: wegry

Updated on December 13, 2020

Comments

  • wegry
    wegry over 3 years

    This happens in Python 2.7.6 and 3.3.3 for me. When I define a class like this

    class foo:
        def __getitem__(self, *args):
            print(*args)
    

    And then try to iterate over an instance (which I thought would call iter),

    bar = foo()
    for i in bar:
        print(i)
    

    the argument just counts up by one on each call and the loop prints None forever. Is this intentional as far as the language design is concerned?

    Sample output

    0
    None
    1
    None
    2
    None
    3
    None
    4
    None
    5
    None
    6
    None
    7
    None
    8
    None
    9
    None
    10
    None
    
  • alko
    alko over 10 years
    Agree with @thefourtheye, correct link should be for example PEP 234
  • smeso
    smeso over 10 years
    Thank you, I couldn't find it. Edited.
  • thefourtheye
    thefourtheye over 10 years
    Thanks :) I would like to read more about it. Could you please give some docs, where I can read up?
  • Raymond Hettinger
    Raymond Hettinger over 10 years
    Important correction: The PEP does not state that __iter__ is preferred over __getitem__; rather, it defines both and simply says that __iter__ is tried first before falling back to the __getitem__ approach. The whole point of the PEP was to add iteration support for objects that aren't sequences. There was no effort to remove the existing sequence support or to deter its use.
  • smeso
    smeso over 10 years
    The reason why __iter__ is tried first is that its implementation can sometimes give better performance than the old-style iterator protocol. In this sense it is also the preferred way to do this.
  • Raymond Hettinger
    Raymond Hettinger over 10 years
    @Faust It is important to not use the word "preferred" because that incorrectly suggests that people aren't supposed to use the __getitem__ approach. You are correct that sometimes __iter__ can have better performance. That is why I implemented the listiterator even though it was already iterable using __getitem__. On the other hand, there was no performance improvement for str/unicode using __iter__. That is why I didn't add a string iterator.
  • Raymond Hettinger
    Raymond Hettinger over 10 years
    @thefourtheye Here is a link for the iter() function, docs.python.org/2.7/library/functions.html#iter . Another link in the definition of iterable, docs.python.org/2.7/glossary.html#term-iterable and a mention that all sequences are iterable in the section on iterator types, docs.python.org/2.7/library/stdtypes.html#iterator-types . Lastly, there is a write-up in PEP 234, python.org/dev/peps/pep-0234 . Happy reading :-) .