Python: Iterators and Generators

June 17, 2017

This post is part of the Python Tips series.

Iterators and generators are two ways to supply a sequence of values to loop over in Python. An iterator is usually a class that walks over data it already holds, while a generator is a function that produces values as it goes. The names actually make sense. Under the hood they are very similar and differ mainly in how you write them: an iterator fits well if you already have a class, and a generator is a little simpler.

Iterators

First we will look at how Python gives an iterator interface to built-in objects, such as list and string.

Both of the for loops below will print the same thing.

for i in [1, 2, 3, 4, 5, 6]:
    print(i)

for c in '123456':
    print(c)

If we use the dir function to take a look at the list object, we get this:

l = [1, 2, 3, 4, 5, 6]

print(dir(l))
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

If we do the same on the string, we get this:

s = '123456'

print(dir(s))
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

This is a little more noise than I needed and I could have just explained that __iter__ is the dunder method that gives you an iterator. But I think it is important to show you how to drill down on Python objects with dir if you want to dig deeper and poke around.
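If you only care about one attribute, you don't have to scan the whole dir output by eye. Here is a small sketch (the variable names are just for illustration) that checks for __iter__ directly and intersects the two listings:

```python
numbers = [1, 2, 3]
text = '123'

# dir() returns a plain list of names, so membership tests work on it.
list_has_iter = '__iter__' in dir(numbers)

# hasattr() asks the same question without building the whole list.
str_has_iter = hasattr(text, '__iter__')
int_has_iter = hasattr(42, '__iter__')  # ints are not iterable

# Names that the list and the string have in common.
shared = set(dir(numbers)) & set(dir(text))

print(list_has_iter, str_has_iter, int_has_iter)
print('__iter__' in shared, 'append' in shared)
```

The set intersection is handy when you want to see which behaviours two types share without comparing the listings line by line.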

As I mentioned, both of these share the important method for iterating: __iter__. Calling it returns an iterator object. That object has the dunder method __next__, which hands you one value per call until it runs out.
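In everyday code you rarely call the dunder methods yourself. The built-in functions iter() and next() do it for you. A minimal sketch, with made-up variable names:

```python
letters = ['a', 'b']

it = iter(letters)        # calls letters.__iter__()
first = next(it)          # calls it.__next__()
second = next(it)

# next() accepts a default that is returned instead of raising StopIteration.
after_end = next(it, 'done')

# An iterator is itself iterable: asking it for an iterator returns the same object.
same_object = iter(it) is it

print(first, second, after_end, same_object)
```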

Iterator Operation

Let’s manually iterate and see what happens.

s = '123'

iter_obj = s.__iter__()
print(iter_obj.__next__())
print(iter_obj.__next__())
print(iter_obj.__next__())
print(iter_obj.__next__())
1
2
3
Traceback (most recent call last):
  File "C:/Users/micro/repositories/scratchpad/iterators-generators.py", line 15, in <module>
    print(iter_obj.__next__())
StopIteration

Process finished with exit code 1

I only had 3 items in the string to iterate, but called __next__ 4 times. The last call showed the mechanism that Python uses to stop iteration: raising a StopIteration exception. Much of Python is controlled with exceptions as terminating events.
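A for loop does this same dance for you and quietly swallows the StopIteration. Roughly speaking, it behaves like the helper below (manual_for is a hypothetical name, and this is a sketch of the idea rather than what the interpreter literally runs):

```python
def manual_for(iterable):
    """Collect items the way a for loop would visit them."""
    collected = []
    iterator = iter(iterable)      # step 1: ask for an iterator
    while True:
        try:
            item = next(iterator)  # step 2: ask for the next value
        except StopIteration:      # step 3: stop cleanly when it runs out
            break
        collected.append(item)     # the "loop body"
    return collected


print(manual_for('123'))
```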

Creating an Iterator

As with many examples, we are replacing already present functionality for an easy example. However, as you start building more complex data structures, you will see how you can integrate this into your object.

So we initialize the object with the data we will iterate on. __iter__ sets position to -1 (as I will increment first thing) and sends a reference to itself. I should note that while I am returning self as the iterator object and thus can have the __next__ dunder in the same object, this isn’t always the case.

__next__ will raise a StopIteration exception if we are at the end of the list, based on len. Otherwise, it will return the current item.

class MyIterator(object):
    """
    An example of an iterator over list, to show iterator basics
    """
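    def __init__(self, my_list):
        self._my_list = my_list
        self._pos = -1  # defined up front so __next__ works even before __iter__ is called

    def __iter__(self):
        self._pos = -1  # restart from the beginning for each new loop
        return self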

    def __next__(self):
        self._pos += 1
        if self._pos >= len(self._my_list):
            raise StopIteration
        return self._my_list[self._pos]

for i in MyIterator([1, 2, 3]):
    print(i)

Pretty simple right?
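Returning self from __iter__ means every loop over the same object shares one position. When that matters (nested loops, for example), the iterable and the iterator can be separate classes. The Countdown and CountdownIterator names below are invented for this sketch:

```python
class Countdown:
    """Iterable: hands out a fresh iterator on every __iter__ call."""
    def __init__(self, start):
        self._start = start

    def __iter__(self):
        return CountdownIterator(self._start)


class CountdownIterator:
    """Iterator: tracks the position for a single pass."""
    def __init__(self, current):
        self._current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


countdown = Countdown(3)

# Each loop gets its own iterator, so the nested loops don't interfere.
pairs = [(a, b) for a in countdown for b in countdown]
print(len(pairs))
```

This is the same split that list uses: the list is the iterable, and each call to __iter__ gives back a separate list iterator.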

I should note that in Python 2, the __next__ method was next. This made for possible collisions when this name was wanted for something other than iteration. If you run into older code that has this, or code that has __next__ just calling next for Python 3 compatibility, you now know what is going on.
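One common way to support both versions is to write __next__ and alias next to it, which is the reverse of the delegation mentioned above. The Repeat class here is a made-up example to show the pattern:

```python
class Repeat:
    """Return the same value a fixed number of times."""
    def __init__(self, value, times):
        self._value = value
        self._remaining = times

    def __iter__(self):
        return self

    def __next__(self):
        if self._remaining <= 0:
            raise StopIteration
        self._remaining -= 1
        return self._value

    # Python 2 looks for a method called next, so point that name at the same function.
    next = __next__


print(list(Repeat('x', 2)))
```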

Generators

Generators are easier to build than iterators, but have a limitation. Once they are consumed, they cannot be rewound. Iterating a list can happen multiple times. Iterating a generator is a one shot deal.
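Here is a quick sketch of that one shot behavior, using a throwaway generator function called squares:

```python
def squares(n):
    """Yield the squares of 0 .. n-1."""
    for i in range(n):
        yield i * i


gen = squares(3)
first_pass = list(gen)    # consumes every value
second_pass = list(gen)   # nothing left, so this is empty

# A list can be walked as often as you like.
numbers = [0, 1, 4]
list_passes = (list(numbers), list(numbers))

# To go again with a generator, call the function again for a fresh one.
fresh_pass = list(squares(3))

print(first_pass, second_pass, fresh_pass)
```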

Creating a generator doesn’t require defining a class and implementing the dunder methods for an iterator. The special sauce is the yield keyword, which we saw before in the context of context manager creation.

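Many of you have used the range function in a for loop. In Python 3, range is lazy: it returns a range object that computes values on demand instead of building a list. (Strictly speaking it is a sequence type rather than a generator function, but a generator can mimic the way it iterates.) Let’s create a range function of our own and see how this would work.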
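Worth knowing before we imitate it: in Python 3 the built-in range returns a range object, which is lazy like a generator but also supports len(), indexing and repeated iteration. A quick check:

```python
r = range(0, 10, 2)

length = len(r)          # works without generating all the values
third = r[2]             # indexing is supported
has_six = 6 in r         # membership test
twice = (list(r), list(r))   # can be iterated more than once

# The range object itself has no __next__; iter(r) creates a separate iterator.
is_own_iterator = hasattr(r, '__next__')

print(length, third, has_six, is_own_iterator)
```

The generator version we are about to write only reproduces the iteration part, which is all a for loop needs.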

Starting out we will ignore many of the options in range. I store the start and step values, and then loop while start is less than end. Each time I yield the current start value. This replicates the call to range with one argument.

def my_simple_range(end):
    start = 0
    step = 1
    while start < end:
        yield start
        start += step

iter_obj = my_simple_range(3)
print(iter_obj.__next__())
print(iter_obj.__next__())
print(iter_obj.__next__())
print(iter_obj.__next__())
0
1
2
Traceback (most recent call last):
  File "C:/Users/micro/repositories/scratchpad/iterators-generators.py", line 81, in <module>
    print(iter_obj.__next__())
StopIteration

That output looks familiar, doesn’t it? The generator behaves exactly like our hand-written iterator, with the same termination condition. That is because a generator object is an iterator: it exposes __iter__ and __next__ too. The generator just lets Python do more of the behind-the-scenes work for you.
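You can confirm this with the standard inspect module. The count_up_to function is just a stand-in for this sketch:

```python
import inspect


def count_up_to(limit):
    """Yield 0 .. limit-1."""
    n = 0
    while n < limit:
        yield n
        n += 1


gen = count_up_to(2)

is_generator_function = inspect.isgeneratorfunction(count_up_to)
is_generator = inspect.isgenerator(gen)
is_own_iterator = iter(gen) is gen   # __iter__ returns the generator itself

values = [next(gen), next(gen)]

# The third call runs off the end of the function body.
try:
    next(gen)
    finished = False
except StopIteration:
    finished = True

print(is_generator_function, is_generator, is_own_iterator, values, finished)
```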

What if we wanted to implement a full range with arguments like range(start, stop, step)?

We can do that. However, we need to understand that this can count up and down, so we need to handle all cases. This is a little complicated. I’ll do the right thing and make some tests first.

Tests First

To test, we want to compare an iterator to a static list. Luckily for us, the list() creation can consume the iterator. This is a way to cache the values of a generator.
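A small sketch of why that caching matters, with a toy generator named readings: once the values are in a list you can use them repeatedly, while the generator itself is spent after one pass.

```python
def readings():
    """Pretend data source that yields three values."""
    yield 10
    yield 20
    yield 30


# Consuming the generator directly: the second use finds nothing.
gen = readings()
total_from_gen = sum(gen)
leftover = list(gen)

# Caching first: every later use sees all the values.
cached = list(readings())
total = sum(cached)
largest = max(cached)

print(total_from_gen, leftover, total, largest)
```

The trade-off is memory: list() pulls every value in at once, so it only makes sense when the generator is finite and reasonably small.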

This is also a good opportunity to explain more advanced pytest functionality and show how parameters can save time making test cases.

The first test function is using the pytest parametrize capability. (That still seems like a weird word to me, but the functionality is awesome.) I give a string with argument names comma delimited. Then I give a list of tuples for the arguments.

I’m using arguments to hold a tuple of my arguments to my_range. This allows me to send various combinations. The expected argument is a list of what I would get from the iterator. You can see in the assert that I’m using list() to consume the iterator for comparison.

In this test, I’m checking functionality and not sending an illegal number of arguments (that is the job of the second test function).

For test_my_range_operation, pytest will call the function 8 times, once for each parameter set. I could write 8 separate calls or complete testing functions, but that would just be annoying and offer no benefit.

The test_my_range_bad_argument_counts function ensures that we raise a TypeError unless the argument count is between 1 and 3 inclusive. It uses the pytest.raises context manager to wrap the code that should raise the exception.

import pytest

@pytest.mark.parametrize("arguments,expected", [
    ((5, ), [0, 1, 2, 3, 4]),
    ((0, ), []),
    ((-3, ), []),
    ((0, 4), [0, 1, 2, 3]),
    ((3, 1), []),
    ((3, 6), [3, 4, 5]),
    ((0, 5, 1), [0, 1, 2, 3, 4]),
    ((5, 0, -1), [5, 4, 3, 2, 1])
])
def test_my_range_operation(arguments, expected):
    # expanding arguments tuple to values for the call and consuming the iterator with list().

    assert list(my_range(*arguments)) == expected

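def test_my_range_bad_argument_counts():
    # my_range is a generator function, so its body (including the argument
    # check) only runs once iteration starts. list() forces that to happen.
    with pytest.raises(TypeError):
        list(my_range())  # Requires at least 1 argument

    with pytest.raises(TypeError):
        list(my_range(1, 2, 1, 5))  # Requires at most 3 arguments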
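If the *arguments unpacking looks opaque, here is roughly what the parametrized test does for each tuple, written as a plain loop. This sketch uses the built-in range as a stand-in for my_range (which isn't written yet) and only three of the cases:

```python
cases = [
    ((5,), [0, 1, 2, 3, 4]),
    ((3, 1), []),
    ((5, 0, -1), [5, 4, 3, 2, 1]),
]

results = []
for arguments, expected in cases:
    # *arguments spreads the tuple into positional arguments:
    # (5, 0, -1) becomes range(5, 0, -1).
    actual = list(range(*arguments))
    results.append(actual == expected)

# The built-in also rejects a call with no arguments.
try:
    range()
    rejected_empty_call = False
except TypeError:
    rejected_empty_call = True

print(results, rejected_empty_call)
```

With pytest you get the same coverage, plus a separate pass or fail report for every tuple instead of one combined result.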

When we run these against an empty shell of my_range, they will all fail. Now we can code an implementation of my_range that passes the tests.

my_range implementation

I’m using *args and parsing them myself, because the arguments change positions depending on how many are given.

We start with raising TypeError for argument len outside of 1-3.

Next we are initializing start and step to defaults, because we may never receive them. Then set step if available. Set start and end if available, else just end.

Our while is a little different, because we need start < end only if step is positive. If step is negative, we need start > end instead. This will also fall through if step is zero (instead of causing an infinite loop).

All that is left is to yield start and then increment by step.

def my_range(*args):
    """
    Generator function to return a range of numbers

    :param args: if one argument, use as upper limit with 0 start.
                 if two arguments, use as lower, upper limits.
                 if three arguments, use as lower, upper and step.
    :yields: value
    """
    arg_len = len(args)
    if arg_len > 3:
        raise TypeError('{} arguments given and a max of 3 are allowed.'.format(arg_len))
    if arg_len == 0:
        raise TypeError('No arguments given and at least 1 is required.')

    start = 0
    step = 1
    if arg_len == 3:
        step = args[2]
    if arg_len > 1:
        start = args[0]
        end = args[1]
    else:
        end = args[0]

    # Need to handle positive and negative steps

    while (start < end and step > 0) or (start > end and step < 0):
        yield start
        start += step

This passes all our tests correctly.

I wondered how Python handles a step of zero. When calling range(1, 4, 0) I get a ValueError stating that step can't be zero, whereas our version silently yields nothing. That would be easy to add, but otherwise we seem to have a decent version of range. And hopefully, you now know how iterators and generator functions work.
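Here is one way the zero step check could be added. One wrinkle: the body of a generator function doesn't run until the first value is requested, so a check placed there would only fire once iteration starts. To raise at call time like the built-in does, this sketch (my_range_strict is a hypothetical name, not part of the code above) validates in a plain function and returns an inner generator:

```python
def my_range_strict(*args):
    """Like my_range, but validates its arguments as soon as it is called."""
    if not 1 <= len(args) <= 3:
        raise TypeError('expected 1 to 3 arguments, got {}'.format(len(args)))

    start, step = 0, 1
    if len(args) == 1:
        end = args[0]
    else:
        start, end = args[0], args[1]
    if len(args) == 3:
        step = args[2]

    if step == 0:
        raise ValueError('step must not be zero')

    def _generate(current):
        # Same loop condition as my_range: count up or down depending on step.
        while (current < end and step > 0) or (current > end and step < 0):
            yield current
            current += step

    # No yield in the outer function, so the checks above run immediately.
    return _generate(start)


print(list(my_range_strict(5, 0, -2)))

try:
    my_range_strict(1, 4, 0)
except ValueError as error:
    print(error)
```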


Part 7 of 9 in the Python Tips series.

