June 17, 2017
This post is part of the Python Tips series.
Iterators and generators are two mechanisms for providing data to loop over in Python. The difference is whether we walk along data that an object already holds, or produce each value as we go. The names actually make sense. The two turn out to be very similar and differ mainly in how they are written. An iterator is the better fit if you already have a class. A generator is a little simpler.
Iterators
First we will look at how Python gives an iterator interface to built-in objects, such as list and string.
Both of the for loops below will print the same thing.
for i in [1, 2, 3, 4, 5, 6]:
    print(i)
for c in '123456':
    print(c)
If we use the dir function to take a look at the list object, we get this:
l = [1, 2, 3, 4, 5, 6]
print(dir(l))
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
If we do the same on the string, we get this:
s = '123456'
print(dir(s))
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
This is a little more noise than I needed and I could have just explained that __iter__ is the dunder method that gives you an iterator. But I think it is important to show you how to drill down on Python objects with dir if you want to dig deeper and poke around.
As I mentioned, both of these share the important method for iterating: __iter__. Calling it returns an iterator object. That object has the dunder method __next__, which gives you values until it runs out.
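If you don't want to read through the whole dir listing, you can check for the two names directly. This is just a small sketch using the built-in hasattr. The helper name iteration_dunders is made up for this example.

```python
def iteration_dunders(obj):
    """Return which of the two iteration dunder methods obj exposes."""
    return [name for name in ('__iter__', '__next__') if hasattr(obj, name)]


l = [1, 2, 3]
s = '123'

# Lists and strings are iterable, but they are not iterators themselves.
print(iteration_dunders(l))        # ['__iter__']
print(iteration_dunders(s))        # ['__iter__']

# The object returned by __iter__ is the iterator; it has both methods.
print(iteration_dunders(iter(l)))  # ['__iter__', '__next__']
```

The built-in iter(obj) used in the last line is the usual way to call obj.__iter__().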
Iterator Operation
Let's manually iterate and see what happens.
s = '123'
iter_obj = s.__iter__()
print(iter_obj.__next__())
print(iter_obj.__next__())
print(iter_obj.__next__())
print(iter_obj.__next__())
1
2
3
Traceback (most recent call last):
File "C:/Users/micro/repositories/scratchpad/iterators-generators.py", line 15, in <module>
print(iter_obj.__next__())
StopIteration
Process finished with exit code 1
I only had 3 items in the string to iterate, but called __next__ 4 times. The last call shows how Python signals the end of iteration: by raising a StopIteration exception. Much of Python is controlled with exceptions as terminating events.
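To connect this back to the for loop, here is roughly what a loop like "for c in s" does for us. It is a simplified sketch of the idea, not the interpreter's actual code. The built-ins iter and next call the matching dunder methods, and the list named collected is only there so the result can be inspected.

```python
s = '123'

# Roughly what "for c in s: print(c)" does behind the scenes.
iter_obj = iter(s)          # same as s.__iter__()
collected = []
while True:
    try:
        c = next(iter_obj)  # same as iter_obj.__next__()
    except StopIteration:
        break               # the for loop catches this and simply ends
    collected.append(c)
    print(c)
```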
Creating an Iterator
As with many examples, we are replacing already present functionality for an easy example. However, as you start building more complex data structures, you will see how you can integrate this into your object.
So we initialize the object with the data we will iterate on. __iter__ sets position to -1 (as I will increment first thing) and returns a reference to itself. I should note that while I am returning self as the iterator object and thus can have the __next__ dunder in the same object, this isn’t always the case.
__next__ will raise a StopIteration exception if we are at the end of the list, based on len. Otherwise, it will return the current item.
class MyIterator(object):
    """
    An example of an iterator over list, to show iterator basics
    """
    def __init__(self, my_list):
        self._my_list = my_list

    def __iter__(self):
        self._pos = -1
        return self

    def __next__(self):
        self._pos += 1
        if self._pos >= len(self._my_list):
            raise StopIteration
        return self._my_list[self._pos]


for i in MyIterator([1, 2, 3]):
    print(i)
Pretty simple, right?
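Returning self keeps things short, but it means every loop over the same object shares one position. Here is a sketch of the other arrangement, where __iter__ hands out a separate iterator object each time. The class names MyCollection and MyCollectionIterator are my own for this example.

```python
class MyCollectionIterator:
    """Tracks the position for one pass over the data."""

    def __init__(self, items):
        self._items = items
        self._pos = -1

    def __iter__(self):
        # Iterators conventionally return themselves from __iter__.
        return self

    def __next__(self):
        self._pos += 1
        if self._pos >= len(self._items):
            raise StopIteration
        return self._items[self._pos]


class MyCollection:
    """Holds the data and hands out a new iterator on every request."""

    def __init__(self, items):
        self._items = items

    def __iter__(self):
        return MyCollectionIterator(self._items)


numbers = MyCollection([1, 2])

# Nested loops each get their own iterator, so positions do not collide.
pairs = [(a, b) for a in numbers for b in numbers]
print(pairs)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

This is the same split the built-in list uses: the list holds the data, and each call to __iter__ gives back a fresh iterator.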
I should note that in Python 2, the __next__ method was next. This made for possible collisions when this name was wanted for something other than iteration. If you run into older code that has this, or code that has __next__ just calling next for Python 3 compatibility, you now know what is going on.
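In case you have not seen that compatibility pattern, this is a sketch of what it tends to look like. The CountUp class is a made-up example, not code from any particular project.

```python
class CountUp:
    """Counts from 1 to limit; works with both the old and new method name."""

    def __init__(self, limit):
        self._limit = limit
        self._current = 0

    def __iter__(self):
        return self

    def next(self):
        # Python 2 looks for this name.
        if self._current >= self._limit:
            raise StopIteration
        self._current += 1
        return self._current

    def __next__(self):
        # Python 3 looks for this name; delegate to the shared logic.
        return self.next()


print(list(CountUp(3)))  # [1, 2, 3]
```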
Generators
Generators are easier to build than iterators, but have a limitation. Once they are consumed, they cannot be rewound. Iterating a list can happen multiple times. Iterating a generator is a one shot deal.
The creation of a generator doesn’t require defining a class and implementing the dunder methods for an iterator. The special sauce is in the yield keyword, which we saw before in context manager creation.
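Here is a quick sketch of that one shot behavior next to a list. The countdown generator function is a made-up example.

```python
def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


numbers_list = [3, 2, 1]
numbers_gen = countdown(3)

# A list can be walked as many times as we like.
first_list_pass = [n for n in numbers_list]
second_list_pass = [n for n in numbers_list]

# A generator is spent after the first pass.
first_gen_pass = [n for n in numbers_gen]
second_gen_pass = [n for n in numbers_gen]

print(first_list_pass, second_list_pass)  # [3, 2, 1] [3, 2, 1]
print(first_gen_pass, second_gen_pass)    # [3, 2, 1] []
```

To go around again, call the generator function again to get a fresh generator.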
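Many of you have used the range function in a for loop. In Python 3, range is not actually a generator (it is a lazy sequence type that can be iterated more than once), but it produces its values on demand in much the same way. Let's create a range function of our own as a generator and see how this would work.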
Starting out we will ignore many of the options in range. I store the start and step values, and then loop while start is less than end. Each time I yield the current start value. This replicates the call to range with one argument.
def my_simple_range(end):
    start = 0
    step = 1
    while start < end:
        yield start
        start += step
iter_obj = my_simple_range(3)
print(iter_obj.__next__())
print(iter_obj.__next__())
print(iter_obj.__next__())
print(iter_obj.__next__())
0
1
2
Traceback (most recent call last):
File "C:/Users/micro/repositories/scratchpad/iterators-generators.py", line 81, in <module>
print(iter_obj.__next__())
StopIteration
That output looks familiar, doesn’t it? This behaves exactly like an iterator with the same termination condition. This happens because both are exposing the __next__ method. The generator just lets Python do more of the behind the scenes work for you.
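You can check this for yourself. The sketch below uses the Iterator abstract base class from collections.abc and the built-ins iter and next. The letters generator function is a made-up example.

```python
from collections.abc import Iterator


def letters():
    yield 'a'
    yield 'b'


gen = letters()

print(isinstance(gen, Iterator))  # True
print(iter(gen) is gen)           # True: a generator is its own iterator
print(next(gen))                  # a
print(next(gen))                  # b
print(next(gen, 'done'))          # done (default instead of StopIteration)
```

Passing a second argument to next is a handy way to avoid the StopIteration exception when you are stepping through by hand.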
What if we wanted to implement a full range with arguments like range(start, stop, step)?
We can do that. However, we need to understand that this can count up and down, so we need to handle all cases. This is a little complicated. I’ll do the right thing and make some tests first.
Tests First
To test, we want to compare an iterator to a static list. Luckily for us, the list() creation can consume the iterator. This is a way to cache the values of a generator.
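As a small sketch of that caching idea, the example below uses a made-up squares generator function and keeps its values in a list.

```python
def squares(limit):
    """Yield the squares of 0 .. limit-1."""
    for n in range(limit):
        yield n * n


cached = list(squares(4))   # the generator is consumed here, once

# The list can now be compared, indexed, and reused freely.
print(cached == [0, 1, 4, 9])    # True
print(cached[-1])                # 9
print(sum(cached), len(cached))  # 14 4
```

The trade-off is memory: the list holds every value at once, which is what the generator was avoiding.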
This is also a good opportunity to explain more advanced pytest functionality and show how parameters can save time making test cases.
The first test function is using the pytest parametrize capability. (That still seems like a weird word to me, but the functionality is awesome.) I give a string with argument names comma delimited. Then I give a list of tuples for the arguments.
I’m using arguments to hold a tuple of my arguments to my_range. This allows me to send various combinations. The expected argument is a list of what I would get from the iterator. You can see in the assert that I’m using list() to consume the iterator for comparison.
In this test, I’m checking functionality and not sending an illegal number of arguments (that is the job of the second test function).
For test_my_range_operation, pytest will call the function 8 times, once for each parameter set. I could write 8 separate calls or complete testing functions, but that would just be annoying and offer no benefit.
The test_my_range_bad_argument_counts function is there to ensure that we raise a TypeError unless the argument count is between 1 and 3 inclusive. This uses the pytest.raises context manager to hold the code that generates the exception.
import pytest


@pytest.mark.parametrize("arguments,expected", [
    ((5, ), [0, 1, 2, 3, 4]),
    ((0, ), []),
    ((-3, ), []),
    ((0, 4), [0, 1, 2, 3]),
    ((3, 1), []),
    ((3, 6), [3, 4, 5]),
    ((0, 5, 1), [0, 1, 2, 3, 4]),
    ((5, 0, -1), [5, 4, 3, 2, 1])
])
def test_my_range_operation(arguments, expected):
    # Expand the arguments tuple for the call and consume the iterator with list().
    assert list(my_range(*arguments)) == expected


def test_my_range_bad_argument_counts():
    # my_range is a generator function, so its body (including the argument
    # check) only runs once iteration starts; list() forces that.
    with pytest.raises(TypeError):
        list(my_range())  # Requires more than 0 arguments
    with pytest.raises(TypeError):
        list(my_range(1, 2, 1, 5))  # Requires fewer than 4 arguments
When we run these against an empty shell of my_range, they all fail. Now we can code an implementation of my_range that passes the tests.
my_range implementation
I’m using *args and parsing, because the arguments change positions depending on number.
We start with raising TypeError when the argument count is outside of 1 to 3.
Next we are initializing start and step to defaults, because we may never receive them. Then set step if available. Set start and end if available, else just end.
Our while is a little different, because we need start < end if step is positive. Otherwise, we need start > end if step is negative. This will also fall through if step is zero (instead of causing an infinite loop).
All that is left is to yield start and then increment by step.
def my_range(*args):
    """
    Generator function to return a range of numbers
    :param args: if one argument, use as upper limit with 0 start.
                 if two arguments, use as lower, upper limits.
                 if three arguments, use as lower, upper and step.
    :yields: value
    """
    arg_len = len(args)
    if arg_len > 3:
        raise TypeError('{} arguments given and a max of 3 are allowed.'.format(arg_len))
    if arg_len == 0:
        raise TypeError('No arguments given and at least 1 is required.')
    start = 0
    step = 1
    if arg_len == 3:
        step = args[2]
    if arg_len > 1:
        start = args[0]
        end = args[1]
    else:
        end = args[0]
    # Need to handle positive and negative steps
    while (start < end and step > 0) or (start > end and step < 0):
        yield start
        start += step
This passes all our tests correctly.
I wondered how Python handles a step of zero. When calling range(1, 4, 0) I get a ValueError stating that the step can't be zero, instead of our silent fail. That would be easy to add, but otherwise we seem to have a decent version of range. And hopefully, you now know how iterators and generator functions work.
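For the curious, here is one way the zero step check could be added. This is only a sketch. The name my_range_strict and the error messages are my own. Keep in mind that because this is a generator function, the checks run when iteration starts, not when the function is called, so the example wraps each call in list().

```python
def my_range_strict(*args):
    """Like my_range, but rejects a zero step the way the built-in range does."""
    arg_len = len(args)
    if arg_len == 0 or arg_len > 3:
        raise TypeError('expected 1 to 3 arguments, got {}'.format(arg_len))

    start, step = 0, 1
    if arg_len == 3:
        step = args[2]
    if arg_len > 1:
        start, end = args[0], args[1]
    else:
        end = args[0]

    if step == 0:
        raise ValueError('step must not be zero')

    # Need to handle positive and negative steps
    while (start < end and step > 0) or (start > end and step < 0):
        yield start
        start += step


print(list(my_range_strict(5, 0, -2)))  # [5, 3, 1]

try:
    list(my_range_strict(1, 4, 0))
except ValueError as exc:
    print('ValueError:', exc)  # ValueError: step must not be zero
```

A quick sanity check is to compare against the built-in, for example list(my_range_strict(2, 9, 3)) == list(range(2, 9, 3)).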
Part 7 of 9 in the Python Tips series.