Python Iterators, Generators, Yield, and Coroutines
The yield statement suspends a function's execution and sends a value back to the caller, but retains enough state to let the function resume where it left off. When the function resumes, it continues execution immediately after the last yield. This allows its code to produce a series of values over time, rather than computing them all at once and returning them as a list. return sends a single specified value back to its caller, whereas yield can produce a sequence of values. Use yield when you want to iterate over a sequence but don't want to store the entire sequence in memory. yield is what powers Python generators: a generator function is defined just like a normal function, but whenever it needs to produce a value, it does so with the yield keyword rather than return. If the body of a def contains yield, the function automatically becomes a generator function.
Iterables
When you create a list, you can read its items one by one. Reading its items one by one is called iteration:
mylist = [1, 2, 3]
for i in mylist: print(i)
mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:
mylist = [x*x for x in range(3)]
for i in mylist: print(i)
Everything you can use "for... in..." on is an iterable: lists, strings, files... These iterables can be read at any time, and all their values are stored in memory.
Generators
Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:
mygenerator = (x*x for x in range(3))
for i in mygenerator: print(i)
Yield
yield is a keyword that is used like return, except the function will return a generator.
def create_generator():
    mylist = range(3)
    for i in mylist:
        yield i*i

mygenerator = create_generator()  # create a generator
print(mygenerator)  # mygenerator is an object! <generator object create_generator at 0xb7555c34>
for i in mygenerator: print(i)
When you call the function, the code in the function body does not run; the function only returns the generator object. The first time the for loop uses the generator object created from your function, it runs the code in your function from the beginning until it hits yield, then returns the first value of the loop. Each subsequent iteration runs the loop in your function one more time and returns the next value. This continues until the generator is considered empty, which happens when the function runs to completion without hitting yield. That can be because the loop has come to an end, because an "if/else" condition is no longer satisfied, or because a return statement is reached.
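To make the "no longer satisfied condition" case concrete, here is a small sketch (odd_numbers_below is a made-up name for illustration): the generator is exhausted as soon as the loop ends, because no further yield is reached.

```python
def odd_numbers_below(limit):
    # Yields only while the "if" is satisfied on some iteration;
    # once the loop ends without hitting yield again, the generator is exhausted.
    for n in range(limit):
        if n % 2 == 1:
            yield n

print(list(odd_numbers_below(8)))  # [1, 3, 5, 7]
```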
Understanding the inner mechanisms of iteration
Iteration is a process involving iterables (objects implementing the __iter__() method) and iterators (objects implementing the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate over iterables.
When you call a function that contains a yield statement anywhere, you get a generator object, but no code runs. Then each time you extract a value from the generator, Python executes code in the function until it comes to a yield statement, then pauses and delivers the value. When you extract another value, Python resumes just after the yield and continues until it reaches another yield (often the same one, but one iteration later). This continues until the function runs off the end, at which point the generator is deemed exhausted.
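This pause/resume behavior can be observed directly with print statements (a minimal sketch; traced is a made-up name):

```python
def traced():
    print("start")          # runs only on the first next()
    yield 1
    print("between yields")  # runs on the second next()
    yield 2
    print("end")             # runs on the third next(), just before StopIteration

g = traced()   # no output yet: calling the function runs no body code
a = next(g)    # prints "start", returns 1, pauses at the first yield
b = next(g)    # prints "between yields", returns 2, pauses at the second yield
# A third next(g) would print "end" and raise StopIteration.
```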
When you see a function with yield statements, apply this easy trick to understand what will happen:
- Insert a line result = [] at the start of the function.
- Replace each yield expr with result.append(expr).
- Insert a line return result at the bottom of the function.
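Applying this trick to create_generator from above gives an ordinary function that builds the same values eagerly:

```python
# The generator version...
def create_generator():
    mylist = range(3)
    for i in mylist:
        yield i * i

# ...and the same function rewritten with the three-step trick:
def create_list():
    result = []               # step 1: result = [] at the start
    mylist = range(3)
    for i in mylist:
        result.append(i * i)  # step 2: yield expr -> result.append(expr)
    return result             # step 3: return result at the bottom

print(list(create_generator()))  # [0, 1, 4]
print(create_list())             # [0, 1, 4]
```

The difference is that the generator produces the values one at a time, while the rewritten version stores them all in memory before returning.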
class countdown_iterator:
    def __init__(self):
        # Start at -1 so that we get 0 when we add 1 below.
        self.count = -1

    # The __iter__ method will be called once by the 'for' loop.
    # The rest of the magic happens on the object returned by this method.
    # In this case it is the object itself.
    def __iter__(self):
        return self

    # The __next__ method will be called repeatedly by the 'for' loop
    # until it raises StopIteration.
    def __next__(self):  # 'next' in Python 2
        self.count += 1
        if self.count < 4:
            return self.count
        else:
            # A StopIteration exception is raised
            # to signal that the iterator is done.
            # This is caught implicitly by the 'for' loop.
            raise StopIteration

def some_func():
    return countdown_iterator()

for i in some_func():
    print(i)
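The 'for' loop above can be pictured as a plain while loop over the iterator protocol. A sketch, using the built-in range as the iterable:

```python
# What "for i in iterable: print(i)" does under the hood:
iterable = range(4)
it = iter(iterable)          # calls iterable.__iter__()
while True:
    try:
        i = next(it)         # calls it.__next__()
    except StopIteration:    # raised when the iterator is exhausted
        break                # the for loop catches this and stops
    print(i)                 # 0, 1, 2, 3
```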
The yield keyword is reduced to two simple facts:
- If the compiler detects the yield keyword anywhere inside a function, that function no longer returns via the return statement. Instead, it immediately returns a lazy "pending list" object called a generator.
- A generator is iterable. What is an iterable? It's anything like a list, set, range, dictionary view, or any other object with a built-in protocol for visiting each element in a certain order.
In a nutshell: most commonly, a generator is a lazy, incrementally-pending list, and yield statements allow you to use function notation to program the list values the generator should incrementally spit out. Furthermore, advanced usage lets you use generators as coroutines (see below).
Let's define a function makeRange that's just like Python's range. Calling makeRange(n) returns a generator:
def makeRange(n):
    # return 0,1,2,...,n-1
    i = 0
    while i < n:
        yield i
        i += 1
>>> makeRange(5)
<generator object makeRange at 0x19e4aa0>
To force the generator to immediately return its pending values, you can pass it into list() (just like you could any iterable):
>>> list(makeRange(5))
[0, 1, 2, 3, 4]
The above example can be thought of as merely creating a list that you append to and return:

# return a list                    #  # return a generator
def makeRange(n):                  #  def makeRange(n):
    """return [0,1,2,...,n-1]"""   #      """return 0,1,2,...,n-1"""
    TO_RETURN = []                 #
    i = 0                          #      i = 0
    while i < n:                   #      while i < n:
        TO_RETURN += [i]           #          yield i
        i += 1                     #          i += 1
    return TO_RETURN               #

>>> makeRange(5)  # the list version
[0, 1, 2, 3, 4]
This is how the "Python iteration protocol" works, i.e. what is going on when you do list(makeRange(5)). This is what was described earlier as a "lazy, incremental list".
>>> x=iter(range(5))
>>> next(x) # calls x.__next__(); x.next() is deprecated
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> next(x)
4
>>> next(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
A coroutine (a generator that generally accepts input via the yield keyword, e.g. nextInput = yield nextOutput, as a form of two-way communication) is basically a computation that is allowed to pause itself and request input (e.g. about what it should do next). When the coroutine pauses itself (when the running coroutine eventually hits a yield keyword), the computation is paused and control is inverted (yielded) back to the 'calling' function (the frame which requested the next value of the computation).

The paused generator/coroutine remains paused until another invoking function (possibly a different function/context) requests the next value to unpause it (usually passing input data to direct the paused logic interior to the coroutine's code). One can think of Python coroutines as lazy incrementally-pending lists, where the next element doesn't just depend on the previous computation but also on input that you may opt to inject during the generation process.
x = makeRange(5)
list(x)  # [0, 1, 2, 3, 4]
list(x)  # []

A generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call makeRange(...) again. If you need to use the result twice, convert the result to a list and store it in a variable: x = list(makeRange(5)).
- A function with yield, when called, returns a generator.
- Generators are iterators because they implement the iterator protocol, so you can iterate over them.
- A generator can also be sent information, making it conceptually a coroutine.
- In Python 3, you can delegate from one generator to another in both directions with yield from.
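That last point deserves a concrete illustration. A minimal sketch of delegation with yield from (inner and outer are made-up names):

```python
def inner():
    yield 1
    yield 2

def outer():
    yield 0
    yield from inner()   # delegates: inner's values pass straight through
    yield 3

print(list(outer()))  # [0, 1, 2, 3]
```

Values sent into outer with send() while the delegation is active are likewise forwarded to inner, which is what "in both directions" refers to.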
yield is only legal inside a function definition, and the inclusion of yield in a function definition makes it return a generator. When the generator is advanced (methods are discussed below), execution resumes and then freezes again at the next yield. The generator type is a sub-type of iterator. An iterator can't be reused or reset once exhausted; another generator should be created to use its functionality again.
Coroutines:
yield forms an expression that allows data to be sent into the generator. The received variable will point to the data that is sent to the generator:
def bank_account(deposited, interest_rate):
    while True:
        calculated_interest = interest_rate * deposited
        received = yield calculated_interest
        if received:
            deposited += received

my_account = bank_account(1000, .05)
First, we must prime the generator with the built-in function next. It will call the generator's __next__ method and run the body up to the first yield:

>>> first_year_interest = next(my_account)
>>> first_year_interest
50.0

And now we can send data into the generator. (Sending None is the same as calling next.):

>>> next_year_interest = my_account.send(first_year_interest + 1000)
>>> next_year_interest
102.5
yield from allows cooperative delegation to a sub-coroutine. Other generator methods are close and throw. In a generator function, the return statement indicates that the generator is done and will cause StopIteration to be raised. The returned value (if any) is used as an argument to construct StopIteration and becomes its StopIteration.value attribute.
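A short sketch of that return behavior (finish_with and delegator are made-up names): the return value surfaces as StopIteration.value, and yield from receives it directly.

```python
def finish_with(value):
    yield "working"
    return value               # becomes StopIteration.value

gen = finish_with(42)
print(next(gen))               # working
try:
    next(gen)
except StopIteration as exc:
    print(exc.value)           # 42

def delegator():
    # yield from forwards "working", then binds the return value to result
    result = yield from finish_with(42)
    yield result

print(list(delegator()))       # ['working', 42]
```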
yield is similar to return - it returns whatever you tell it to (as a generator). The difference is that the next time you call the generator, execution starts from the last call to the yield statement. Unlike return, the stack frame is not cleaned up when a yield occurs, however control is transferred back to the caller, so its state will resume the next time the function is called.
Yield is single-pass: you can only iterate through once. When a function has a yield in it we call it a generator function. And an iterator is what it returns. Those terms are revealing. We lose the convenience of a container, but gain the power of a series that's computed as needed, and arbitrarily long.
Yield is lazy, it puts off computation. A function with a yield in it doesn't actually execute at all when you call it. It returns an iterator object that remembers where it left off. Each time you call next() on the iterator (this happens in a for-loop) execution inches forward to the next yield. return raises StopIteration and ends the series (this is the natural end of a for-loop).
Yield is versatile. Data doesn't have to be stored all together, it can be made available one at a time. It can be infinite. If you need multiple passes and the series isn't too long, just call list(generator).
Use the send method on a generator to send data into the generator. To receive that data inside the generator, a (yield) expression is used.
An iterator is a more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that does return self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)

or the equivalent generator expression (genexp)

generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):  # next in Python 2
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current

iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, e.g.:

    def current(self):
        return self.start
When type hinting, a generator that only yields values can be annotated more simply as an Iterator (Generator[int, None, None] is equivalent to Iterator[int]).
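A sketch of both annotation styles (countdown and accumulator are made-up names): the full Generator form is only needed when the send type or return type matters.

```python
from typing import Generator, Iterator


def countdown(n: int) -> Iterator[int]:
    # Only yields values, so Iterator[int] is enough.
    while n > 0:
        yield n
        n -= 1


def accumulator() -> Generator[int, int, str]:
    # Yields int, accepts int via send(), and returns str, so the full
    # Generator[YieldType, SendType, ReturnType] annotation is needed.
    total = 0
    while total < 10:
        total += yield total
    return "done"
```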
When the next() method is called for the first time, the function starts executing until it reaches a yield statement, which returns the yielded value. yield keeps track of what has happened, i.e. it remembers the last execution, and the next next() call continues from that point. Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator: it has a __next__() method. When you call next() on it, the next value yielded by the generator function is returned. A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
A generator has a close method, while typical iterators don't. The close method raises a GeneratorExit exception inside the generator, which gives a finally clause in the generator a chance to run some clean-up. This abstraction makes generators more usable in the large than simple iterators: one can close a generator as one could close a file, without having to bother about what's underneath.
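A minimal sketch of that clean-up behavior (counter is a made-up name; a list stands in for a real resource such as a file handle):

```python
cleaned = []

def counter():
    try:
        n = 0
        while True:
            yield n
            n += 1
    finally:
        cleaned.append(True)   # clean-up: runs when the generator is closed

gen = counter()
next(gen)      # 0; the generator is now paused at the yield
gen.close()    # raises GeneratorExit at the paused yield; the finally runs
print(cleaned) # [True]
```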
Besides, if you check the memory footprint, the generator takes much less memory,
as it doesn't need to store all the values in memory at the same time.
def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n*[None]
    for i in range(n):
        ll[i] = i
    return ll

# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))
# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
An iterable object is something which can be iterated (naturally). To do that, however, you will need something like an iterator object, and, yes, the terminology may be confusing. Iterable objects include an __iter__ method which will return the iterator object for the iterable object.

An iterator object is an object which implements the iterator protocol, a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).
One iterable object is the list object. It's not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).
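A quick sketch of that distinction: each call to iter() on a list returns a fresh, independent iterator.

```python
things = ["a", "b", "c"]

it = iter(things)     # same as things.__iter__()
print(next(it))       # a
print(next(it))       # b

it2 = iter(things)    # a fresh iterator, starting over
print(next(it2))      # a
```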
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object. A generator object is like an iterator object in that it implements the iterator protocol. All generators are iterators but not vice versa.