Thursday, October 26, 2023

Python Iterators, Generator, Yield, Coroutines

Python Iterators, Generator, Yield, Coroutines

The yield statement suspends a function’s execution and sends a value back to the caller, but retains enough state to enable the function to resume where it left off. When the function resumes, it continues execution immediately after the last yield run. This allows its code to produce a series of values over time, rather than computing them at once and sending them back like a list. Return sends a specified value back to its caller whereas Yield can produce a sequence of values. We should use yield when we want to iterate over a sequence, but don’t want to store the entire sequence in memory. Yield is used in Python generators. A generator function is defined just like a normal function, but whenever it needs to generate a value, it does so with the yield keyword rather than return. If the body of a def contains yield, the function automatically becomes a generator function.

Iterables

When you create a list, you can read its items one by one. Reading its items one by one is called iteration:

mylist = [1, 2, 3]
for i in mylist: print(i)

mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:

mylist = [x*x for x in range(3)]
for i in mylist: print(i)

Everything you can use "for... in..." on is an iterable; lists, strings, files... These iterables can be read at any time and all their values are stored in memory.

Generators

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:

mygenerator = (x*x for x in range(3))
for i in mygenerator: print(i)

Yield

yield is a keyword that is used like return, except the function will return a generator.

def create_generator():
    mylist = range(3)
    for i in mylist:
       yield i*i

mygenerator = create_generator() # create a generator
print(mygenerator) # mygenerator is an object! <generator object create_generator at 0xb7555c34>
for i in mygenerator: print(i)

When you call the function, the written code in the function body does not run. The function only returns the generator object. The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it'll return the first value of the loop. Then, each subsequent call will run another iteration of the loop in the function and return the next value. This will continue until the generator is considered empty, which happens when the function runs without hitting yield. That can be because the loop has come to an end, an "if/else" statement is no longer satisfied, StopIteration is raised...

Understanding the inner mechanisms of iteration

Iteration is a process implying iterables (implementing the __iter__() method) and iterators (implementing the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.

When you call a function that contains a yield statement anywhere, you get a generator object, but no code runs. Then each time you extract an object from the generator, Python executes code in the function until it comes to a yield statement, then pauses and delivers the object. When you extract another object, Python resumes just after the yield and continues until it reaches another yield (often the same one, but one iteration later). This continues until the function runs off the end, at which point the generator is deemed exhausted.

When you see a function with yield statements, apply this easy trick to understand what will happen:

  1. Insert a line result = [] at the start of the function.
  2. Replace each yield expr with result.append(expr).
  3. Insert a line return result at the bottom of the function. 
class countdown_iterator:
    def __init__(self):
        # Start at -1 so that we get 0 when we add 1 below.
        self.count = -1

    # The __iter__ method will be called once by the 'for' loop.
    # The rest of the magic happens on the object returned by this method.
    # In this case it is the object itself.
    def __iter__(self):
        return self

    # The next method will be called repeatedly by the 'for' loop
    # until it raises StopIteration.
    def next(self):
        self.count += 1
        if self.count < 4:
            return self.count
        else:
            # A StopIteration exception is raised
            # to signal that the iterator is done.
            # This is caught implicitly by the 'for' loop.
            raise StopIteration

def some_func():
    return countdown_iterator()

for i in some_func():
    print i

 he yield keyword is reduced to two simple facts:

  1. If the compiler detects the yield keyword anywhere inside a function, that function no longer returns via the return statement. Instead, it immediately returns a lazy "pending list" object called a generator
  2. A generator is iterable. What is an iterable? It's anything like a list, set, range, dictionary view, or any other object with a built-in protocol for visiting each element in a certain order.

In a nutshell: Most commonly, a generator is a lazy, incrementally-pending list, and yield statements allow you to use function notation to program the list values the generator should incrementally spit out. Furthermore, advanced usage lets you use generators as coroutines (see below).

Let's define a function makeRange that's just like Python's range. Calling makeRange(n) RETURNS A GENERATOR:

def makeRange(n):
    # return 0,1,2,...,n-1
    i = 0
    while i < n:
        yield i
        i += 1

>>> makeRange(5)
<generator object makeRange at 0x19e4aa0>

To force the generator to immediately return its pending values, you can pass it into list() (just like you could any iterable):

>>> list(makeRange(5))
[0, 1, 2, 3, 4

The above example can be thought of as merely creating a list that you append to and return:

# return a list                  #  # return a generator
def makeRange(n):                #  def makeRange(n):
    """return [0,1,2,...,n-1]""" #      """return 0,1,2,...,n-1"""
    TO_RETURN = []               # 
    i = 0                        #      i = 0
    while i < n:                 #      while i < n:
        TO_RETURN += [i]         #          yield i
        i += 1                   #          i += 1
    return TO_RETURN             # 

>>> makeRange(5)
[0, 1, 2, 3, 4]

This is how the "Python iteration protocol" works. That is, what is going on when you do list(makeRange(5)). This is described earlier as a "lazy, incremental list".

>>> x=iter(range(5))
>>> next(x)  # calls x.__next__(); x.next() is deprecated
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> next(x)
4
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration 

A coroutine (generators that generally accept input via the yield keyword e.g. nextInput = yield nextOutput, as a form of two-way communication) is basically a computation that is allowed to pause itself and request input (e.g. to what it should do next). When the coroutine pauses itself (when the running coroutine eventually hits a yield keyword), the computation is paused and control is inverted (yielded) back to the 'calling' function (the frame which requested the next value of the computation). The paused generator/coroutine remains paused until another invoking function (possibly a different function/context) requests the next value to unpause it (usually passing input data to direct the paused logic interior to the coroutine's code). One can think of Python coroutines as lazy incrementally-pending lists, where the next element doesn't just depend on the previous computation but also on input that you may opt to inject during the generation process.

x = myRange(5)

list(x) #[0, 1, 2, 3, 4]
list(x) # []

A generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call myRange(...) again. If you need to use the result twice, convert the result to a list and store it in a variable x = list(myRange(5)).

  • A function with yield, when called, returns a Generator.
  • Generators are iterators because they implement the iterator protocol, so you can iterate over them.
  • A generator can also be sent information, making it conceptually a coroutine.
  • In Python 3, you can delegate from one generator to another in both directions with yield from.

yield is only legal inside of a function definition, and the inclusion of yield in a function definition makes it return a generator. When the generator is called (methods are discussed below) execution resumes and then freezes at the next yield. The generator type is a sub-type of iterator. An Iterator can't be reused or reset once exhausted. Another generator should be created to use its functionality again.

Coroutines:

yield forms an expression that allows data to be sent into the generator.

The received variable will point to the data that is sent to the generator:

def bank_account(deposited, interest_rate):
    while True:
        calculated_interest = interest_rate * deposited 
        received = yield calculated_interest
        if received:
            deposited += received

my_account = bank_account(1000, .05)

First, we must queue up the generator with the built-in function, next. It will call the appropriate next.

first_year_interest = next(my_account)
first_year_interest
50.0

And now we can send data into the generator. (Sending None is the same as calling next.) :

next_year_interest = my_account.send(first_year_interest + 1000)
next_year_interest
102.5

Cooperative Delegation to Sub-Coroutine with yield from. Other methods: close and throw. In a generator function, the return statement indicates that the generator is done and will cause StopIteration to be raised. The returned value (if any) is used as an argument to construct StopIteration and becomes the StopIteration.value attribute.

yield is similar to return - it returns whatever you tell it to (as a generator). The difference is that the next time you call the generator, execution starts from the last call to the yield statement. Unlike return, the stack frame is not cleaned up when a yield occurs, however control is transferred back to the caller, so its state will resume the next time the function is called.

Yield is single-pass: you can only iterate through once. When a function has a yield in it we call it a generator function. And an iterator is what it returns. Those terms are revealing. We lose the convenience of a container, but gain the power of a series that's computed as needed, and arbitrarily long.

Yield is lazy, it puts off computation. A function with a yield in it doesn't actually execute at all when you call it. It returns an iterator object that remembers where it left off. Each time you call next() on the iterator (this happens in a for-loop) execution inches forward to the next yield. return raises StopIteration and ends the series (this is the natural end of a for-loop).

Yield is versatile. Data doesn't have to be stored all together, it can be made available one at a time. It can be infinite. If you need multiple passes and the series isn't too long, just call list(generator).

Use send method inside a generator to send data back to the generator. To allow that, a (yield) is used.

iterator is a more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that does return self.

Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.

You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.

def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b)) 

 would take more code to build as a custom iterator:

class Squares(object):
    def __init__(self, start, stop):
       self.start = start
       self.stop = stop

    def __iter__(self): 
        return self

    def __next__(self): # next in Python 2
       if self.start >= self.stop:
           raise StopIteration
       current = self.start * self.start
       self.start += 1
       return current

iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, i.e.
def current(self):
    return self.start 

When type hinting a generator that only yields values can be annotated as an Iterator more simply (Generator[int, None, None] === Iterator[int])

 When a generator function is called, it returns an generator object without even beginning the execution of the function. When the next() method is called for the first time, the function starts executing until it reaches a yield statement which returns the yielded value. The yield keeps track of what has happened, i.e. it remembers the last execution. And secondly, the next() call continues from the previous value. Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator - it has a next() method. When you call next(), the next value yielded by the generator function is returned.
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
A generator has a close method, while typical iterators don’t. The close method triggers a StopIteration exception in the generator,
which may be caught in a finally  clause in that iterator, to get a chance to run some clean‑up.
This abstraction makes it most usable in the large than simple iterators.
One can close a generator as one could close a file, without having to bother about what’s underneath.
Besides, if you check the memory footprint, the generator takes much less memory,
as it doesn't need to store all the values in memory at the same time. 
def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n*[None]
    for i in range(n):
        ll[i] = i
    return ll

# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))

# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

An iterable object is something which can be iterated (naturally). To do that, however, you will need something like an iterator object, and, yes, the terminology may be confusing. Iterable objects include a __iter__ method which will return the iterator object for the iterable object.

An iterator object is an object which implements the iterator protocol - a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).

One iterable object is a list object. It’s not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).

Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object. A generator object is like an iterator object in that it implements the iterator protocol. All generators are iterators but not vice versa.