Sometimes your program is just too motivated, and does all this work you don't need or want it to do -- you want it to be lazier. That's where generators come in. Using a generator in Python lets you choose exactly how much you want done, and when.

Last week we introduced you to list comprehensions, which give you a more natural way of representing the contents of a list. This article demonstrates their cousins: generators, which can build a sequence piece by piece, doing as much or as little as you need.

This is called lazy evaluation, which delays the computation of a particular value up until the point where it is needed by the program. Lazy evaluation is useful in many types of programming since it provides a performance benefit by eliminating unnecessary calculation of values that are never used. Some languages, such as Haskell and Miranda, use lazy evaluation by default, while others, like Scheme, OCaml and Python allow it through special syntax. In Python the way you achieve lazy evaluation is through generators.

A generator allows you to write a function that can return a result and pause, resuming in the same place the next time you call the function. You don't need any special syntax to write a generator, except to denote when an intermediate value is to be returned using the yield statement. The easiest way to demonstrate this is with an example:

>>> def generator1():
...     yield "first"
...     yield "second"
...     yield "third"
... 
>>> gen = generator1()
>>> gen

>>> gen.next()
'first'
>>> gen.next()
'second'
>>> gen.next()
'third'
>>> gen.next()
Traceback (most recent call last):
  File "", line 1, in ?
StopIteration
>>> 

In this example, first we define a generator called generator1 which will yield three values, the strings "first", "second" and "third". When we create a new generator object (gen) it begins the function. Each time you call the generator's next method, it continues the function until the next yield. When the generator reaches the end of the block, or reaches a return statement, it throws a StopIteration exception.

Generators are essentially iterators, which means they can be used instead of sequences (lists etc.) in many expressions. For example, rather than the above you could just as easily write:

>>> gen2 = generator1()
>>> for i in gen2:
...     print i
... 
first
second
third

Likewise, you can convert a generator into a list all at once:

>>> gen3 = generator1()
>>> list(gen3)
['first', 'second', 'third']

Let's take an example of where this is actually useful. The following is a fairly standard function that produces combinations, that is, the number of unique ways a sequence can be chopped into smaller sequences:

def combination(seq, length):
	if not length: return [[]]
	else:
		l = []
		for i in xrange(len(seq)):
			for result in combination(seq[i+1:], length-1):
				l += [[seq[i]]+result]
		return l

It works as follows:

>>> combination("ABCDE", 3)
[['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'B', 'E'], ['A', 'C', 'D'], ['A', 'C', 'E'], ['A', 'D', 'E'], ['B', 'C', 'D'], ['B', 'C', 'E'], ['B', 'D', 'E'], ['C', 'D', 'E']]
>>> combination("ABCDE", 2)
[['A', 'B'], ['A', 'C'], ['A', 'D'], ['A', 'E'], ['B', 'C'], ['B', 'D'], ['B', 'E'], ['C', 'D'], ['C', 'E'], ['D', 'E']]
>>> combination("ABCDE", 5)
[['A', 'B', 'C', 'D', 'E']]

Now let's turn it into a version that uses generators. The trick is simply to replace every point at which you would add a value to the end results list and replace it with a yield statement:

def xcombination(seq,length):
if not length: yield []
else:
for i in xrange(len(seq)):
for result in xcombination(seq[i+1:], length-1):
yield [seq[i]]+result

Now we have a version that works exactly the same, except that it does the work only as you need it:

>>> comb = xcombination("ABCDE", 3)
>>> comb.next()
['A', 'B', 'C']
>>> comb.next()
['A', 'B', 'D']
>>> list(comb)
[['A', 'B', 'E'], ['A', 'C', 'D'], ['A', 'C', 'E'], ['A', 'D', 'E'], ['B', 'C', 'D'], ['B', 'C', 'E'], ['B', 'D', 'E'], ['C', 'D', 'E']]
>>> comb2 = xcombination("ABCDE",2)
>>> for i in xrange(3):
... print comb2.next()
... ['A', 'B']
['A', 'C']
['A', 'D']

In the last command, even though there are 10 different combinations of letters, only three are generated, saving us 70 percent of the computation time of the standard function.

The really useful thing about generators is that they can, for most needs, be used as a drop in replacement for list comprehensions. All you need to do is replace the brackets that would go around list comprehensions with parentheses. Take the last example in the article on list comprehensions. Rather than:

>>> guests = ['Chris', 'Brendan', 'Jimmy', 'Mel', 'Mike', 'Jess']
>>> [(seat1, seat2) for seat1 in guests for seat2 in guests if seat1 != seat2]
...

You can write:

>>> guests = ['Chris', 'Brendan', 'Jimmy', 'Mel', 'Mike', 'Jess']
>>> ((seat1, seat2) for seat1 in guests for seat2 in guests if seat1 != seat2)

And then you can use it like you would any other generator:

>>> seating = ((seat1, seat2) for seat1 in guests for seat2 in guests if seat1 != seat2)
>>> for i in xrange(10):
... print seating.next()
...
('Chris', 'Brendan')
('Chris', 'Jimmy')
('Chris', 'Mel')
('Chris', 'Mike')
('Chris', 'Jess')
('Brendan', 'Chris')
('Brendan', 'Jimmy')
('Brendan', 'Mel')
('Brendan', 'Mike')
('Brendan', 'Jess')

Up til now we've been using generators as a way of building lists to save computation time over other methods. That's great, but where they really shine is when computing the whole list would be impossible. Take the Fibonacci sequence, in which each number is the sum of the previous two. Say we want a function that generates the sequence up to a given number:

>>> def fib(n):
...     a, b = 0, 1
...     l = [a]
...     while b < n:
...             l += [b]
...             a, b = b, a+b
...     return l
... 
>>> fib(20)
[0, 1, 1, 2, 3, 5, 8, 13]

Now this works fine, so long as we want to stop counting up when we reach a certain number. If we want to change the criteria for stopping, however, we have to rewrite the function. Instead we can implement it as a generator (implementation borrowed from the python PEP on generators):

>>> def xfib():
...     a,b = 0,1
...     while True:
...             yield b
...             a, b = b, a+b
... 
>>> fibseries = xfib()
>>> b = fibseries.next()
>>> while b < 20:
...     print b
...     b = fibseries.next()
... 
1
1
2
3
5
8
13

Or, if we want to stop at the first palindrome (greater than one digit) we just need to change the loop condition:

>>> fibseries = xfib()
>>> b = fibseries.next()
>>> while b < 10 or not list(str(b)) == list(reversed(str(b))):
... print b
... b = fibseries.next()
...
1
1
2
3
5
8
13
21
34

And that's it (The next value in the series is 55). By using a generator we can keep the list building implementation separate from the logic relating to when to stop generating it, while still only calculating as many values as we need.

When should you use generators over list comprehensions? Firstly, if you're going to use the whole list anyway, it's better to use a list comprehension -- they're faster since they don't have the added overhead of a generator function call. If you're going to use the first part of the list, use a generator, since you'll save CPU time.

Do you need help with Python? Gain advice from Builder AU forums

Comments

1

JMC - 11/07/07

You should have talked about generator expressions, as well. Now those are really neat (http://www.python.org/dev/peps/pep-0289/).

» Report offensive content

2

adam - 03/09/07

CHANGE for i in xrange(len(seq))
TO
for i in xrange(len(seq) - (length - 1))
<- this way the combine does no needless list building (when searching too long substrings)

» Report offensive content

Leave a comment

You must read and type the 6 chars within 0..9 and A..F

* indicates mandatory fields.

2

adam - 09/03/07

CHANGE for i in xrange(len(seq)) TO for i in xrange(len(seq) - (length - 1)) <- this way the combine does no ... more

1

JMC - 07/11/07

You should have talked about generator expressions, as well. Now those are really neat (http://www.python.org/dev/peps/pep-0289/). ... more

Log in


Sign up | Forgot your password?

  • Staff Aussies to pay more for Win 7

    If you are looking to make some money in these troubled times, perhaps importing copies of Windows 7 could be for you. Read more »

    -- posted by Staff

  • Staff Firefox: Greens want it, 3.5rc2 not up to par

    This week's roundup looks at the situation surrounding a campaign to change Outlook HTML renderer, a Greens MP wants to install Firefox but is restricted and all the photos from the iPhone 3GS launch. Read more »

    -- posted by Staff

  • Chris Duckett Microsoft misses the Outlook point

    Ask designers which mail program is the bane of their existence, and you'll find that Outlook tops the list. The reason why the most popular email reader is also the most painful is simple: it uses Word to render HTML emails. Read more »

    -- posted by Chris Duckett

What's on?