The heap is an integral component in many algorithms -- a data structure that keeps elements organised so that it is always easy to find the smallest value. We'll show you how you can use the heapq module to implement heaps in Python in just a few lines of code.

Python has a module for implementing heaps on ordinary lists, allowing the creation of quick and easy priority queues for when you need a consistently sorted list between additions and removals. The heapq module defines functions for a minheap - which always returns the smallest item. To set up the heap, you add values using heappush and remove them using heappop:

>>> from heapq import heappush, heappop
>>> heap = []
>>> heappush(heap, 2)
>>> heappush(heap, 10)
>>> heappush(heap, 4)
>>> heappush(heap, 5)
>>> heappop(heap)
2
>>> heappop(heap)
4
>>> heappop(heap)
5
>>> heappop(heap)
10

If you have a list populated already you can turn it into a heap by using the heapify function from heapq:

>>> from heapq import heapify
>>> heap = [3,4,5,6,9,1,7,2,8]
>>> heapify(heap)
>>> heap
[1, 2, 3, 4, 9, 5, 7, 6, 8]
>>> heappop(heap)
1

It's important to note here that heapify does not sort the list. A heap, when implemented in a list, is closer to a flattened tree structure. In a heap, any element in the list heap[k] must be smaller or equal to the element heap[k*2 + 1] and heap[k*2 + 2]. The first item in the list is always the smallest. When it is removed, then the smaller of the second and third elements become the top, and the rest of the heap is adjusted.

If you're interested in more information on heaps, you should read the Wikipedia page.

Using a heap in this way (adding all the items at once and removing them all at once) has no benefits over simply sorting a list. The real benefit of priority queues come when you need to remove the minimum value in between adding values. Take the following example:

from random import randint
def minheap(n, Print=True):
    numbers = [randint(0,100) for i in range(n)]
    heapify(numbers)
    if Print:
        print "Original: %s" % (numbers)
    for i in range(n):
        newint = randint(0,100)
        heappush(numbers, newint)
        oldint = heappop(numbers)
        if Print:
            print "new number: %s, minimum: %s, heap: %s" % (newint, oldint, numbers)

>>> minheap(10)
Original: [8, 33, 59, 48, 46, 74, 70, 97, 54, 78]
new number: 59, minimum: 8, heap: [33, 46, 59, 48, 59, 74, 70, 97, 54, 78]
new number: 58, minimum: 33, heap: [46, 48, 59, 54, 58, 74, 70, 97, 59, 78]
new number: 30, minimum: 30, heap: [46, 48, 59, 54, 58, 74, 70, 97, 59, 78]
new number: 66, minimum: 46, heap: [48, 54, 59, 59, 58, 74, 70, 97, 66, 78]
new number: 54, minimum: 48, heap: [54, 54, 59, 59, 58, 74, 70, 97, 66, 78]
new number: 12, minimum: 12, heap: [54, 54, 59, 59, 58, 74, 70, 97, 66, 78]
new number: 13, minimum: 13, heap: [54, 54, 59, 59, 58, 74, 70, 97, 66, 78]
new number: 91, minimum: 54, heap: [54, 58, 59, 59, 78, 74, 70, 97, 66, 91]
new number: 31, minimum: 31, heap: [54, 58, 59, 59, 78, 74, 70, 97, 66, 91]
new number: 17, minimum: 17, heap: [54, 58, 59, 59, 78, 74, 70, 97, 66, 91]

To demonstrate that using a heap can be much faster, we'll write a version that keeps the list sorted using an insertion sort:

def minlist(n, Print=True):
	numbers = sorted([randint(0,100) for i in range(n)])
	if Print:
		print "Original: %s" % (numbers)
	for i in range(n):
		newint = randint(0,100)
		if newint >= numbers[n-1]:
			numbers.append(newint)
		else:
			for j in range(n):
				if newint < numbers[j]:
					numbers = numbers[:j] + [newint] + numbers[j:]
					break
		oldint = numbers.pop(0)
		if Print:
			print "new number: %s, minimum: %s, heap: %s" % (newint, oldint, numbers)

Now let's see how they do against each other once we increase the number of elements:

>>> i = time(); minlist(10000, False); print time()-i
11.6881940365
>>> i = time(); minheap(10000, False); print time()-i
0.142065048218
>>> i = time(); minheap(50000, False); print time()-i
0.621912002563
>>> i = time(); minheap(100000, False); print time()-i
1.33778500557
>>> i = time(); minheap(1000000, False); print time()-i
14.6926519871

Here you can see that the heap version can handle almost one hundred times as many numbers as the flat list based one and be the same speed.

If you need more control or more functionality, and some classical algorithms demand it, then you'll need to implement your own heap. For the basics though, the Python heapq module gives you what you need, with just a handful of functions to learn.

Do you need help with Python? Gain advice from Builder AU forums

Leave a comment

You must read and type the 6 chars within 0..9 and A..F

* indicates mandatory fields.

Log in


Sign up | Forgot your password?

  • Staff XP stays on life support for longer

    This week's Roundup looks at Microsoft's decision to extend the life of Windows XP, the release of Microsoft Surface SDK, Firefox's new Geode plug-in, Yahoo's new tool -- Smush It and more. Read more »

    -- posted by Staff

  • Chris Duckett The good and truly awful celluloid depictions of computers

    Ever wonder why your lawyer uncle leaves the room whenever you turn over to Boston Legal? Or why your forensic science cousin can't stand crime drama? You know the answer: it’s the horrid trivialisation and dumbing down of an occupation to make it appear entertaining. Sometimes it is so unbelievable that it actually hurts and yelling at the screen is the only outlet. Read more »

    -- posted by Chris Duckett

  • Brendon Chase Apple's iPhone engineers to tour Sydney, Melbourne

    Aussie developers will be able to get up close and personal with some of the iPhone engineers in November to learn how to build applications for the platform. Read more »

    -- posted by Brendon Chase

What's on?