The heap is an integral component in many algorithms -- a data structure that keeps elements organised so that it is always easy to find the smallest value. We'll show you how you can use the heapq module to implement heaps in Python in just a few lines of code.

Python has a module for implementing heaps on ordinary lists, allowing the creation of quick and easy priority queues for when you need a consistently sorted list between additions and removals. The heapq module defines functions for a minheap - which always returns the smallest item. To set up the heap, you add values using heappush and remove them using heappop:

>>> from heapq import heappush, heappop
>>> heap = []
>>> heappush(heap, 2)
>>> heappush(heap, 10)
>>> heappush(heap, 4)
>>> heappush(heap, 5)
>>> heappop(heap)
2
>>> heappop(heap)
4
>>> heappop(heap)
5
>>> heappop(heap)
10

If you have a list populated already you can turn it into a heap by using the heapify function from heapq:

>>> from heapq import heapify
>>> heap = [3,4,5,6,9,1,7,2,8]
>>> heapify(heap)
>>> heap
[1, 2, 3, 4, 9, 5, 7, 6, 8]
>>> heappop(heap)
1

It's important to note here that heapify does not sort the list. A heap, when implemented in a list, is closer to a flattened tree structure. In a heap, any element in the list heap[k] must be smaller or equal to the element heap[k*2 + 1] and heap[k*2 + 2]. The first item in the list is always the smallest. When it is removed, then the smaller of the second and third elements become the top, and the rest of the heap is adjusted.

If you're interested in more information on heaps, you should read the Wikipedia page.

Using a heap in this way (adding all the items at once and removing them all at once) has no benefits over simply sorting a list. The real benefit of priority queues come when you need to remove the minimum value in between adding values. Take the following example:

from random import randint
def minheap(n, Print=True):
    numbers = [randint(0,100) for i in range(n)]
    heapify(numbers)
    if Print:
        print "Original: %s" % (numbers)
    for i in range(n):
        newint = randint(0,100)
        heappush(numbers, newint)
        oldint = heappop(numbers)
        if Print:
            print "new number: %s, minimum: %s, heap: %s" % (newint, oldint, numbers)

>>> minheap(10)
Original: [8, 33, 59, 48, 46, 74, 70, 97, 54, 78]
new number: 59, minimum: 8, heap: [33, 46, 59, 48, 59, 74, 70, 97, 54, 78]
new number: 58, minimum: 33, heap: [46, 48, 59, 54, 58, 74, 70, 97, 59, 78]
new number: 30, minimum: 30, heap: [46, 48, 59, 54, 58, 74, 70, 97, 59, 78]
new number: 66, minimum: 46, heap: [48, 54, 59, 59, 58, 74, 70, 97, 66, 78]
new number: 54, minimum: 48, heap: [54, 54, 59, 59, 58, 74, 70, 97, 66, 78]
new number: 12, minimum: 12, heap: [54, 54, 59, 59, 58, 74, 70, 97, 66, 78]
new number: 13, minimum: 13, heap: [54, 54, 59, 59, 58, 74, 70, 97, 66, 78]
new number: 91, minimum: 54, heap: [54, 58, 59, 59, 78, 74, 70, 97, 66, 91]
new number: 31, minimum: 31, heap: [54, 58, 59, 59, 78, 74, 70, 97, 66, 91]
new number: 17, minimum: 17, heap: [54, 58, 59, 59, 78, 74, 70, 97, 66, 91]

To demonstrate that using a heap can be much faster, we'll write a version that keeps the list sorted using an insertion sort:

def minlist(n, Print=True):
	numbers = sorted([randint(0,100) for i in range(n)])
	if Print:
		print "Original: %s" % (numbers)
	for i in range(n):
		newint = randint(0,100)
		if newint >= numbers[n-1]:
			numbers.append(newint)
		else:
			for j in range(n):
				if newint < numbers[j]:
					numbers = numbers[:j] + [newint] + numbers[j:]
					break
		oldint = numbers.pop(0)
		if Print:
			print "new number: %s, minimum: %s, heap: %s" % (newint, oldint, numbers)

Now let's see how they do against each other once we increase the number of elements:

>>> i = time(); minlist(10000, False); print time()-i
11.6881940365
>>> i = time(); minheap(10000, False); print time()-i
0.142065048218
>>> i = time(); minheap(50000, False); print time()-i
0.621912002563
>>> i = time(); minheap(100000, False); print time()-i
1.33778500557
>>> i = time(); minheap(1000000, False); print time()-i
14.6926519871

Here you can see that the heap version can handle almost one hundred times as many numbers as the flat list based one and be the same speed.

If you need more control or more functionality, and some classical algorithms demand it, then you'll need to implement your own heap. For the basics though, the Python heapq module gives you what you need, with just a handful of functions to learn.

Do you need help with Python? Gain advice from Builder AU forums

Comments

1

Yakov Keselman - 30/10/08

One thing that should perhaps be mentioned in connection with Python heaps is that they let you store pairs (value, object). So that you can say "heappush(2, 'Drew'); heappush(1, 'Jane')" and get them back arranged by the numbers (not by the names). In this way, you can use heap to store objects by their priority, not just numbers.

» Report offensive content

Leave a comment

You must read and type the 6 chars within 0..9 and A..F

* indicates mandatory fields.

1

Yakov Keselman - 30/10/08

One thing that should perhaps be mentioned in connection with Python heaps is that they let you store pairs (value, object). ... more

Log in


Sign up | Forgot your password?

  • Staff Microsoft shows off IE9 preview

    This week, highlights from Microsoft's MIX10 conference and more in the Roundup. Read more »

    -- posted by Staff

  • Chris Duckett IE9's H.264 vote killed Ogg

    In a split decision by the judges, the winner of the W3C/WHATWG video codec consensus is H.264, taking home the future of video playback on the internet while loser Ogg goes home with nothing but thoughts of what might have been. Read more »

    -- posted by Chris Duckett

  • Staff Google launches Apps Marketplace

    Google launches and app store, while Mozilla plans to re-write its open-source license. More of this week's news in the Roundup. Read more »

    -- posted by Staff

What's on?

  • Optus Deal

    Broadband + home phone + PlayStation®3 in a single package price!