The groupby function is useful for a range of needs, but one of the best uses for it is in replicating the UNIX filter uniq in Python.

Last week we looked at the itertools module and how its iterator based functions can be faster than doing things from scratch. One of the examples showed a function using itertools.takewhile to be much faster than one using itertools.groupby. Some feedback we got, asked, what then is the point of groupby if it's just a slower version of other functionality?

Groupby is useful for all kinds of things, if you're only going to be able to remember one function in itertools, groupby is the one that's going to let you get the most done.

For example, the compress function from the previous article can be written simply as:

def compress(l):
	return [a for a,b in groupby(l)]

It's not the fastest, but if your data set isn't huge it may just be quick enough. We tested it on a list of two million numbers and the difference was less than a minute.

Likewise the pack function is just as simple:

def pack(l):
	return [list(b) for a,b in groupby(l)]

Let's just remind ourselves about how these work:

>>> import random
>>> from itertools import *
>>> x = [random.randint(0,5) for x in range(20)]
>>> x
[3, 5, 1, 2, 4, 4, 4, 1, 4, 5, 2, 5, 3, 4, 5, 0, 5, 0, 2, 1]
>>> [(a,list(b)) for a,b in groupby(x)]
[(3, [3]), (5, [5]), (1, [1]), (2, [2]), (4, [4, 4, 4]), (1, [1]), (4, [4]), (5, [5]), (2, [2]), (5, [5]), (3, [3]), (4, [4]), (5, [5]), (0, [0]), (5, [5]), (0, [0]), (2, [2]), (1, [1])]
>>> compress(x)
[3, 5, 1, 2, 4, 1, 4, 5, 2, 5, 3, 4, 5, 0, 5, 0, 2, 1]
>>> pack(x)
[[3], [5], [1], [2], [4, 4, 4], [1], [4], [5], [2], [5], [3], [4], [5], [0], [5], [0], [2], [1]]

If you're familiar with the terminal, you may notice that this is very similar to the unix "uniq" command, which works just like compress on lines of text. One of the most common things to do with uniq is combine it with sort, and we can do that here as well:

>>> [a for a,b in groupby(sorted(x))]
[0, 1, 2, 3, 4, 5]
>>> [list(b) for a,b in groupby(sorted(x))]
[[0, 0], [1, 1, 1], [2, 2, 2], [3, 3], [4, 4, 4, 4, 4], [5, 5, 5, 5, 5]]

Another common use for uniq is to count the number of occurrences of each item in the list, but using "sort | uniq -c". You can do that with groupby as well:

>>> [(a,len(list(b))) for a,b in groupby(sorted(x))]
[(0, 2), (1, 3), (2, 3), (3, 2), (4, 5), (5, 5)]

If you need that info in a dictionary -- say a mapping between object and number of occurrences, you just need to wrap that list in a dict() object:

>>> dict([(a,len(list(b))) for a,b in groupby(sorted(x))])
{0: 2, 1: 3, 2: 3, 3: 2, 4: 5, 5: 5}

Last item on the agenda for today, returning only the duplicates in a list -- just like "uniq -d". We're using list comprehensions, so we can use the filtering inbuilt to only take groups with more than one element:

>>> [a for a,b in groupby(x) if len(list(b)) > 1]
[4]

Writing a uniq substitute is just one of the things you can do with groupby. Check out the Python library documentation for more ideas.

Correction: The original version was missing return statements in the compress and pack functions. Thanks for pointing this out readers.

Do you need help with Python? Gain advice from Builder AU forums

Leave a comment

You must read and type the 6 chars within 0..9 and A..F

* indicates mandatory fields.

Log in


Sign up | Forgot your password?

  • Staff Microsoft shows off IE9 preview

    This week, highlights from Microsoft's MIX10 conference and more in the Roundup. Read more »

    -- posted by Staff

  • Chris Duckett IE9's H.264 vote killed Ogg

    In a split decision by the judges, the winner of the W3C/WHATWG video codec consensus is H.264, taking home the future of video playback on the internet while loser Ogg goes home with nothing but thoughts of what might have been. Read more »

    -- posted by Chris Duckett

  • Staff Google launches Apps Marketplace

    Google launches and app store, while Mozilla plans to re-write its open-source license. More of this week's news in the Roundup. Read more »

    -- posted by Staff

What's on?

  • Optus Deal

    Broadband + home phone + PlayStation®3 in a single package price!