The golden age for programmers is over. For a decade we have been able to get away with writing slow code, knowing that the hardware would pick up the slack. Not any more: hardware developers have decided that software developers need to raise their game and get ready for a generation of multi-core processors.
What this means for programmers is simple: to make full use of the hardware, code needs to be multithreaded. In this article I will give you a basic introduction to the concepts, tools and common pitfalls of multithreaded programming. Although I give examples in Java, the methods can be applied to any language that provides the tools discussed in this tutorial.
So just what does multithreaded code mean? It means your code is doing more than one thing at once. A thread is an operating system object that executes instructions, or your code. A normal sequential program consists of one thread that starts by running the main function of your program -- multiple threads mean you can execute code simultaneously.
For instance, your program might split a list of numbers to be summed into two halves, sum each half simultaneously, and then add the results to produce the final answer. This is achieved by spawning threads to sum each half of the list.
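As a rough illustration of the list-summing idea, here is a minimal sketch in Java. The names SplitSum, PartialSum and sum are made up for this example, and the Thread and Runnable classes it relies on are explained in the section on creating threads.

```java
// A sketch only: SplitSum, PartialSum and sum are illustrative names.
class SplitSum {
    // Sums numbers[from..to) and keeps the result for the spawning thread to read.
    static class PartialSum implements Runnable {
        private final int[] numbers;
        private final int from, to;
        long result; // read by the spawning thread only after join()

        PartialSum(int[] numbers, int from, int to) {
            this.numbers = numbers;
            this.from = from;
            this.to = to;
        }

        public void run() {
            long total = 0;
            for (int i = from; i < to; i++) {
                total += numbers[i];
            }
            result = total;
        }
    }

    // Splits the array into two halves, sums each half on its own thread,
    // then adds the two partial results.
    static long sum(int[] numbers) {
        int mid = numbers.length / 2;
        PartialSum left = new PartialSum(numbers, 0, mid);
        PartialSum right = new PartialSum(numbers, mid, numbers.length);
        Thread t1 = new Thread(left);
        Thread t2 = new Thread(right);
        t1.start();
        t2.start();
        try {
            t1.join(); // wait for both halves before reading the results
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        }
        return left.result + right.result;
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(sum(numbers)); // prints 55
    }
}
```

Each thread only touches its own half and its own result field, so the two threads never modify the same data. For a list this small the cost of starting threads outweighs any gain; the split only pays off on large inputs.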
The benefit of using multiple threads is speed. With two threads each summing one half of the list, the program could complete up to twice as fast on a computer that can execute two threads simultaneously, such as a dual-core processor. On computers with only one processor, the operating system pretends to run multiple threads at the same time by time-slicing, periodically switching between executing the different threads. It may not seem obvious that threaded code helps on a single processor, but it is the key to keeping I/O-bound programs responsive. For instance, while one thread is busy copying files on a hard drive, another thread can listen for the user to hit the cancel button.
So multithreaded code can give you huge speed boosts, but threading is a dangerous tool which should be treated with care. The fact is that writing multithreaded code is hard: all sorts of things can go wrong, and the bugs can be almost impossible to find. Before you decide to write threaded code, you need to decide if it's worth it. There are two cases where it is worthwhile: when the task you are performing can be performed by a parallel algorithm, and when your program is naturally required to do two things at once, such as copying files and listening for the cancel button.
Creating Threads
How you create threads depends on the language. Object-oriented languages such as Java often provide a Thread class to manage the execution of a thread, allowing the programmer to stop, start, join or interrupt it.
A thread is created in Java by implementing the Runnable interface, which defines the run method. The Runnable object is then passed to the Thread class's constructor, and calling the Thread object's start method begins a new thread that executes the Runnable object's run method. The join method of the Thread class causes the calling thread to wait for the Thread object's thread to complete execution before continuing.
Below is the code for the "Hello World" of Java multithreaded programs. A thread is spawned that runs the Hello object's run method, and the main thread waits for the Hello thread to complete before continuing.
public class Hello implements Runnable {
    public void run() { // the method that will be run by the thread
        System.out.println("Hello from a thread!");
    }

    public static void main(String[] args) {
        Thread hello = new Thread(new Hello()); // create the hello Thread container
        hello.start(); // start the hello thread
        try {
            hello.join(); // wait for the hello thread to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
        }
    }
}
Methods that cause a thread to wait, such as join, can throw an InterruptedException in Java, which indicates that another thread has called the waiting thread's interrupt method.
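To make the interrupt mechanism concrete, here is a small sketch in which the main thread interrupts a worker that is blocked in Thread.sleep. The class name CancelDemo and the method runAndCancel are invented for this example, and the sleep call is only a stand-in for a long-running task such as the file copy mentioned earlier.

```java
// A sketch only: CancelDemo and runAndCancel are illustrative names.
class CancelDemo {
    // Starts a worker that blocks for a long time, interrupts it, and reports
    // whether the worker saw the InterruptedException.
    static boolean runAndCancel() {
        final boolean[] cancelled = {false}; // written by the worker, read after join()
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(60000); // stand-in for a long task
                } catch (InterruptedException e) {
                    cancelled[0] = true; // another thread called worker.interrupt()
                }
            }
        });
        worker.start();
        worker.interrupt(); // for example, the user pressed the cancel button
        try {
            worker.join(); // wait for the worker to finish cleaning up
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cancelled[0];
    }

    public static void main(String[] args) {
        System.out.println("worker cancelled: " + runAndCancel());
    }
}
```

If the interrupt arrives before the worker reaches sleep, the thread's interrupt flag is already set and sleep throws straight away, so the worker exits promptly either way.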
Race conditions
Most programs using multiple threads will need data to be shared among the threads. This might be a database, a language container or even just a simple counter. Accessing and modifying shared data can cause race conditions. A race condition means the result of your program depends on how the executions of the threads happen to interleave. Some orderings of the actions taken by the threads might cause bad results, or even a crash.
Consider the following example of a blocking queue. A thread can add items to the queue with enqueue and remove items with dequeue. If the queue is empty when dequeue is called, the thread waits for another thread to put something in the queue.
import java.util.ArrayList;
import java.util.List;

class BlockingQueue {
    List<Integer> queue = new ArrayList<Integer>();

    public void enqueue(Integer n) {
        queue.add(n);
    }

    public Integer dequeue() {
        while (queue.size() == 0) {} // wait for something (not thread-safe, see below)
        return queue.remove(0);
    }
}
The code is simple enough, and at a glance seems correct. However, a specific ordering of events can cause it to crash. Imagine that the queue is of size one and two threads, A and B, call the dequeue method. Among the many possible interleavings of the two threads, the following two highlight the race condition:
Interleaving 1:
A: while (queue.size() == 0)
A: return queue.remove(0)
B: while (queue.size() == 0)
B: while (queue.size() == 0)
Interleaving 2:
A: while (queue.size() == 0)
B: while (queue.size() == 0)
A: return queue.remove(0)
B: return queue.remove(0)
In the first interleaving, thread A takes the only element on the queue, then thread B spins in the loop, waiting for another thread to place a value in the queue. This is the intended behaviour. In the second interleaving, thread A sees that the queue is not empty and so passes the loop condition. Before A gets a chance to take the item off the queue, thread B kicks in and also passes the loop condition. Now both threads are past the loop condition and both will attempt to take an item off the queue, even though there is only one, so the second remove fails. This is a typical example of how a race condition can cause your program to crash or produce unexpected results.
Although the interleaving of the threads in the above example occurred at the level of lines of code, in real life it occurs at the instruction level. Clearly, it is very dangerous for two threads to modify any shared data at the same time, even if it is just a single line incrementing a counter, because an increment can be translated into multiple instructions (read, add, write), which can be interleaved among the threads when run.
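The counter case is easy to try for yourself. The sketch below uses names invented for this example (CounterDemo, runTwoThreads, LOCK). Two threads increment two counters: one with no protection and one inside a synchronized block. The synchronized block is one of Java's locking tools and appears here only as a contrast.

```java
// A sketch only: CounterDemo and its members are illustrative names.
class CounterDemo {
    static int unsafeCount; // incremented with no protection
    static int safeCount;   // incremented only while holding LOCK
    static final Object LOCK = new Object();

    // Runs two threads that each increment both counters perThread times.
    static void runTwoThreads(final int perThread) {
        unsafeCount = 0;
        safeCount = 0;
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < perThread; i++) {
                    unsafeCount++; // read, add, write: steps that can interleave
                    synchronized (LOCK) {
                        safeCount++; // only one thread at a time runs this
                    }
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        runTwoThreads(1000000);
        System.out.println("unsafe: " + unsafeCount + " (may be less than 2000000)");
        System.out.println("safe:   " + safeCount);
    }
}
```

The unprotected counter can lose updates whenever both threads read the same old value before either writes back, so its final value varies from run to run. The protected counter always ends at exactly twice perThread.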
Luckily, almost all mainstream languages or their libraries provide tools to avoid race conditions. The three basic tools you should learn are locks, condition variables and monitors.
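As a taste of what these tools look like, here is one possible way to repair the blocking queue using Java's built-in monitor support (synchronized methods plus wait and notifyAll). This is a sketch under my own assumptions, not the only fix: the class name SafeQueue and the demo method are invented for this example.

```java
import java.util.LinkedList;

// A sketch only: SafeQueue and demo are illustrative names.
class SafeQueue {
    private final LinkedList<Integer> items = new LinkedList<Integer>();

    // synchronized: only one thread at a time can run enqueue or dequeue.
    public synchronized void enqueue(Integer n) {
        items.addLast(n);
        notifyAll(); // wake any threads waiting in dequeue
    }

    public synchronized Integer dequeue() throws InterruptedException {
        while (items.isEmpty()) {
            wait(); // releases the lock while waiting, reacquires it before returning
        }
        // The emptiness check and the removal now happen under the same lock,
        // so no other thread can slip in between them.
        return items.removeFirst();
    }

    // One producer thread adds 1 then 2; the calling thread removes both.
    static int demo() {
        final SafeQueue q = new SafeQueue();
        Thread producer = new Thread(new Runnable() {
            public void run() {
                q.enqueue(1);
                q.enqueue(2);
            }
        });
        producer.start();
        try {
            int first = q.dequeue();  // blocks until something is available
            int second = q.dequeue();
            producer.join();
            return first * 10 + second;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 12
    }
}
```

Unlike the busy-wait loop in the original queue, a thread blocked in wait does not burn processor time, and the while loop rechecks the condition after every wake-up in case another thread took the item first.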

1
Ilya - 20/08/08
Toby - thanks for the article.
In addition to the topics you're covering here, I'm wondering if it'd also make sense to explore performance considerations, and pros/cons of various approaches. For example, locks can "solve" data races - but can also destroy parallelism.
For what it's worth, for the C/C++ crowd, we at Cilk Arts recently put together this e-Book on multicore programming: http://www.cilk.com/multicore-e-book/
Cheers.
ilya
2
Daniel Chalo - 04/02/09
Toby, nice example.
I am working with both Java and C# (I need to keep my job... hard times).
There is a new book from Packt Publishing - C# 2008 and 2005 Threaded Programming: Beginner's Guide. http://www.packtpub.com/beginners-guide-for-C-sharp-2008-and-2005-threaded-programming/book
I bought the e-book version two weeks ago and it offers real-life examples and you can download the code. Many C# programmers are going to bless the examples. Threading has always been hard for me.
Best regards,
Daniel
From http://www.developmentnow.com/g/36_2005_2_0_0_222419/c-multiprocessor.htm