
September 25, 2005

Unintuitive Multithreading: Simplicity

Continuing my exploration of misunderstandings about multithreading, this essay is about simplicity. If you are interested in the previous essays in this series, they are listed below:

This time I plan to start with an assertion that almost everyone will agree with and push it further than most people would. The assertion is fairly obvious:

Simple multithreaded code works better than complex multithreaded code.

It seems that most programmers make code more complicated than it needs to be. Sometimes, we get a chance to simplify or refactor complex code; but more often it just continues to get more complicated. This is as true of single-threaded code as multithreaded code. Sometimes that complexity is necessary because the problem that we are solving is complex. Sometimes the complexity is just because we did not have the time to make it simpler.

With single-threaded code, you have a chance of bulling through the complexity and forcing it to work even if a simpler solution is possible. I contend that you do not have this luxury with multithreaded code. It is said that the biggest limitation in developing code is the limited size of developers' brains. We can only understand so much complexity. A multithreaded application is usually much more complex than an equivalent single-threaded application (ignoring for the moment that an equivalent program may not be possible). The problem is in the number of interactions between threads, both intended and unintended.

Given this complexity problem, the safest approach is to make each thread as simple as possible. One good approach is to make each thread perform only one function. This is the same approach to complexity we have been using to improve functions and objects for years, so it should not come as much of a shock. However, I have seen quite a few designs where people load functionality into individual threads until they are small applications of their own. The idea of a worker thread or a processing thread in a pipeline is to separate the functionality into bite-sized chunks. If the functionality of a given thread is simple enough, you can actually understand the interactions between it and the other threads.

Although there may be an application where some threads need to be extremely complex with extensive communications between threads, this is much less likely than people seem to believe. There are a handful of proven threading approaches. In general, you are probably safer using one of them than creating your own unstudied approach. In many cases, using one supervisor or master thread to farm out work to simpler threads is a good solution. The pipeline model also works well for jobs that work well in stages.
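The pipeline model mentioned above can be sketched in a few lines. This is a minimal illustration, not code from the original application: each thread performs exactly one function and passes its results to the next stage through a queue. The stage functions (doubling, incrementing) are placeholders for real work.

```python
# Pipeline model sketch: each thread performs exactly one function and
# hands results to the next stage through a queue. A None sentinel
# flows down the pipeline to shut each stage down in turn.
import queue
import threading

def stage(in_q, out_q, func):
    """Run one pipeline stage: apply func to each item until the sentinel."""
    while True:
        item = in_q.get()
        if item is None:          # sentinel: shut down and pass it on
            out_q.put(None)
            break
        out_q.put(func(item))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(q1, q2, lambda x: x * 2)),
    threading.Thread(target=stage, args=(q2, q3, lambda x: x + 1)),
]
for t in threads:
    t.start()

for n in range(5):
    q1.put(n)
q1.put(None)                      # signal end of input

results = []
while (item := q3.get()) is not None:
    results.append(item)
for t in threads:
    t.join()
print(results)  # → [1, 3, 5, 7, 9]
```

Because each stage is a single thread reading from a FIFO queue, the work is small enough to reason about: the only interactions are the two queues each stage touches.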

An Example

One example that I can remember was a variation of the worker thread approach. A master thread parceled out work to a small set of worker threads that it created. To reduce the overhead of creating threads, the master thread separated the work into a separate queue for each thread. The worker threads would then process the work in the queue and place the output on a separate output queue for each thread. Another thread (one per worker) would copy the data from this output queue to the main output queue. When the worker thread finished its queue, it would go back to the master thread for more work.

Given N worker threads, this approach uses 2N+2 threads:

  1. 1 master thread
  2. N worker threads
  3. N output queue copy threads
  4. 1 output thread

In addition, this approach used 2N+2 queues.

  1. 1 master input queue, only touched by the master thread
  2. N worker input queues
  3. N worker output queues
  4. 1 output queue

Some of the unneeded complexity resulted from having two queues for each worker thread, one for input and one for output. Each of these queues needed to be maintained by the worker thread, making the worker thread more complicated. In addition, the master thread was now more complicated by the need for configuration options determining how much of the work should be partitioned to each worker at a time. Additionally, we had a series of output threads that just took results off the individual output queues and placed them on one master output queue.

The (potential) advantage is that the individual threads' input and output queues need not be synchronized. However, this advantage was nullified by the fact that the master thread became a bottleneck for loading input queues, and the main output queue still needed to be synchronized.

A simpler approach would be to reduce the number of queues. We only need one work queue and one result queue. Each of these would need to be synchronized, of course, because multiple threads would manipulate them at the same time. The worker threads are now much simpler. They read from the work queue and write to the output queue. The threads that copied from the individual output queues to the master output queue are now gone. This leaves us with N+2 threads and 2 queues.
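This simpler design can be sketched directly, assuming Python's thread-safe `queue.Queue` stands in for whatever synchronized queue the original system used. The squaring step is a placeholder for the real work:

```python
# Sketch of the simplified design: N worker threads share one
# synchronized work queue and one synchronized result queue, giving
# N+2 threads (workers plus the master and output threads) and 2 queues.
# queue.Queue does the synchronization internally.
import queue
import threading

N = 4
work_q = queue.Queue()
result_q = queue.Queue()

def worker():
    while True:
        item = work_q.get()
        if item is None:              # sentinel: no more work
            break
        result_q.put(item * item)     # "process" the work item

workers = [threading.Thread(target=worker) for _ in range(N)]
for t in workers:
    t.start()

for n in range(20):
    work_q.put(n)
for _ in range(N):
    work_q.put(None)                  # one sentinel per worker
for t in workers:
    t.join()

results = sorted(result_q.get() for _ in range(20))
print(len(results))  # → 20
```

Note that no worker sits idle while work remains: any thread that finishes early just pulls the next item from the shared queue.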

This design is less flexible, but it is much easier to understand and maintain. It also makes better use of concurrency. Each worker thread processes one piece of work from the queue at a time. If it finishes its work early, it can get a new chunk from the work queue. The only time a worker thread will not be able to get more work is when all of the work is done or in process. In the original approach, a thread could finish all of its work while the other threads still had items waiting in their queues. This thread is now left idle, even though there is work to be done.

The real irony of this is that the original design was actually configured to only place one work item at a time in the individual work queues. This means that the extra flexibility was not being used for the case we examined. The extra mechanisms and flexibility made the code more complicated with absolutely no benefit.

Redundant Work

Another mistake I see in multithreaded code is in having multiple threads doing exactly the same work. I don't mean doing the same work on different data, I mean doing the same work. Obviously, if one thread repeats the work done by another thread we've reduced the efficiency of our application.

Asynchronous I/O is one place where I often see this problem. In most cases, I see it as unnecessary overhead because the user code ends up duplicating part of the work that the OS has already done. The OS

  1. determines that a piece of I/O has completed,
  2. finds the thread that has to process the work,
  3. and wakes up that thread.

The thread then

  1. looks at the information it is responsible for
  2. determines which of the outstanding I/O operations has now completed
  3. processes the I/O

The first two steps just repeat work that the OS has already done.

While this approach is necessary in the OS kernel or in some embedded systems code, it seems to be redundant when dealing with user-level code. I have seen many arguments for asynchronous I/O in user programs, but I have not yet been convinced.
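The alternative being argued for can be illustrated with a small sketch (mine, not from the essay): give each I/O source its own thread that simply blocks in a read. The kernel then wakes exactly the thread whose I/O completed, so the user code never has to figure out which outstanding operation finished. `socketpair` here just provides two connected sockets for demonstration.

```python
# Sketch: one thread blocks per I/O source, so the OS itself wakes the
# right thread when data arrives -- no user-level "which I/O finished?"
# lookup step is needed.
import socket
import threading

results = {}

def reader(name, sock):
    data = sock.recv(1024)        # blocks; the kernel wakes this thread
    results[name] = data          # no search through outstanding I/O
    sock.close()

pairs = {name: socket.socketpair() for name in ("a", "b")}
threads = [threading.Thread(target=reader, args=(n, p[0]))
           for n, p in pairs.items()]
for t in threads:
    t.start()

for name, (_, write_end) in pairs.items():
    write_end.sendall(name.encode())
    write_end.close()
for t in threads:
    t.join()
print(results)  # each thread received only its own data
```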

Summary

This essay turned out to be longer than I intended. Explaining simplicity is harder than it looks. As I said in the beginning,

Simple multithreaded code works better than complex multithreaded code.

Keeping this in mind and striving for simplicity in each thread will make your multithreaded code easier to get right and to maintain.

Posted by GWade at September 25, 2005 04:36 PM.