Anomaly ~ G. Wade Johnson

September 03, 2005

Unintuitive Multithreading: Communication Between Threads

This essay continues my exploration of misunderstandings about multithreading. In the first essay of this series, Unintuitive Multithreading: Speed, I explained that multithreading does not inherently speed up a process. In the second essay Unintuitive Multithreading: Waiting for Performance, I showed how to achieve better performance through waiting. In this essay, I plan to attack a common cause of reduced concurrency that foils most multithreaded projects. This is going to be a long one, so hold on.

In most of the disappointing multithreading projects that I have seen, the most common problem is that the programmer does not understand the concept of concurrency. Many programmers try to multithread code by haphazardly partitioning the code into separate threads without a clear plan. Then, they sprinkle mutexes around until the code appears to work. Programs built this way never work consistently and very rarely run much faster than the single-threaded version would have (or did).

A key concept of concurrent programs is that any communication between threads reduces concurrency. Said another way, multithreaded programs are most efficient when there is no communication between the threads. Now obviously, very few programs would be useful if there were no communication between the threads at all. To be more precise: there should be as little synchronous communication between threads as possible. The more synchronous communication we have, the worse the performance will be, because every synchronous communication requires one or more threads to wait for other threads.

Any time one thread may need to wait for another to finish some activity, the threads are said to synchronize. Obviously, when one or more threads are waiting for another thread to complete some action, they are not running concurrently with that thread. Although this statement is obvious, it is still worth pointing out because the larger the number of synchronization points in the code, the lower the overall potential concurrency in the program. This also explains why the common approach of scattering mutexes through code to make it multithreaded does not work well in practice.

Reduce Explicit Synchronization

To carry this point through a little further, we need to reduce synchronous communications between threads to increase concurrency. The standard Producer/Consumer model is a good place to start. The idea is to have Producer threads generating chunks of work that are placed in a protected queue. Now, one or more Consumer threads can take work items out of the queue and work on them. This asynchronous communication saves the Producers from waiting until a Consumer is ready. The only time Consumers will wait is when there is no work to do. The one synchronization point is the queue itself. If the system is written correctly, putting something into the queue or taking something off the queue should not take much time, so no thread should have to wait for long on that operation. Other than this one point, there should be no further communication between the Producers and the Consumers. In addition, no two Consumers should communicate with each other.

You normally come across two special cases of the Producer/Consumer model. One has a single Producer thread and many Consumer threads. The other has many Producer threads and one Consumer thread. Depending on the amount of work each thread needs to do, both of these variations work quite well.

A common model uses a master thread to pass work to worker threads through a work queue. This approach works particularly well if the workers can complete their tasks independently of the master or the other workers. (Think web server.)

Watch for Implicit Synchronization

Possibly the most surprising form of communication between threads is implicit synchronization caused by system calls or library calls. The one that bites people the most often is probably memory allocation. Many memory allocation libraries have a single memory arena shared by all threads. Therefore, any time two threads allocate memory at the same time, one will have to wait. The library must synchronize access to the internal memory structures, otherwise the heap could be corrupted by two or more threads changing those structures at the same time.

There are a few ways to avoid this overhead. Some systems provide a special multithreaded memory allocator that keeps separate memory accounting for each thread. While this minimizes synchronization, it may waste a lot of memory. A second approach is to write your own thread-specific allocator for critical classes. This is relatively hard to do correctly. The final approach is to try to reduce the amount of dynamic allocation you do in the threads. For example, if most of your objects are built on the stack, you won't have the synchronization problem. If you use the worker thread model above, you can strive to have the master thread do the allocations and have the worker threads focus on working with the allocated memory.

There is no single, ideal solution to the implicit synchronization problem. You will need to measure and examine your costs and benefits, and make the appropriate trade-offs.

If you are working in Java, the memory allocator is not really under your control. You also don't have the option of building objects on the stack. On the bright side, you can hope that the memory allocator has been written with threads in mind.

Don't Synchronize on Read-only Resources

Probably the most useless form of synchronization I am aware of is providing mutex protection for a read-only resource. If the resource does not change, no synchronization is needed. There is no way for the resource to be in an indeterminate state, because it does not change.

Let's say for the sake of argument that someone writes a program that uses a dictionary to convert keywords into a set of other words to use for further processing. If this dictionary is read from disk at program startup and never changes, then the various threads in the program can all access it at the same time without synchronization and there is no problem. As long as no thread ever modifies the dictionary when other threads could be using it, a mutex is a waste of time.

On the other hand, if this dictionary is updated by any thread, it will need some protection. If the updates are relatively infrequent compared to the number of reads, a read/write lock imposes much less overhead on multiple readers and only really costs when a thread needs to write.

This shows one way to recognize resources that do not need to be protected from asynchronous accesses. If a resource is never written after the threads are started, it doesn't need protecting. If a resource is only infrequently written, protect it with a read/write lock. If a resource is written regularly by multiple threads, protect it with a mutex.

Summary

One of the biggest enemies of multithreading performance is synchronous communication between threads. Any time threads must synchronize to do their jobs they are not running concurrently. This reduces performance. This is the fundamental reason why putting mutexes throughout a program is not the way to make a multithreaded program.

Posted by GWade at September 3, 2005 11:42 PM. Email comments