
Anomaly ~ G. Wade Johnson

August 23, 2005

Unintuitive Multithreading: Waiting for Performance

This essay continues my exploration of misunderstandings about multithreading. In the first essay of this series, Unintuitive Multithreading: Speed, I explained that multithreading does not inherently speed up a process. In this essay, I'll show one common way to fail to get more performance out of a multithreaded system.

Many new multithreading (MT) developers make the same mistake when trying to speed up their first MT application. Assume a new MT programmer named George has just rewritten his program to use threads and finds that the performance is not much better than it was before threads. What is he going to do? George digs through the documentation for his threading library and finds that something called 'priority' determines which threads get executed before other threads. Obviously, he just needs to raise the priority of his most important thread and all will be well.

Unfortunately, George finds that this does not work as he expects. Sometimes his code executes a little faster, but sometimes it executes a lot slower. Once or twice it even locks up and has to be terminated externally. Like many first-timers, George immediately begins tinkering with the priorities of other threads. Each change results in similar inconsistent results. What could he possibly do to fix the problem?

Although George is not a real person, the example follows a progression I've seen several times. The original problem was either that the program was not well-suited to multithreading or George made a mistake partitioning the code into threads. Neither of these would be fixed by changing priorities.

After dealing with this problem time and time again, I've come to the conclusion that the first rule of thread priority is:

1. Don't change the priority of your threads.

Two likely results of changing thread priorities are priority inversion and thread starvation. Both occur because a thread has a higher priority than it should in the context of the surrounding program. This leads directly to the second rule of thread priority:
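To see how priority inversion bites, here's a minimal sketch: a toy strict-priority scheduler in plain Python (not real OS threads; the task names, priorities, and step counts are all invented for illustration). The low-priority task starts out holding a lock that the high-priority task needs, so the medium-priority task runs to completion first and the "most important" task finishes dead last:

```python
def schedule(tasks, lock_owner):
    """Each tick, run one step of the highest-priority task that is not
    blocked on the lock. Returns the order in which tasks finish."""
    order = []
    while any(t["steps"] > 0 for t in tasks):
        runnable = [t for t in tasks
                    if t["steps"] > 0
                    and not (t["wants_lock"] and lock_owner is not None
                             and lock_owner != t["name"])]
        t = max(runnable, key=lambda x: x["prio"])
        t["steps"] -= 1
        if t["steps"] == 0:
            if lock_owner == t["name"]:
                lock_owner = None          # release the lock on completion
            order.append(t["name"])
    return order

tasks = [
    {"name": "low",  "prio": 1, "steps": 3, "wants_lock": True},
    {"name": "med",  "prio": 2, "steps": 5, "wants_lock": False},
    {"name": "high", "prio": 3, "steps": 2, "wants_lock": True},
]
print(schedule(tasks, lock_owner="low"))   # "low" already holds the lock
# → ['med', 'low', 'high']: the highest-priority task finishes last
```

With untouched priorities (or with priority inheritance, which many threading systems provide for exactly this reason), the high-priority task would only have to wait out the lock holder itself, not every medium-priority task in the program.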

2. If you are absolutely sure that you know what you are doing and still want to raise a thread's priority, you are wrong.

If you think this rule is harsh, you haven't dealt with as many well-meaning, over-confident Georges as I have. In actuality, this rule is similar to the rule about avoiding hand-optimizing code. Modern compilers do a much better job of tweaking code to improve its run time than most programmers can ever hope to. Likewise, many modern threading systems make minor adjustments to a thread's priority as needed. If you mess with a thread's priority, you will probably defeat that optimization.

The problem is that programmers are, in general, no better at recognizing priority issues than they are at recognizing the 10% of their code that consumes most of the running time. To make this point, I'll use an example we used to teach in one of my programming classes. Let's say I have a program that consists of three threads. All three threads must complete before the program is complete. At a high level, the threads are

  • Thread A does pure computations. It depends only on the CPU.
  • Thread B is writing data to the hard disk.
  • Thread C is writing reports directly to an old printer.

This is obviously contrived, but humor me so we can figure out how to set the priorities for these threads. Now George looks at this problem and decides that thread A is doing the most work and requires the CPU to do it, so it should have the highest priority. The printer will take forever to write anything, so he decides that thread C should be the lowest priority; that way no other thread will be waiting on it. That leaves thread B with the middle priority.

Once again, George has chosen his priorities in a way that makes some sense, but does not work well for a multithreading problem. In this particular case, we will generate the worst possible runtime. Thread A will run to completion with the fewest interruptions (and context switches). Then the program will spend a lot of time waiting and doing nothing as each of the other two threads slowly sends out its data. Remember that the goal was to get all of the threads to complete as quickly as possible, not to get the calculation over with as fast as possible.

The key to this problem is to give the highest priority to the thread that does the least work between blocking calls. Multithreaded apps that make use of the "hurry up and wait" principle are often the best performers. Thread C sends individual bytes down to the printer and then waits a long time for the printer to do its thing, so we want it to start each write as soon as it possibly can. Thread B can write to disk faster than our printer thread can print, so it should have a lower priority than thread C. That way, if both are ready to work, C will get the chance to start first and go back to waiting.

The lowest priority is now thread A. It will now be interrupted more and will therefore have a longer run-time. But, thread A's time will be interleaved with the waiting for the other two threads. So the overall performance is better. This is one of the surprising effects that makes MT worthwhile even though it slows down CPU-bound threads.
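The effect is easy to reproduce with a toy discrete-tick simulation (ordinary Python, not real threads; the segment lengths are numbers I invented to stand in for CPU work, disk latency, and printer latency). Each thread is a list of 'cpu' segments, which need the processor, and 'wait' segments, which elapse off-CPU while a device does its work:

```python
def makespan(threads):
    """Run all threads to completion under a strict-priority scheduler
    and return the total number of ticks taken."""
    waiting = {}                       # name -> device ticks remaining
    t = 0
    while any(th["ops"] for th in threads.values()) or waiting:
        # starting a blocking call needs no CPU: hand it to the device
        for name, th in threads.items():
            if name not in waiting and th["ops"] and th["ops"][0][0] == "wait":
                waiting[name] = th["ops"].pop(0)[1]
        t += 1
        for name in list(waiting):     # device waits progress in parallel
            waiting[name] -= 1
            if waiting[name] == 0:
                del waiting[name]
        # the CPU goes to the highest-priority thread with CPU work ready
        ready = [th for name, th in threads.items()
                 if name not in waiting and th["ops"] and th["ops"][0][0] == "cpu"]
        if ready:
            th = max(ready, key=lambda th: th["prio"])
            kind, n = th["ops"][0]
            th["ops"][0] = (kind, n - 1)
            if n == 1:
                th["ops"].pop(0)
    return t

def scenario(prio_a, prio_b, prio_c):
    return {
        "A": {"prio": prio_a, "ops": [("cpu", 10)]},                  # pure CPU
        "B": {"prio": prio_b, "ops": [("cpu", 1), ("wait", 2)] * 4},  # disk
        "C": {"prio": prio_c, "ops": [("cpu", 1), ("wait", 5)] * 3},  # printer
    }

print(makespan(scenario(3, 2, 1)))   # George's ordering: A highest → 28 ticks
print(makespan(scenario(1, 2, 3)))   # C highest, A lowest → 17 ticks
```

George's ordering serializes everything behind thread A and then sits idle while the devices drain, while the reversed ordering hides most of the device waits behind A's computation.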

The bad news is that most MT problems are not this easy to characterize, so there may not be a clear-cut answer. The threading system can monitor the performance dynamically and optimize better than we could. This shows once again that we are better off leaving priorities to the threading system.

The third rule of thread priority is likely to confuse more people than the previous two combined.

3. If you must change a thread's priority, lower it to speed up the code.

Most modern threading systems will automatically raise the priority of a thread that spends most of its time waiting. They will also lower the priority of a thread that uses all of its time-slice. If your system does not do this, consider lowering the priority of any CPU-bound threads by a small amount to allow blocked threads to get run and go back to waiting. It is often surprising how this can produce better performance.

This will only really work if the MT code is structured correctly. We'll look into that issue a little more next time.

Posted by GWade at August 23, 2005 10:46 PM.