A mutex protects a resource, not code. This is a very simple concept, but it seems to escape many people. In fact, most of the multi-threaded code I've seen shows a distinct misunderstanding of this basic fact. Very often people seem to think that a mutex should protect a section of code. I think this misconception may have been caused by the terms "critical section" and "critical region."
In my computer science courses, a critical region was a section of code that must not be interrupted. Later, the concept was weakened to mean only that certain other sections of code must not execute at the same time as this one. In some kinds of programming, such as embedded systems, real-time systems, and some kernel development, you really do have sections of code that are timing-sensitive and cannot be interrupted. Most of the rest of us don't have those kinds of limitations.
In most cases, what we really need is a way to prevent two threads from performing conflicting operations on some resource. The aim is the same, but the focus is different. This change in focus helps solve many of the mutex-related problems I have seen in multi-threaded code.
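The post itself contains no code, so here is a minimal Python sketch, assuming the standard `threading` module, of what it looks like to attach a mutex to a resource rather than to a stretch of code. The `Guarded` class and its `access()` method are hypothetical names chosen for illustration: the resource and its lock live together, and the only intended way to reach the resource is while holding that lock.

```python
import threading
from contextlib import contextmanager


class Guarded:
    """Hypothetical helper: pairs a resource with the one lock that protects it."""

    def __init__(self, resource):
        self._resource = resource
        self._lock = threading.Lock()

    @contextmanager
    def access(self):
        # The lock belongs to the resource, not to any particular caller.
        with self._lock:
            yield self._resource


def increment(counts, key, times):
    # Every read-modify-write of the dict happens while holding its lock.
    for _ in range(times):
        with counts.access() as d:
            d[key] = d.get(key, 0) + 1


counts = Guarded({})
threads = [threading.Thread(target=increment, args=(counts, "hits", 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with counts.access() as d:
    print(d["hits"])  # 40000
```

Python cannot stop a caller from keeping a reference to the resource after the `with` block ends, so here the pairing is a convention rather than a guarantee. Rust's `std::sync::Mutex<T>` enforces the same idea in the type system by owning the data it protects.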
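Other than simply forgetting to unlock a locked mutex, almost all of the mutex-related problems I've seen fall into three major categories:

1. A shared resource that is not completely protected, so threads can still make or see conflicting changes.
2. Mutexes held too broadly: too many of them, held for too long, or acquired in one place and released in another.
3. Mutexes scattered through the code by trial and error, until no one can predict how the program behaves.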
Each of these bugs can be traced directly back to not connecting the mutexes with a resource. The first problem is usually caused by locking the code that changes a resource without also locking the code that reads it. I've also seen two (or more) mutexes used to guard different kinds of access to the same resource. That arrangement makes sense if the goal is to keep a particular piece of code from running in two threads at once, but it does not protect the resource.
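A minimal Python sketch of the fix for the first problem, assuming a hypothetical `Account` whose two balances must always add up to the same amount: the writer and the reader take the same lock. If `total()` skipped the lock, or if reads and writes used two different locks, a reader could catch a transfer halfway through, after one balance changed but before the other did.

```python
import threading


class Account:
    """Hypothetical resource: two fields that must change together."""

    def __init__(self):
        self._lock = threading.Lock()  # one lock for every kind of access
        self._checking = 100
        self._savings = 0

    def transfer(self, amount):
        # Writer: both fields change while the lock is held.
        with self._lock:
            self._checking -= amount
            self._savings += amount

    def total(self):
        # Reader: takes the SAME lock, so it never sees a half-done transfer.
        with self._lock:
            return self._checking + self._savings


acct = Account()
bad_reads = []


def writer():
    for _ in range(10_000):
        acct.transfer(1)
        acct.transfer(-1)


def reader():
    for _ in range(10_000):
        t = acct.total()
        if t != 100:
            bad_reads.append(t)


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(bad_reads))  # 0
```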
The second problem usually shows up as either acquiring too many mutexes or holding a mutex so that a large amount of code can run without interruption. Another symptom is acquiring the mutex in one function or method and then having to remember to call another function to release it. This is often caused by trying to protect code instead of a resource.
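One way to avoid both symptoms, sketched in Python under the assumption that the slow work touches no shared state: acquire and release the lock inside a single method with a `with` block, and hold it only around the update to the shared list. `summarize()` and `ResultLog` are hypothetical stand-ins.

```python
import threading


def summarize(item):
    # Hypothetical stand-in for slow work that touches no shared state.
    return sum(range(item))


class ResultLog:
    """Shared list of results; the lock guards only the list."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = []

    def record(self, item):
        summary = summarize(item)  # slow work, done without holding the lock
        with self._lock:           # acquired and released in one place
            self._results.append((item, summary))

    def results(self):
        with self._lock:
            return list(self._results)  # a copy, so callers never hold the lock


log = ResultLog()


def worker(start):
    for i in range(start, 100, 4):
        log.record(i)


threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(log.results()))  # 100
```

Because the `with` statement releases the lock even if an exception is raised, no caller ever has to remember to call a separate release function.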
The last problem is especially common among programmers who only vaguely understand what goes on in a multi-threaded program. They start by adding a mutex to prevent some problem they are seeing, then acquire and release that mutex in various places until the program seems to work. When another problem comes along, they repeat this "successful" strategy to solve the new one. Eventually, the code is peppered with mutexes, locks, and unlocks, and no one can predict how it will behave.
Solving these problems usually comes down to applying a few relatively simple rules to your use of mutexes. Focusing on the resource, rather than on the code we don't want interrupted, leads to a simpler set of rules.
The first important point is that only shared resources need protection. If the resource is never accessed from more than one thread, it doesn't need this kind of protection. The second important point is that access to the shared resource should be controlled through a small number of methods. This allows you to control the acquisition/release of the mutex much more carefully. If you can't control access to the shared resource and must have acquire/release code scattered around, then you need to revisit your design.
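A minimal Python sketch of both points, using a hypothetical `Total` class: each worker's running subtotal is a local variable that no other thread can see, so it needs no lock, and the one shared value is reachable only through `add()` and `value()`, the only places the mutex is acquired and released.

```python
import threading


class Total:
    """The only shared resource; all access goes through add() and value()."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def add(self, amount):
        with self._lock:
            self._value += amount

    def value(self):
        with self._lock:
            return self._value


def worker(numbers, total):
    subtotal = 0            # local to this thread: never shared, so no lock
    for n in numbers:
        subtotal += n
    total.add(subtotal)     # one locked call touches the shared resource


total = Total()
chunks = [range(i, 1000, 4) for i in range(4)]
threads = [threading.Thread(target=worker, args=(c, total)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total.value())  # 499500
```

Each thread also touches the shared resource once rather than once per number, which keeps the time spent holding the mutex small.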
In my experience, the simpler the use of shared resources and their associated mutexes, the more likely you are to get them right. In single-threaded code, a simpler design and implementation is more likely to be correct. Simplicity is even more important when multiple threads are involved. Using mutexes only to protect resources seems to generate a simpler design.
Posted by GWade at December 1, 2004 08:28 AM.