Last week, in Debugging Without a Debugger, I described a technique for troubleshooting code without a debugger. I talked a bit about the advantages of instrumenting code and how instrumentation can supplement the use of a debugger.
Those of you who have always used a debugger might consider this vaguely interesting, but not particularly critical. I first learned this technique in an environment where we could not have used a debugger even if we had wanted to. Several times in my career I have been in situations where a debugger was not available and this technique was the only way to debug code.
The first debugger-intolerant environment I worked in was a TSR program under DOS that communicated with a foreground program written by another company. The modern equivalents would be daemon processes under the various flavors of Unix, or services under Windows.
At the time, debuggers could not attach to a running process. Since we couldn't start the process under the debugger, there was no way to run the program inside one. Additionally, the program communicated with a device that was not under our control, so stopping it in the debugger would not have been an option even if we could have run it under one. That leads us to the next class of programs that do not work well with debuggers.
If the program has real-time requirements, running under a debugger may not be an option. Real-time does not always mean lightning fast, but almost all real-time systems place limits on how long certain operations are allowed to wait. Stepping through code in a debugger or stopping on a breakpoint will almost certainly violate those limits.
If the only real-time requirement is user responsiveness, this is not a real problem. But if the program is communicating with another program or an external device, pausing in the debugger might cause the whole application to fail or behave mysteriously. The TSR I mentioned above communicated with another computer over a special interface card. If the card wasn't serviced in a timely fashion, messages would be lost and the application would fail.
If the program has really tight, hard real-time requirements, even printing to a log might be too disruptive. In those cases, I've seen systems that log to a buffer in memory that is written to disk when the system has time.
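Here is a minimal C++ sketch of the in-memory logging idea. The names TraceRecord and TraceBuffer, the two-word record format, and the fixed capacity are hypothetical choices for illustration, not taken from any particular system. Logging is just a store into preallocated memory; the slow I/O happens in flush(), which the system calls when it has time. The sketch assumes a single thread of execution; logging from an interrupt handler or from several threads would need additional care.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// One trace record: which instrumentation point fired, plus one word of context.
struct TraceRecord {
    std::uint32_t event_id;
    std::uint32_t value;
};

// Fixed-size ring buffer. log() only stores into preallocated memory
// (no allocation, no I/O); flush() does the slow work later.
template <std::size_t N>
class TraceBuffer {
public:
    // Cheap enough for the time-critical path.
    void log(std::uint32_t event_id, std::uint32_t value) {
        records_[head_ % N] = TraceRecord{event_id, value};
        ++head_;
    }

    // Call when the system has time. Writes every record not yet flushed and
    // returns how many were written. If more than N records arrived since the
    // last flush, the oldest were overwritten and are lost.
    std::size_t flush(std::FILE* out) {
        std::size_t start = (head_ - flushed_ > N) ? head_ - N : flushed_;
        for (std::size_t i = start; i < head_; ++i) {
            const TraceRecord& r = records_[i % N];
            std::fprintf(out, "%lu %lu\n",
                         static_cast<unsigned long>(r.event_id),
                         static_cast<unsigned long>(r.value));
        }
        std::size_t written = head_ - start;
        flushed_ = head_;
        return written;
    }

private:
    std::array<TraceRecord, N> records_{};
    std::size_t head_ = 0;     // total records ever logged
    std::size_t flushed_ = 0;  // value of head_ at the last flush
};
```

Overwriting the oldest records instead of blocking keeps the cost of each log call constant, at the price of losing data when flushes fall too far behind.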
Although a few debuggers on the market deal with multi-threaded code, multi-threaded systems play havoc with debuggers. First of all, there is the question of what happens when a breakpoint is hit. Do we stop all threads or just the one? If we allow the other threads to continue, what happens if a second thread hits a breakpoint while we are looking at the breakpoint on the first thread?
If that isn't confusing enough, think about the changes to the timing of the interactions between threads. Race conditions may appear and disappear at random because of our interactions with one or more threads. And how does access to a shared object work when another thread changes the object we are inspecting in the debugger?
The next kind of system that I worked on without debugger help was a server in an on-line system. Like many on-line systems, this one had multiple threads to deal with incoming requests. There was a real-time component in the time required to service the request and respond to the client. If that weren't enough, the servers needed to stay running pretty close to 24/7. We could rotate servers into and out of service to load new code and fix problems, but they tended to run for hours, days, or weeks at a time.
A debugger is practically useless in this scenario. How do you watch a breakpoint that is only hit once every few hours? How do you catch problems that only occur on certain kinds of requests when you aren't sure which request triggers the problem?
In each of these scenarios, we found that by carefully instrumenting the code we were able to troubleshoot and solve problems despite the lack of a visual debugging environment. In some cases, we logged lots of information in the hopes of spotting the problem in the reams of collected data. In other cases, we put very specific instrumentation in place to catch the rare times when the problem occurred.
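As one illustration of very specific instrumentation for a server like the one described above, the C++ sketch below times each request and writes a log line only for requests that exceed a time budget, recording enough detail to identify which requests trigger the problem. The Request type, the 50 ms budget, and handle_with_timing are hypothetical, chosen only for illustration; they are not from any real framework.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical request type; a real server would carry much more.
struct Request {
    int id;
    std::string kind;
};

// Hypothetical per-request time budget, chosen only for illustration.
constexpr std::chrono::milliseconds kBudget{50};

// Runs the handler and logs only when it exceeds the budget, recording
// enough detail to tell which requests trigger the problem. Returns true
// if a log line was written.
template <typename Handler>
bool handle_with_timing(const Request& req, Handler handler) {
    const auto start = std::chrono::steady_clock::now();
    handler(req);
    const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    if (elapsed > kBudget) {
        std::fprintf(stderr, "slow request id=%d kind=%s took %lld ms\n",
                     req.id, req.kind.c_str(),
                     static_cast<long long>(elapsed.count()));
        return true;
    }
    return false;
}
```

Because nothing is written for normal requests, a check like this can stay in place for days, waiting for the rare request that misbehaves.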
One benefit of this approach is the ability to bring the full power of your programming language to bear on recognizing a problem and logging the appropriate information. When we knew that a problem was related to a certain area of memory becoming corrupted, we could write a function that tested that area of memory. We could then call the test at various points in the code and log when the error was detected. Most debuggers today support some form of conditional breakpoint. In most cases, though, they support only hit counts or simple conditional expressions. If your debugger supports a condition based on a function call, it can match this feature.
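A C++ sketch of the memory-test idea, assuming the suspect memory is a buffer followed by a guard area filled with a known byte pattern. Suspect, guard_intact, and CHECK_GUARD are hypothetical names for illustration. The macro logs the file and line of the first failing check, which narrows the damage to code that ran between the last check that passed and that one.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical layout: a buffer suspected of being overrun, followed by a
// guard area that should always hold a known byte pattern.
struct Suspect {
    char buffer[16];
    unsigned char guard[8];
};

const unsigned char kGuardByte = 0xA5;

void init_guard(Suspect& s) {
    std::memset(s.guard, kGuardByte, sizeof s.guard);
}

// The test itself: true while every guard byte still holds the pattern.
bool guard_intact(const Suspect& s) {
    for (unsigned char b : s.guard) {
        if (b != kGuardByte) return false;
    }
    return true;
}

static bool g_reported = false;  // report only the first detection

// Sprinkle this through the code. The first failing check logs its location.
#define CHECK_GUARD(s)                                                  \
    do {                                                                \
        if (!g_reported && !guard_intact(s)) {                          \
            std::fprintf(stderr, "guard corrupted, detected at %s:%d\n", \
                         __FILE__, __LINE__);                           \
            g_reported = true;                                          \
        }                                                               \
    } while (0)
```

A macro is used here so that __FILE__ and __LINE__ identify the call site rather than the definition.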
You can also easily write a test that saves an earlier state of the program and compares it with the current state to see when things change. For example, you might only want to log in the destructor of the ABC object that was created by function abc(), not in the other hundred or so ABC objects in the system. If this information is not already in a variable, most debuggers cannot track it. Most debuggers support some method of testing whether a small number of simple variables change.
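A C++ sketch combining both ideas, assuming a hypothetical ABC class whose only state is one int: abc() marks the object it creates as the one to watch and saves its initial value, and the destructor logs only for that object, and only if its state changed. The globals and the g_reports counter are illustrative assumptions. This sketch watches one object at a time; a second call to abc() replaces the first.

```cpp
#include <cstdio>
#include <memory>

class ABC;

// Instrumentation state: which object to watch and what it looked like at creation.
static const ABC* g_watched = nullptr;
static int g_watched_initial = 0;
static int g_reports = 0;  // number of log lines written, handy for testing

// Hypothetical class standing in for ABC; its only state is one int.
class ABC {
public:
    explicit ABC(int value) : value_(value) {}
    ~ABC();
    void set(int v) { value_ = v; }
    int value() const { return value_; }

private:
    int value_;
};

// Instrumented destructor: silent for every ABC except the watched one.
ABC::~ABC() {
    if (this == g_watched) {
        if (value_ != g_watched_initial) {
            std::fprintf(stderr, "watched ABC changed: %d -> %d\n",
                         g_watched_initial, value_);
            ++g_reports;
        }
        g_watched = nullptr;  // stop watching once it is gone
    }
}

// Hypothetical factory: marks the object it creates and snapshots its state.
std::unique_ptr<ABC> abc() {
    auto p = std::make_unique<ABC>(42);
    g_watched = p.get();
    g_watched_initial = p->value();
    return p;
}
```

Clearing g_watched in the destructor matters: otherwise a later ABC allocated at the same address would be mistaken for the watched one.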
With the ability to write arbitrarily complex tests and to log anything that you can access from the code, instrumenting the code is a very powerful technique.
Between this article and the last, I hope I've given you some reasons to consider troubleshooting without a debugger. The next time you find yourself bouncing on the step or next command in your debugger, consider a more automated way to troubleshoot.
Posted by GWade at December 1, 2007 03:19 PM.