

December 28, 2005

Diff Debugging

Every now and then, I manage to pull myself away from reading and reviewing computer books (my hobby for the last year) or programming (my hobby for ... never mind), and spend a little time on various weblogs. It's important to see what the names in the field are thinking and talking about.

During one of these late night blog-reading sessions, I ran across MF Bliki: DiffDebugging. This turned out to be a wonderful find, not because the technique was new to me (I've been using it almost as long as I've used version control), but because he gave the technique a useful name.

Fowler explains it much better than I do, of course, but the technique is great for finding regressions in a code base. If you detect a bug in code that you know worked before, use your version control system (VCS) to find a version that does not manifest the bug. Then, narrow down to the last version that did not have the bug and the first version that did. (These should be adjacent versions.) By looking only at what changed between those two versions, you can often find the cause of the bug much more quickly.

I have often used the technique with dates instead of individual commits. This is the easier approach if your VCS does not support atomic commits (CVS, RCS, etc.). If your VCS does support atomic commits (Subversion, etc.), you can test individual commits, which may simplify narrowing in on the change.

If you have good unit tests for your software, diff debugging is even faster. You can simply run the unit test that exposes the bug against each version you check out; success and failure become much easier to detect. Formal unit tests are not required to use diff debugging, however: I was using the technique long before I became test-infected. Automated tests do make the technique much easier, though.
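To make the idea concrete, here is a minimal sketch in Python (my own illustration, not Fowler's code). It assumes an ordered list of version identifiers and a hypothetical `test_passes` function that would, in practice, check out a version and run the failing test:

```python
def find_breaking_pair(versions, test_passes):
    """Step through an ordered version history and return the
    (last_good, first_bad) pair where the test starts failing.

    `versions` is ordered oldest to newest. `test_passes(v)` is
    assumed to check out version v, run the relevant test, and
    return True on success.
    """
    for good, bad in zip(versions, versions[1:]):
        if test_passes(good) and not test_passes(bad):
            return good, bad
    return None  # no transition found in this range

# Simulated history: revisions 1-10, with the bug introduced in revision 7.
history = list(range(1, 11))
print(find_breaking_pair(history, lambda rev: rev < 7))  # prints (6, 7)
```

In real use, the expensive part is `test_passes` itself (a checkout, a build, a test run), which is exactly why the binary search variant below the list matters when the history is long.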

One important point that Fowler does not make is that a binary search technique can be pretty useful if the range of versions is large. If you know the code worked a month ago and it doesn't work now, a good approach would be:

  1. Check out the code from a month ago and verify that it actually worked.
  2. Check out the code from two weeks ago and see if it worked.
    • If it works, check out code from a week later to see if it works.
    • If it doesn't, check out code from a week earlier to see if it works.

Repeat this process with shorter time scales until you find the offending version. Of course, once you are down to a handful of versions, it is easier to just step through them (as in Fowler's example). I have often needed to use the binary search technique in cases where a standard build-and-smoke-test procedure was not in place.
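The steps above can be sketched as a binary search in Python (again my own illustration, with a hypothetical `test_passes` function standing in for the checkout-and-test step, and assuming the bug stays present once introduced):

```python
def bisect_versions(versions, test_passes):
    """Binary search for the (last_good, first_bad) pair.

    Assumes versions[0] is known-good, versions[-1] is known-bad,
    and the history is monotone (the bug does not come and go).
    """
    lo, hi = 0, len(versions) - 1       # invariant: lo passes, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test_passes(versions[mid]):
            lo = mid                    # bug appeared after mid
        else:
            hi = mid                    # bug appeared at or before mid
    return versions[lo], versions[hi]

# A month of daily builds, with the bug introduced on day 17:
days = list(range(1, 31))
print(bisect_versions(days, lambda day: day < 17))  # prints (16, 17)
```

For a month of versions this touches only about five of them instead of potentially all thirty, which matters when each check is a full checkout and build.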

The most interesting part of Fowler's blog entry was the fact that he created a good name for this technique. If you don't have a name for something, it is really hard to talk about. It is also not as obvious a tool in your toolkit. By giving the technique a name, Fowler has improved my programming skills by turning an ad hoc technique I have applied in certain situations into a known tool in my programming toolkit. The technique has not changed, but the name makes it a more reusable tool.

This is an interesting fact about our field: concepts are our best tools. Naming a concept gives you the power to use it. Many of the most important breakthroughs in the past few decades have not been new techniques or algorithms, but the naming of techniques so that we can discuss them and reuse them.

Posted by GWade at December 28, 2005 03:47 PM.