Programmer Musings: Thoughts on Code Comments

June 21, 2013

Thoughts on Code Comments

In the comments on a recent Google+ post, a discussion erupted about comments in source code. As usual, two groups argued relatively vehemently. The More Comments camp was best represented by the suggestion that a good program should have as many lines of comments as lines of code. The Clear Code Needs No Comments camp was best represented by the assertion that Comments Are a Code Smell.

This battle has been waged many times on many projects. As usual for these kinds of religious arguments, neither side is willing to budge and the other side is always wrong, pig-headed, naive, sloppy, or insert random slur here.<shrug/>

My History with Code Comments

I've been a programmer for a long time. Most of that time has been spent maintaining other people's code. I rarely get to start a project from scratch, but I've had to live with the results of different programming decisions for years on almost every project I've worked on. Obviously, this experience colors how I see any topic, including this one.

Early in my career, I believed that more comments were a good thing. In my first professional programming position, I had to change that view. The previous programmer on the project (who was also my boss) had a tendency to use any excuse to add comments. If he went back to the code and didn't understand something, he added more comments. If he found a bug, he commented out the old code, added an explanation of why it didn't work and fixed the code. If he had a thought on how the code might change later, he added a comment. If he added debug code to understand something, he commented it out when he was finished, in case he needed it later.

Now, I know some of you will immediately point out that some of this would have been fixed if we had been using git to store information about older code and maybe a wiki for ideas about future changes. I would agree. But you need to understand that this was at the end of the 1980s. Version control was pretty primitive. (I eventually got the code on RCS, so I could remove the history from the source. Subversion had not been invented yet, much less git.) The only way we had for sharing code was sneaker-net. You don't want to know about code merges. So, the world was quite different from today.

Anyway, this experience taught me that more comments are not necessarily better. At the same time, I was reading programming books and articles by Donald Knuth, P.J. Plauger, Jon Bentley, Jeff Duntemann and others that made me think a lot about code comments. Over the next couple of decades, I see-sawed between more comments and fewer comments as the ideal. I tried to write code that was clear without any comments. I dealt with both entry-level and advanced programmers. I cleaned up code written by Wade-past. I tried to leave information for Wade-future. In the end, my views became a combination of what I read and saw.

Flaws in the More Comments Argument

The biggest problem I have with the More Comments camp is the underlying assumption that the comments will be good quality. Honestly, if the original programmer wrote hard-to-understand code, what makes us think he or she will write good quality, easy-to-understand comments? My experience has been that a lot of cryptic code is the result of fuzzy thinking. If someone slaps together a little code and then tweaks it until it works, why would they write comments that make any more sense? If this person does leave any comments, they normally tell me nothing that I can't already get from the code.

When discussing the issues, proponents of this view normally compare an ideal situation with perfect comments to badly written code with no comments and immediately focus on the lack of comments. Anyone who has maintained code for any length of time has run into bad or misleading comments. The problem with an incorrect comment is that it will probably bias your thinking about the code, which makes troubleshooting or changing the code harder. It's been my experience that good comments are like gold, but most comments are something else entirely.
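To make the bias problem concrete, here is a small sketch in Python. The function name and the numbers are invented for this illustration and do not come from any real project. The first comment describes what the code used to do, and a reader who trusts it will look for the bug in the wrong place.

```python
def retry_delay_seconds(attempt: int) -> int:
    """Hypothetical helper, made up for this example."""
    # MISLEADING, left over from an earlier version:
    # "Always wait 5 seconds between attempts."
    #
    # What the code actually does: exponential backoff (1, 2, 4, 8, ...)
    # capped at 60 seconds.
    return min(2 ** attempt, 60)
```

Someone debugging slow retries who reads only the stale comment would expect a constant delay and might never suspect the backoff.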

Flaws in the Clear Code Needs No Comments Argument

This argument is based on the idea that you should be thinking about the next person to read your code as you write it. Clear naming of variables and functions is a necessity. Structuring the code and data to match the domain helps the reader understand what is going on. If you write the code clearly, the reader will understand the glorious solution you've provided.

Unfortunately, most code is not this clear. It may be unclear because the programmer did not understand the problem or is coding beyond their ability. The code is probably the result of the same fuzzy thinking I referenced in the previous section. It may be that the code has been patched to the point that the original intent can no longer be seen.

The proponents of this view assume that the programmer fully understands the problem and solution when they have finished writing the code, so at that point it should be clear. They also assume that the programmer will take the time to clarify the code before declaring it complete.

In my experience, even the best-case scenario leads to code that may not be perfectly clear. At the point that you are writing the code, you have a lot of context in your head that the next person won't (necessarily) have. This means that what is clear to you may not be clear to the reader of the code. While a comment explaining the code may be a Code Smell suggesting the code is not clear enough, it may be just the context you need when reading the code.
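A minimal sketch of that kind of context, again in Python. Everything here is hypothetical: the function, the "billing export" it mentions, and the phone number format are assumptions made up for the example. The code itself is clear about what it does; only the comment can say why.

```python
def normalize_phone(raw: str) -> str:
    """Hypothetical helper: reduce a phone number to its digits."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    # Context the code cannot express: in this made-up scenario, an
    # upstream billing export prefixes US numbers with the country
    # code "1", while every other source sends 10 digits. Strip the
    # prefix so both forms compare equal.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits
```

Without the comment, the next reader sees a special case for 11-digit numbers and has to guess whether it is a bug fix, a workaround, or dead code.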

Literate Programming

Any discussion of commenting code really should include some mention of Donald Knuth's Literate Programming. The Wikipedia article gives a pretty good overview, but Knuth's book (Literate Programming (Center for the Study of Language and Information - Lecture Notes)) does the best job. The idea is basically to provide a system where the programmer defines the program in a text or essay form with embedded code that implements the logic. Two separate programs are used to convert this format into either documentation for humans or source code for the compiler.
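The two-program idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not Knuth's WEB format: the `<<code` and `>>` markers, the sample text, and the function names `split_chunks`, `tangle`, and `weave` are all invented here (the last two borrow the names of Knuth's TANGLE and WEAVE tools).

```python
# A toy "literate" source: prose with embedded code chunks.
# The <<code ... >> markers are an invented convention for this sketch.
LITERATE_SOURCE = """\
We want the area of a circle. The formula is pi times the radius squared.
<<code
import math

def circle_area(radius):
    return math.pi * radius ** 2
>>
As a quick check, we compute the area of a unit circle.
<<code
UNIT_AREA = circle_area(1)
>>
"""


def split_chunks(source):
    """Return a list of (kind, text) pairs; kind is 'prose' or 'code'."""
    chunks = []
    kind, lines = "prose", []
    for line in source.splitlines():
        if line in ("<<code", ">>"):
            # A marker ends the current chunk and switches mode.
            if lines:
                chunks.append((kind, "\n".join(lines)))
            kind = "code" if line == "<<code" else "prose"
            lines = []
        else:
            lines.append(line)
    if lines:
        chunks.append((kind, "\n".join(lines)))
    return chunks


def tangle(source):
    """Keep only the code chunks: the output meant for the interpreter."""
    code = [text for kind, text in split_chunks(source) if kind == "code"]
    return "\n".join(code) + "\n"


def weave(source):
    """Keep everything: prose as written, code indented for readers."""
    parts = []
    for kind, text in split_chunks(source):
        if kind == "code":
            text = "\n".join("    " + line for line in text.splitlines())
        parts.append(text)
    return "\n\n".join(parts) + "\n"
```

Running `exec(tangle(LITERATE_SOURCE))` would define `circle_area` and `UNIT_AREA`, while `weave(LITERATE_SOURCE)` yields the essay with the code set off by indentation. A real literate programming system also lets chunks be named and reordered, which this sketch leaves out.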

Literate Programming never really caught on as a paradigm. Many have suggested that this approach requires programmers to write both coherent software and coherent English (or whatever natural language) at the same time. This skill seems to be beyond most of us.

Conclusion

As with almost anything relating to software, this topic is much more complicated and subtle than the competing camps suggest. So I've decided to focus this post on describing the problem. In my next post (Code Comment Guidelines), I will describe the guidelines I use to decide if and how much to comment.

Posted by GWade at June 21, 2013 08:20 AM.