My wife is a really good writer and she once gave me a great piece of advice for some documentation I was working on: Know Your Audience. This has helped me to write better documentation. Over the years, I have come to realize that the same advice applies when writing code.
Most junior programmers only worry about getting the syntax right. After all, if the code compiles they are finished. Eventually, most programmers realize that the code is read by other people more than it is compiled. So writing code that is understandable to others is extremely important. In fact, all code is written for at least two audiences: the computer and any programmers that will read the code in the future.
The computer (actually either the compiler or an interpreter) is an easy audience to please. If the computer doesn't understand the code, it fails to compile. The human audience is much harder to reach effectively. This fact is what makes more senior programmers focus on details like consistency, the naming of variables and functions, and white space, where a more junior programmer might not.
In fact, knowing that the important audience for the code is another human is not enough. You need to have some understanding of what the reader of the code is trying to get from or do with the code. You also need to be aware of the level of expertise of the readers of the code.
Most junior programmers write the code for themselves if they write it for anyone. Some of that is because they don't know any other audience. The code could be written at any level of understanding, because the programmer involved could be writing at any level of expertise.
The biggest shock to this kind of developer is when they come back to some code months or years later and realize that they can't understand it. They are no longer the target audience. This can be a humbling experience.
I have worked on some projects where the coding standards basically state that any programmer walking in off the street should be able to read the code. No advanced language features are allowed. No advanced programming techniques are allowed. Jargon or project-specific terminology is discouraged.
Not long ago, the XKCD web comic did a wonderful example of what happens when you try too hard to simplify language. The Up Goer Five comic describes the Saturn V rocket using only the ten hundred most commonly used English words. (Note that thousand was not on the list, so...) This is a really good example in English of what most programmers see when they look at code written with the Random Person audience requirement.
I've actually had people go so far as to suggest that the code should be readable by someone who doesn't know the language. I'm really not sure what kind of constructive comment you can make in that case. The code equivalent here is to use the simplest language constructs, simple programming concepts, and as little project-specific terminology as possible. While this makes it possible for someone to read the code without much prior knowledge, it pretty much negates any prior knowledge the reader does have.
If the people reading the code can be expected to be experts at general programming or at least experts with the language in question, we can allow the code to contain more powerful language constructs. Meta-programming, functional techniques, and design patterns are all available.
A lack of domain or project knowledge may still constrain the jargon that can be used in this case. Good documentation, including explanatory inline commentary, can help bridge these knowledge gaps. But you can assume that the reader will understand the code itself. This allows for concise and powerful coding idioms that can result in a much smaller code base.
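As a rough illustration of the trade-off, here is the same small task written for two audiences. This is only a sketch: the function names and the sample data are invented for the example and do not come from any real project.

```python
from collections import Counter


def word_counts_simple(words):
    """Written for the Random Person audience: only basic constructs."""
    counts = {}
    for word in words:
        lowered = word.lower()
        if lowered in counts:
            counts[lowered] = counts[lowered] + 1
        else:
            counts[lowered] = 1
    return counts


def word_counts_expert(words):
    """Written for readers who know the language and its standard library."""
    return dict(Counter(word.lower() for word in words))


# Hypothetical sample data, used only to show that both versions agree.
sample = ["Code", "is", "read", "more", "than", "code", "is", "compiled"]
```

Both functions return the same dictionary. The second is a fraction of the size, but only pays off if the reader already knows what `Counter` and a generator expression do.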
The reader could be an expert in the domain or project, but not necessarily an expert programmer. This audience would allow the code to be written more in the domain of the problem, but the code itself might not use more powerful idioms. This is often the goal if non-programmers will need to understand the system. Some literature refers to writing in the problem domain to cover this case.
The least restrictive audience would be one that has deep knowledge of programming concepts, the programming language in question, and the project itself. You would expect this audience to be able to understand large amounts of the code with only a cursory read. Their familiarity with the domain and deep knowledge of programming allows a very concise and elegant approach to writing the code.
One approach to choosing your audience is to select one kind of audience and write all code with them in mind. If the project only has a single programmer this often results in an audience of Me. That approach begins to fail as soon as you start adding more people to the project.
If the project has several people and a coding standard, then the single audience case normally aims for Random Person. The second case often seems attractive to the owners of the project. The main problem with this second approach is the fact that expertise on the part of the reader gives no benefit. If your code is always only read by junior programmers who are not on the project for long enough to develop any domain expertise, this might be a viable option. On the other hand, more experienced developers will probably leave to find somewhere that their skills are more needed.
With any long running project, there are likely to be examples of each of the audiences above working on the project or reading the code at any given point in time. If you don't allow at least some of the code to encode knowledge and expertise, your more senior people will become frustrated and leave. If your code is only written for the most experienced, bringing in new developers will be hard.
In a real project, different parts of the code probably have different audiences. The most central algorithms that are the core of the code may actually be complicated and require real expertise to understand and change. This code can be written expecting a truly expert audience, because only someone who really understands the code should try to change it anyway. (A friend of mine described this as the "you must be this tall to touch this code" approach. Basically, if you can't understand why it was written this way, you may not be qualified to modify the code.)
Many systems provide an API for people outside the project to use. Obviously, this code should be written for an audience with much less knowledge of the system (and less of an expectation of general knowledge or expertise). Code written around the public interface for the system cannot assume extensive experience on the part of the reader. This code should probably default to simpler idioms.
Code in between these two extremes should take into account the level of developer you expect to need to work on that code. By providing code written at multiple levels of expertise, new developers can work on the simpler code as they develop the expertise needed to move to more advanced sections. Experts can work on the core code where their expertise can provide the most benefit.
Code will be read more than it will be compiled. A good programmer will give some thought about the audience of different pieces of the code and write accordingly.
In my last post (Thoughts on Code Comments), I described two opposing camps that argue about code comments. Like much of the craft of programming, the commenting of code is too complicated for a simple set of rules.
In this post, I will describe a set of guidelines that I've adopted over the years. These guidelines have helped me personally, and they have helped me direct many more junior programmers.
Comments that describe what the code does are Implementation Comments. In theory, they allow anyone to read the comments and understand what the code is doing.
In practice, the person reading and maintaining the code should be able to read the code itself to tell what it is actually doing. So the comment is redundant and does that person no good. Anyone who cannot read the actual code to see what it is doing probably won't have the context needed to understand or change the code. Comments at implementation level probably won't teach them enough to actually change or understand the code. So, from the beginning, implementation comments are either redundant or not useful.
At some point, the code will be fixed to remove a problem and the comment will not be changed. (You can argue that the maintenance programmer should have updated the comment. I've never had the opportunity to code in the ideal world where that always happens.) Implementation level comments seem to be the most likely to diverge when the code changes.
Summary: At best, implementation comments are redundant; at worst, they are wrong or misleading.
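A small sketch of both failure modes, using an invented function (the name, the tax scenario, and the comments are hypothetical and exist only to illustrate the point):

```python
def total_with_tax(prices, tax_rate):
    # Add up the prices.
    # (Redundant: the next line already says exactly this.)
    subtotal = sum(prices)

    # Multiply the subtotal by 1.05.
    # (Stale: in this made-up history, a hard-coded 5% rate was later
    # replaced by the tax_rate parameter and the comment was never updated.)
    return subtotal * (1 + tax_rate)
```

A reader who trusts the second comment will believe the tax rate is fixed at 5%, which is exactly the kind of bias a wrong comment introduces.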
While I can understand what a piece of code does by reading the code, I may not know what the programmer meant to do from reading the code. Commenting the intent (at a fairly high level) can go a long way to making the maintenance programmer's life easier.
For the programmer who can understand the code, an intent comment provides context. That makes evaluating and/or changing the actual code much easier. If someone who cannot understand the code is reading the comment, the intent may at least give them an idea whether they are looking at the right code.
Intent comments are usually not invalidated by code changes intended to fix bugs, since a bug normally means that the actual code did not match the intent. Intent comments can also supply context that the reader of the code needs to understand it.
Intent comments are almost always a block of text that explains a logical chunk of code that follows: a function, method, class, or tricky piece of data manipulation. It is possible for a single line comment to give intent, but that may be a sign that the code you are commenting deserves to be a subroutine with an intent-revealing name.
Summary: Intent comments are a message to the reader of the code to give context needed to understand the implementation.
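Here is one way an intent comment might look in practice. The functions and the business rules in them are made up for this sketch; the point is only the shape of the comments.

```python
def normalize_phone(raw):
    # Intent: customer records arrive from several import sources, and each
    # source formats phone numbers differently. We keep digits only so that
    # two records for the same customer compare equal no matter which
    # source they came from.
    return "".join(ch for ch in raw if ch.isdigit())


def qualifies_for_free_shipping(order_total, is_member):
    # No comment needed here: a one-line intent comment on an inline
    # condition was replaced by a function whose name carries the intent.
    # The threshold of 50 is a hypothetical value for this example.
    return is_member or order_total >= 50
```

If a bug fix later changes how the digits are extracted, the intent comment on `normalize_phone` is still true, which is why this kind of comment tends to survive maintenance.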
Under some circumstances, solving a problem requires a tricky or unusual algorithm. (Not as often as less experienced programmers think, but it does happen.) This algorithm should be commented. This will be a cross between an intent comment and an implementation comment. You would probably give more implementation details (focusing on why this particular implementation is necessary) than would normally be appropriate when describing intent.
The simplest version of this comment names a known algorithm and references a source for more information. If you need to re-implement a standard algorithm, this would be the best approach. You might even provide information on why you chose this algorithm instead of another.
This kind of comment is a life-saver for the maintenance programmer. It helps avoid the circumstance where you fix an unusual piece of code only to subtly break some of its intended effect. This kind of comment also makes it possible to replace the algorithm with something more standard, if something is discovered or created at a later time. We've all seen code that has some weird effect that we compensate for all over the code, but no one is willing to change because we don't know why it does what it does.
Summary: A tricky algorithm should be commented for how it works and why it must be done this way. This reduces chances of accidentally changing behavior and maximizes the chances of replacing special code as understanding improves.
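As a sketch of what such a comment could look like, here is a re-implementation of a known algorithm, the Boyer-Moore majority vote. The function name and the stated reason for choosing the algorithm are assumptions made up for the example.

```python
def majority_element(votes):
    """Return the value that makes up more than half of votes, or None."""
    # Tricky algorithm: Boyer-Moore majority vote.
    # Source: Boyer and Moore, "MJRTY - A Fast Majority Vote Algorithm".
    #
    # Why this algorithm (hypothetical reason for this sketch): the obvious
    # approach of counting every distinct value needs memory proportional
    # to the number of distinct values. This one needs a single candidate
    # and a single counter.
    #
    # How it works: each non-matching vote cancels one vote for the current
    # candidate. A true majority cannot be cancelled out completely, so it
    # must be the candidate left standing at the end.
    candidate = None
    count = 0
    for vote in votes:
        if count == 0:
            candidate = vote
        count += 1 if vote == candidate else -1

    # Do not remove this check: when there is no majority, the first pass
    # still leaves some arbitrary candidate behind.
    if votes.count(candidate) * 2 > len(votes):
        return candidate
    return None
```

The last comment is the part most likely to save a maintenance programmer: without it, the second pass looks like redundant work that could be "cleaned up".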
A variation of the Tricky Algorithm comment involves the favorite vice of many programmers: optimization. In the past, I've lumped Optimized Code in with Tricky Algorithms without making a distinction.
In recent years, I've had to modify my thinking after working with a programmer who is quite adept at optimizing. In some code reviews, I pointed out the cost of maintaining some optimizations and asked for justification of the changes. This normally resulted in some code profiling. Sometimes, the optimization was reverted after actual measurement showed little benefit. Other times, we got a wonderful comment explaining the reason for the optimization along with numbers proving the benefits.
As a result, I've come to suggest that any tricky code written for the sake of optimization should have actual performance numbers added that explain the benefits of the optimization. We know from plenty of sources that programmers are not very good at identifying actual bottlenecks that require optimization. This requirement forces the programmer to actually profile the code to document the benefit. The team can now reasonably evaluate whether or not the optimization is worthwhile. We might be willing to deal with some hairy code to give a 5X speed increase in code that is called a lot. On the other hand, a 5% speed increase in code that is not called often is probably not worth the cost.
Summary: You should profile before optimizing. So add the results of the profiling to the comment on the tricky algorithm you are using to optimize.
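A minimal sketch of how this might look, assuming Python's standard `timeit` module is used for the measurement. The two functions and the `measure` helper are invented for illustration, and the numbers in the comment are deliberately left as placeholders to be filled in from a real run rather than guessed.

```python
import timeit


def squares_loop(n):
    """Straightforward version, kept as the readable reference."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_fast(n):
    # OPTIMIZATION: list comprehension instead of the append loop above.
    # Measured with measure() below on <machine, Python version, date>:
    #   squares_loop: <seconds from your run>
    #   squares_fast: <seconds from your run>
    # If the measured gain is small, prefer squares_loop and delete this.
    return [i * i for i in range(n)]


def measure(func, n=10_000, number=50, repeat=3):
    """Return the best time in seconds for `number` calls of func(n)."""
    return min(timeit.repeat(lambda: func(n), number=number, repeat=repeat))
```

Calling `measure(squares_loop)` and `measure(squares_fast)` gives the two figures for the comment. Keeping the slower version around also gives reviewers something to compare against when they weigh the maintenance cost.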
Over the last 15 years or so, one really strong push in programmer documentation is API documentation. This is normally automatically extracted from the code and reformatted into a nice web-based interface. The first version of this approach that most people are aware of is probably the Javadoc system introduced with Java. The idea was to generate more benefit from standardizing the class and method comments by automating a way of extracting the comments from the code and generating a nice API document.
Documenting the public API of a module or class is a great benefit to anyone using that module or class. API documentation should be written as more of an intent comment with the addition of inputs, outputs, and pre-conditions. This kind of comment can make using the code much easier.
Summary: API comments are necessary for using the code.
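As a sketch, an API comment in Python would usually be a docstring that covers intent, inputs, outputs, and pre-conditions. The function below is hypothetical and the section labels in the docstring are one possible layout, not a required standard.

```python
def average_speed(distance_km, hours):
    """Return the average speed for a trip, in kilometres per hour.

    Intent:
        Give callers one place to turn a distance and a duration into a
        speed, so that unit handling is not repeated across the code.

    Inputs:
        distance_km: distance travelled, in kilometres (int or float).
        hours: elapsed time, in hours (int or float).

    Output:
        The average speed as a float, in km/h.

    Pre-conditions:
        hours must be greater than zero; otherwise ValueError is raised.
    """
    if hours <= 0:
        raise ValueError("hours must be greater than zero")
    return distance_km / hours
```

Note that the docstring says nothing about how the division is done. Someone calling `average_speed(150, 2)` needs to know what goes in, what comes out, and what will be rejected.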
This is probably the hardest guideline to follow.
When any piece of code is changed, you need to read and understand any associated intent comments. If the code has changed to violate the original intent, the comment must be modified to match.
Any time the comments and the code diverge, the comments become worse than useless. Not only does the invalid comment make this code harder to understand and maintain, it also leads to a distrust of all comments, even those in other code.
Summary: Comments must be maintained.
In the comments on a recent Google+ post, a discussion erupted about comments in source code. As usual, there were two groups that argued relatively vehemently. The More Comments camp was best represented by the suggestion that a good program should have as many lines of comments as lines of code. The Clear Code Needs No Comments camp is best represented by the Comments Are a Code Smell assertion.
This battle has been waged many times on many projects. As usual for these kinds of religious arguments, neither side is willing to budge and the other side is always wrong, pig-headed, naive, sloppy, or insert random slur here.<shrug/>
I've been a programmer for a long time. Most of that time has been spent maintaining other people's code. I rarely get to start a project from scratch, but I've had to live with the results of different programming decisions for years on almost every project I've worked on. Obviously, this experience colors how I see any topic, including this one.
Early in my career, I believed that more comments were a good thing. In my first professional programming position, I had to change that view. The previous programmer on the project (who was also my boss) had a tendency to use any excuse to add comments. If he went back to the code and didn't understand something, he added more comments. If he found a bug, he commented out the old code, added an explanation of why it didn't work and fixed the code. If he had a thought on how the code might change later, he added a comment. If he added debug code to understand something, he commented it out when he was finished, in case he needed it later.
Now, I know some of you will immediately point out that some of this would have been fixed if we had been using git to store information about older code and maybe a wiki for ideas about future changes. I would agree. But you need to understand that this was at the end of the 1980s. Version control was pretty primitive. (I eventually got the code on RCS, so I could remove the history from the source. Subversion had not been invented yet, much less git.) The only way we had for sharing code was sneaker-net. You don't want to know about code merges. So, the world was quite different from today.
Anyway, this experience taught me that more comments are not necessarily better. At the same time, I was reading programming books and articles by Donald Knuth, P.J. Plauger, Jon Bentley, Jeff Duntemann and others that made me think a lot about code comments. Over the next couple of decades, I see-sawed between more comments and fewer comments as the ideal. I tried to write code that was clear without any comments. I dealt with both entry-level and advanced programmers. I cleaned up code written by Wade-past. I tried to leave information for Wade-future. In the end, my views became a combination of what I read and saw.
The biggest problem I have with the More Comments camp is the underlying assumption that the comments will be good quality. Honestly, if the original programmer wrote hard-to-understand code, what makes us think he or she will write good quality, easy-to-understand comments? My experience has been that a lot of cryptic code is the result of fuzzy thinking. If someone slaps together a little code and then tweaks it until it works, why would they write comments that make any more sense? If this person does leave any comments, they normally tell me nothing that I can't already get from the code.
When discussing the issues, proponents of this view normally compare an ideal situation with perfect comments to badly written code with no comments and immediately focus on the lack of comments. Anyone who has maintained code for any length of time has run into bad or misleading comments. The problem with an incorrect comment is that it will probably bias your thinking about the code, which makes troubleshooting or changing the code harder. It's been my experience that good comments are like gold, but most comments are something else entirely.
This argument is based on the idea that you should be thinking about the next person to read your code as you write it. Clear naming of variables and functions is a necessity. Structuring the code and data to match the domain helps the reader understand what is going on. If you write the code clearly, the reader will understand the glorious solution you've provided.
Unfortunately, most code is not this clear. It may be unclear because the programmer did not understand the problem or is coding beyond their ability. The code is probably the result of the same fuzzy thinking I referenced in the previous section. It may be that the code has been patched to the point that the original intent can no longer be seen.
The proponents of this view assume that the programmer fully understands the problem and solution when they have finished writing the code, so at that point it should be clear. They also assume that the programmer will take the time to clarify the code before declaring it complete.
In my experience, even the best case scenario leads to code that may not be perfectly clear. At the point that you are writing the code you have a lot of context in your head that the next person won't (necessarily) have. This means that what is clear to you may not be clear to the reader of the code. While a comment explaining the code may be a Code Smell suggesting the code is not clear enough, it may be just the context you need when reading the code.
Any discussion of commenting code really should include some mention of Donald Knuth's Literate Programming. The Wikipedia article gives a pretty good overview, but Knuth's book (Literate Programming (Center for the Study of Language and Information - Lecture Notes)) does the best job. The idea is basically to provide a system where the programmer defines the program in a text or essay form with embedded code that implements the logic. Two separate programs are used to convert this format into either documentation for humans or source code for the compiler.
Literate Programming never really caught on as a paradigm. Many have suggested that this approach requires programmers to be able to both write coherent software and to write coherent English (or whatever natural language) at the same time. This skill seems to be beyond most of us.
As with almost anything relating to software, this topic is much more complicated and subtle than the competing camps suggest. So I've decided to focus this post on describing the problem. In my next post (Code Comment Guidelines), I will describe the guidelines I use to decide if and how much to comment.