Programmer Musings: July 2013 Archives


July 28, 2013

Coding Style: Terse vs Verbose, Conclusion

In the last post (Coding Style: Terse vs Verbose), I laid out some of the arguments for and against a verbose coding style and a terse coding style. Although I didn't delve deeply into what I meant by these styles, I hoped the details were less important than the general feel.

However, by the end of the post, I had not answered the one very important question.

Which Way is Best?

Like most decisions in programming, this one involves trade-offs. The key to the answer is in a previous entry (Write for Your Audience). Odds are, neither of the two extremes is the right way. Depending on the programmers writing and maintaining your code, the team will need to make different decisions.

Team Makeup

If your programmers are all new to the language and the program and always will be, write as verbosely as you can. An example of this would be a program written and maintained by a stream of interns or short-term employees. If the skill level of the maintainers never progresses, they need to keep things obvious.

Another case where the verbose style is helpful is a system maintained by one or more non-programmers as a side project. In this case, the extra verbosity will help them get oriented when they come into the code. Since they are not intending to become experienced programmers, they have less incentive to become well-versed in the intricacies of the language.

If, on the other hand, your code is maintained by experienced programmers that are working on this system as their primary jobs, a terser style is probably more efficient. Although it may make life a bit harder for new developers coming in, no one is the new guy forever. If a terser style is more readable to most of your programmers over most of the time that they work on the project, you are better off adopting that style.

In any case, the style should fit the audience that will read and maintain the code.

Trade-offs

Almost every interesting problem in programming has trade-offs. There is no one, true answer.

If your experts will read your code much more often than novices, feel free to allow a more terse style. If people with less experience will read the code, making it somewhat more verbose may be the right choice.

It is probably not wise to make any of the code too terse or too verbose. You want people to be able to become experts eventually, so the learning curve can't be too steep. Likewise, you don't want code that is so verbose that it ends up dumbed down to the point that no one wants to work on it.

Most of your code will be maintained by people between the two extremes, so the verbosity level of the code will also need to be between the extremes.

An approach that I have found to be useful is to find a style that most of your team is somewhat comfortable with. The really junior people will find it a little terse. Your senior people will see it as a little verbose. That should be the goal for most of the code. Where necessary, your team can write really performance critical or seriously advanced code more tersely than this standard. Only experienced people will work on it anyway. Your team should write any code that is likely to be read outside the core programming team more verbosely than the standard.

The resulting style will not be consistent across the whole code base. But, it will be more comfortable for the people who work in each area.

Myths

One of the most important points to realize is that either extreme is bad. Since this is a polarizing issue, each side quotes myths about either style that support their own argument.

If a little verbosity makes the code more readable to new people, more verbosity must make it even more readable, obviously.

For an example of extreme verbosity, we could have code that takes 100 lines (taking into account comments and such) to increment a single variable. This is obviously not better than a terser version. Depending on the code, one line or a couple of lines might be a little verbose, but perfectly clear.

For any given audience, there is a sweet spot between maximum terseness and maximum verbosity that generates the best clarity. This spot is very hard to find and maintain. I've worked with a number of people that think it is better to err on the side of verbosity. By itself, that is not unreasonable.

However, if you keep going in that direction, you quickly reach a point where everyone has to wade through huge amounts of extraneous stuff to understand the code. At this point, your experts are no more effective than a random developer walking in off the street. Unfortunately, this kind of least common denominator code seems to chase off good developers.

Succinct code is more cryptic and harder to understand.

Succinct code, by its nature, relies on other context for clarification. Opponents of a terse style will sometimes point to a single line or statement and declare that it is unreadable. The fallacy is that you almost never look at only one line of code. This would be like taking the single sentence "She was confused when he did that" out of a story and declaring this as proof that pronouns make English unreadable. The rest of the story (or maybe just the page or paragraph) would make she, he and that perfectly clear.

Using an overly verbose style can be a lot like trying to write or speak without using any pronouns, contractions, or idioms of any kind.

Sometimes, the supposed cryptic code is just using a specialized vocabulary. Someone unfamiliar with that vocabulary will find it hard to understand until they learn the terminology.

Verbose code obscures the really necessary parts, making the code harder to understand.

Extremely verbose code can make finding the important bits difficult. However, a small increase in verbosity can sometimes increase clarity for less knowledgeable developers, without a major impact on the expert developers. A good example is careful naming of variables and functions. The variable speed_of_sound is likely to be easier to follow than SoS.
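As a small sketch of the naming point, compare the two C functions below. Only speed_of_sound and SoS come from the text above; the function names, the other parameter names, and the echo-distance calculation are invented for this illustration.

```c
/* Terse names: the reader has to already know what SoS and t stand for. */
double dist(double SoS, double t)
{
    return SoS * t / 2.0;
}

/* Slightly more verbose names: same arithmetic, but the names carry the meaning. */
double echo_distance(double speed_of_sound, double round_trip_seconds)
{
    /* The sound travels out and back, so only half the trip is distance. */
    return speed_of_sound * round_trip_seconds / 2.0;
}
```

Both functions compute the same value. The second costs a few extra characters and should not slow down an experienced reader.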

Verbose code is more debuggable.

Many people believe that the only way to troubleshoot a problem is to run it through a debugger. For those people, verbose code can be better because it gives more possibilities for breakpoints. For example, we could always use very simple named sub-expressions, instead of long mathematical equations. That way we could print individual pieces or single-step through a calculation carefully.

Let's look at some code in C.


    #include <math.h>

    /* Returns the "+" root of the quadratic, one named step at a time. */
    float first_solution( float first_coefficient, float second_coefficient, float third_coefficient )
    {
          float square_of_second = second_coefficient * second_coefficient;
          float product_first_third = 4 * first_coefficient * third_coefficient;
          float double_first = 2 * first_coefficient;
          float major_term = sqrt( square_of_second - product_first_third );
          float numerator = - second_coefficient + major_term;
          return numerator / double_first;
    }

It definitely has many pieces that can be checked independently. There are many places for breakpoints. But, the overall structure of the code is pretty obscure. Compare with the following example, which carries the context from when you learned the quadratic formula sometime in the past.


    #include <math.h>

    float first_solution( float a, float b, float c )
    {
          return (-b + sqrt( b*b - 4*a*c )) / (2 * a);
    }

It's pretty easy to verify this equation against your memory or a math textbook. There's really no need to debug it. At a glance, I'd say the biggest problem I can see with the second form is the use of floats instead of doubles. This gives a potential for loss of necessary precision in some cases. The same problem applies to the other code, but is not as obvious. Instead you are drowning in lots of text and this issue becomes harder to see.

In this case, the succinct code is clearer because we are familiar with the formula. Just as importantly, despite the extremely short parameter names, it is still readable because of the context of the standard representation of this formula.

Correct is more important than debuggable.

It's a fairly reasonable argument that if the code is actually correct, there is no reason to ever debug it. Unfortunately, that relies on a rather large if. How do you know that the code is correct? More importantly, 3 years from now when the environment where this code is running has become completely different, is the code still correct? And how would you know?

Some of the most interesting bugs I've seen in my career have been in code that was obviously correct.

Verbose code generates more lines of code, so it can't be correct.

The bugs per lines of code argument is an interesting one. The most often quoted form of this is from Steve McConnell's book Code Complete, Second Edition.

Industry average experience is about 1-25 errors per 1000 lines of code for delivered software.

Later in the book, McConnell shows that this defect rate increases as the code base gets bigger.

Unfortunately, this myth is based on a really important logical mistake. As an example, if I took a working 10 line program and added 10,000 copies of the line x=x; (assuming x is available), would that automatically imply that the code has acquired 10-250 defects? The only real effect would be potential performance impact, unless you are using a modern compiler which would optimize this code away.

This does not mean that adding code won't increase the number of bugs. We just can't say there will be more bugs simply because there are more lines of code.

Verbose code will reduce the learning curve for new programmers.

This myth rests on two assumptions. The first assumption is that verbose code is necessarily clearer than a more concise version. As we saw in the quadratic equation example above, that may not be true. The other assumption is that we actually care to make things easier for the new programmer.

If a programmer is on a particular project or with a company for 5 years (as an example), and they are effectively new for 6 months, that means that we get a benefit from the verbosity for 10% of the career of that programmer in this code. Any downsides of the extra verbosity will affect the remaining 90% of that programmer's career in the code. More importantly, it will impact 100% of the time of any senior developers in the code.

In my experience, the more senior people are often quite a bit more productive than the junior people. (Any junior people who are more productive, usually don't remain junior for long.)

In summary, more verbose code may not actually be clearer. At best, it will only help some of your people for a short while. If there is any downside at all with more verbose code with respect to your more senior people, it sounds like a bad trade-off.

We only hire the top 5%, they will be able to understand our style.

Everybody says this. By definition, most companies cannot be hiring just the top 5%. Since, by definition, most companies are paying average salaries, have average benefits, and are working on average problems, the top 5% probably aren't even applying to your company. Most of your programmers are probably around the average for your area.

So, writing overly terse code intentionally, because you employ the best of the best, may not be the best strategy.

The code has to be this cryptic for performance reasons.

One real cause of cryptic code is micro-optimizations for performance. While micro-optimizations can improve performance in some cases, they almost always make the code more difficult to read. Unfortunately, programmers are truly awful at recognizing performance bottlenecks without careful profiling. In many cases, the optimized code was just not performance critical in the first place. This problem is known throughout the field as premature optimization.

It's usually better to write code clearly until you have actual numbers proving that the code is a performance problem. Even then, exploring better algorithms, written clearly, may be a better use of time.
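A well-known example of this kind of micro-optimization is the XOR swap. The sketch below is a hypothetical illustration, not code from any particular system, and the function names are invented. It puts the trick next to the plain version.

```c
/* "Clever" version: swaps without a temporary by using XOR.
 * It is harder to read, and it zeroes the value when both
 * pointers refer to the same object. */
void swap_xor(int *a, int *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Clear version: obvious at a glance, and safe when a == b. */
void swap_clear(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Whether the XOR form is ever faster depends on the compiler and the hardware, which is exactly the kind of question that needs measurement rather than intuition.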

Since this code isn't performance-critical, we can make it more verbose without penalty.

In the book C++ Coding Standards, this is referred to as premature pessimization. We can think about this problem by using the rule that 20% of the code controls 80% of the performance. So we can get the most benefit by performance tuning that 20%. But, if we make the other 80% of the code consistently slower, it will still affect 20% of the performance, which can add up.
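The arithmetic behind that last point can be sketched as a tiny function. The name relative_runtime and its parameters are invented for this illustration, and the model simply assumes that run time splits cleanly into a hot part and everything else.

```c
/* Relative run time after a change, where 1.0 means "same as before".
 * hot_fraction:  share of the original run time spent in the hot code
 * hot_slowdown:  factor applied to the hot code (1.0 = unchanged)
 * cold_slowdown: factor applied to everything else (1.0 = unchanged) */
double relative_runtime(double hot_fraction, double hot_slowdown, double cold_slowdown)
{
    double cold_fraction = 1.0 - hot_fraction;
    return hot_fraction * hot_slowdown + cold_fraction * cold_slowdown;
}
```

With hot_fraction = 0.8, leaving the hot code alone and making everything else 1.5 times slower gives roughly 0.8 + 0.2 * 1.5 = 1.1, or about a 10% slowdown for the program as a whole.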

In one system I worked on, I saw a new piece of code that was written in a sub-optimal fashion (O(n³) to be exact). When I suggested we rewrite the code more efficiently, I was told this wasn't a performance critical piece of code and n would never get very big anyway. A couple of weeks later, I was asked for help because the system was way too slow. This pessimized code turned out to be the problem.
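The code from that system is not shown here. As a stand-in, the sketch below uses the maximum subarray sum problem discussed in Programming Pearls to show how an O(n³) solution and an O(n) solution to the same problem can both be written clearly. The function names are invented for this example.

```c
#include <stddef.h>

/* Cubic version: try every (start, end) pair and re-add the elements
 * in between each time. Easy to write, O(n^3) in the array length. */
long max_subarray_cubic(const int *values, size_t count)
{
    long best = 0; /* the empty subarray has sum 0 */
    for (size_t start = 0; start < count; start++) {
        for (size_t end = start; end < count; end++) {
            long sum = 0;
            for (size_t i = start; i <= end; i++) {
                sum += values[i];
            }
            if (sum > best) {
                best = sum;
            }
        }
    }
    return best;
}

/* Linear version: track the best sum of a subarray ending at the
 * current element, resetting to 0 when it goes negative. */
long max_subarray_linear(const int *values, size_t count)
{
    long best = 0;
    long best_ending_here = 0;
    for (size_t i = 0; i < count; i++) {
        best_ending_here += values[i];
        if (best_ending_here < 0) {
            best_ending_here = 0;
        }
        if (best_ending_here > best) {
            best = best_ending_here;
        }
    }
    return best;
}
```

For small n the two are indistinguishable in practice. The trouble starts when n turns out to be bigger than anyone expected.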

Conclusion

The verbosity of your coding style is another facet of the know your audience issue. Rather than aiming only for verbose or terse code, you should focus on the right style for the audience of that code.

As usual, there is no simple, obvious, correct answer. The style of your code is a complex issue that depends heavily on who will see the code.

References

Code Complete: A Practical Handbook of Software Construction, Second Edition - Steve McConnell
Programming Pearls - Jon Bentley
C++ Coding Standards: 101 Rules, Guidelines, and Best Practices - Sutter and Alexandrescu
ratio of bugs per line of code - Dan Mayer
Portrait of a N00b - Steve Yegge
Code's Worst Enemy - Steve Yegge

Posted by GWade at 12:46 PM.

July 17, 2013

Coding Style: Terse vs Verbose

One coding style issue that causes many arguments is the distinction between the Terse/Elegant/Succinct style and the Verbose/Legible/Debuggable style. As usual, each side explains that their style is the only rational choice and that the other approach is obviously wrong. First, let's look at the two styles to see what they say. In order to avoid showing too much bias towards either style, I'm going to refer to both using accurate but slightly derogatory terms: Terse and Verbose.

Terse Tim

Let's take Tim, a hypothetical programmer of the Terse style, and see how he explains the style.

PM: Tim, why do you advocate the terse style of coding?

Tim: Because it is obviously correct. Good code style uses the minimum syntax necessary to get the point across. Anything else is unnecessary and gets in the way of working with the code.

PM: One thing people complain about in terse code is short variable names. Why not use longer names?

Tim: Names should be no longer than necessary. Some incorrect styles require that all names of subroutines and variables and such be long. The longer the better. This is ludicrous.

A short name is just as clear in context. If you don't understand the code well enough to understand the context, a longer name on one variable is not going to fix it.

Longer names get in the way of troubleshooting code. Code that has too many long names ends up with most of them being similar: source_user_name_for_copy and source_user_name_for_compare. What a crock!

PM: Couldn't you just use comments to clarify the differences?

Tim: Oh, yeah. Like that will help. Most comments are wrong. The ones that aren't out and out wrong are usually out of date. The more comments in the code, the more likely that most of them are misleading. I'd prefer no comments to misleading comments.

In the few cases where the comments are actually correct, they probably just repeat what the code says. So they are redundant.

Since you can't trust the comments, you have to read the code to see what it does. The code can't lie. The computer actually compiles and executes the code. There are no such checks on the comments.

PM: Doesn't this make the code harder to understand for new people?

Tim: Obviously, someone who has never seen the code before will need to learn the system before being proficient. But, that's true no matter how you write the code.

The problem with writing code to the least common denominator is that your advanced programmers can't use their knowledge and skills to advance the code and the team. Everyone is forced to work as if all of the other developers are clueless.

PM: What is your position on advanced language features?

Tim: Many advanced features are great! You can do a lot more with less code. What's not to like?

PM: What about people who say the advanced features are too hard to understand?

Tim: If it's in the language, it should be available for use. More advanced features usually reduce lines of code. Less code is easier to maintain. If you haven't learned the more advanced features, maybe you should avoid advanced code until you've learned more.

PM: Aren't more advanced features harder to debug?

Tim: Actually, the more advanced features often leave less need for debugging. Several studies have reported that the number of bugs per line of code is relatively consistent across the industry, independent of language. Higher level languages are more productive, partially because they require fewer lines for a given piece of functionality. That translates to fewer bugs per feature.

If we are using higher level constructs in a system, that allows us to write more functionality with fewer lines, which translates directly to fewer bugs.

PM: How do you feel about optional syntax?

Tim: If it's optional, leave it out. I don't see how adding extra code junk helps anyone. People who get used to adding optional syntax just in case often end up adding useless code that just slows the system down.

For example, if a variable contains a string, there is no need to interpolate that variable before using it as a string. In most cases, that creates a new copy of the string unnecessarily.

Since extra syntax, at best, has no effect on how the program works, and, at worst, can cause a performance penalty, why would anyone use it?

Verbose Vinnie

Let's talk with another hypothetical programmer, Vinnie, for an explanation of the Verbose style of programming.

PM: Vinnie, why do you advocate a verbose style of coding?

Vinnie: It's all about readability, understanding, and debugging. If you are coming to the code with a less-than-perfect understanding, more detail is better. This applies whether you are new to the codebase or just haven't looked at this particular piece of code in a few months. A little more explicit information helps you orient yourself to the code and begin working sooner.

Consistently using a more explicit style makes all of the code easier to understand for those who will maintain it.

PM: Doesn't that constrain your advanced programmers to dumbing down their code to the level of a beginner?

Vinnie: The advanced programmers are the ones with the knowledge we want to capture. When the experts encode their knowledge in comments and good naming, more junior programmers can contribute without having the whole system in their heads already.

Sometimes expert programmers want to write more advanced, cooler code. But, that is not where the business value lies. On any given system, there are only a few experts and the rest of the team is intermediate or junior. The verbose style accommodates most of the programming staff to make the best use of the team, as a whole.

PM: What about the use of terser, more powerful language constructs? Shouldn't we use the power of the language when possible?

Vinnie: Given two constructs that will do the same job, we should consistently use the clearer, easier-to-understand construct. This makes the code easier to understand for the whole team and any new members. If some of the code uses a clearer construct and other code uses a high-power, cryptic construct, people will have a harder time understanding the code as a whole.

Consistency is much more important than the desire to show off obscure coding practices.

PM: What is your position on optional syntax?

Vinnie: There is no reason to remove optional syntax. Leaving off the optional syntax obviously generates some ambiguity. Did the programmer mean to leave it off? Was the programmer really aware of the implications of leaving it out?

For example, leaving out the parentheses in a mathematical equation can cause loads of grief. Did the original programmer understand the precedence rules correctly? (If he didn't, there could be a bug.) Will everyone that maintains the code understand? Someone coming back later and clarifying by putting in the parentheses could break the code if they misunderstand the precedence.

Using the explicit syntax, even if it is optional, keeps everyone on the same page. No need to guess about the intent of the code.

In addition, more explicit styles often simplify debugging.

PM: How is that?

Vinnie: Except in simple cases like optional parentheses, optional syntax usually removes lines of code. Think of C's block if versus the statement if. The only thing the statement if gives us is the ability to drop the curly braces in the case that the if only applies to a single statement. The block if can do a single statement in the block just as well.

But the block gives us more opportunity for debugging. We can easily add a print statement to the block to see what is going on. Some debuggers may not allow you as many options for breakpoints if the statement is all on one line. Obviously, the difficulty debugging is much more of a downside than the saving of 2 characters when you are writing the code.
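Two of Vinnie's examples, the optional parentheses and the block if, can be sketched in C as follows. All of the function names are invented for this illustration, and the fprintf line stands in for whatever temporary trace a maintainer might add.

```c
#include <stdio.h>

/* Without parentheses: division binds tighter than addition,
 * so this computes first + (second / 2.0), not the average. */
double midpoint_unparenthesized(double first, double second)
{
    return first + second / 2.0;
}

/* With parentheses: the intent (average of the two values) is explicit. */
double midpoint(double first, double second)
{
    return (first + second) / 2.0;
}

/* Statement if: compact, but there is nowhere to add a trace line
 * without first adding the braces. */
int clamp_to_zero_terse(int value)
{
    if (value < 0) value = 0;
    return value;
}

/* Block if: the same logic, with room for a temporary print. */
int clamp_to_zero_verbose(int value)
{
    if (value < 0) {
        fprintf(stderr, "clamping %d to 0\n", value);
        value = 0;
    }
    return value;
}
```

The two clamp functions behave identically apart from the trace output. The two midpoint functions do not, which is the kind of precedence surprise Vinnie is worried about.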

Arguments Against Each Style

Now that we've explained the high points of each style, we should really consider the complaints that people make against each of the styles.

Arguments Against the Terse Style

The main argument against the terse style is that it fails if all of your developers are not at similar levels of skill and comfort with the code. Often managers will argue with this style because they are not as familiar with the code as their programmers and they can easily see how the terseness hurts their ability to read the code.

People also say that this style can cause the learning curve for a new developer to be longer. They need to learn the idioms of this group and become comfortable with the level of code that the team is writing.

As for the higher-level, denser constructs, a less experienced programmer will definitely need to learn how these constructs work. Until they do, code using those constructs is going to be hard to understand.

Arguments Against the Verbose Style

Code is read much more than it is written. The more verbose style requires more reading to understand what the programmer wants to say. Extraneous code junk slows the reader down without necessarily making the code any clearer. In short, verbose does not equal readable.

Verbose code is a liability for the experienced programmer on your team. She will need to wade through all of this extra, explicit text explaining things she already knows every time she reads the code, during the whole time she works on it. While this may make code easier to understand for the new programmer, the experienced programmer is penalized each time she touches the code.

In general, your experienced programmers are more productive than your junior programmers. A decrease in the productivity of your experienced programmers probably costs you more than any gains you get from making the junior programmer a few percent more effective.

Verbose advocates often argue for the same style throughout the code. By having everything the same style throughout, there is no indication in the code of when things are getting more complex and that you might need to take care.

Including extra, optional syntax for the sake of clarity might help an inexperienced programmer who is new to the language. It just gets in the way of the more experienced developer. It's more to read and mentally parse, without necessarily making things clearer.

Conclusion?

So, which is the correct style?

This post is already way too long. In the next installment, I'll talk about some trade-offs and myths regarding these two styles.

Posted by GWade at 11:15 PM.

July 04, 2013

Sturgeon's Revelation of Code

One problem with comparing different programming languages is finding a valid comparison. It's pretty hard to really compare two different languages. On the other hand, it's pretty easy to compare example code from the two languages. Unfortunately, comparing code examples is likely to run into Sturgeon's Revelation, or Sturgeon's Law:

90% of everything is crap.

Comparing Languages

From Sturgeon's Revelation, it is reasonable to assume that 90% of the code written in any language is crap. If you want to make your language look good, compare an example of the 10% of non-crap code from your language with an example of the crap code from another language. This sounds like a pretty underhanded way to argue, since you are explicitly not comparing equivalent examples. I suspect that the reality is less malicious.

Let's say you want to compare a language you don't know to a language you know well. You will probably spend some time finding or writing a good example in your language to show its advantages. Since you don't know the other language, you can only pick a random piece of code to compare against. From Sturgeon's Revelation we know that 90% of the code you are likely to pick is crap. So, you end up comparing really good code with crap. Thanks to confirmation bias, this happens to match up with your opinion anyway, so it must be right.

Conclusion

This tendency to compare code seems to be much more common for people who only know one or two programming languages. Someone who knows two languages well enough to choose good examples from both languages will be less likely to be biased toward one or the other. In my experience, the more languages you know, the more likely you are to see the commonalities rather than the differences.

This is not guaranteed. I have become competent in languages that I really don't like. But, on the whole, it's hard to take a hard line against a language that you really understand.

The next time you see someone using examples of code to prove which language is best, watch out. Sturgeon may be hiding in the crowd, laughing.

Posted by GWade at 10:22 PM.