Programmer Musings: August 2006 Archives


August 31, 2006

Programming for Other Programmers

One of the odd things about programming is that you are writing for two different audiences. One is the dumbest device ever invented (it does exactly what you tell it to do) and the other is a (hopefully) intelligent programmer who will need to read this code later. We spend a fair amount of time figuring out how to make the computer understand what we write. I'm not so sure that we spend as much time on the other part. In fact, the only advice I have ever seen for writing for the next programmer is the admonishment not to write advanced code because later programmers won't understand it.

I think it is time to challenge that advice. I am not suggesting that we ignore the other programmer when writing code. (After all, you may be that programmer someday.) What I challenge is the idea that we should write code to the lowest common denominator. Code should be written to a level appropriate to the job at hand. For example, third-party library code is often written using more advanced idioms. This often results in more power per line of code. Since it will not be modified by novice programmers, there is no need to make it overly simple. Code that is expected to be used and maintained by entry-level programmers should be written using much simpler idioms.

If we don't allow more advanced idioms in a code base, the code never progresses beyond what would be written by an entry-level programmer. This makes it harder to retain programmers who want to improve their skills. No one wants to spend their entire career maintaining entry-level code. If you are forced to use only entry-level idioms, it is also harder to keep up with advances in the state of the art. After all, what's the point of learning cool new approaches if you will never get a chance to use them?

Just about every advanced technique I've ever seen has been avoided by people because it is too hard (objects, exceptions, templates, operator overloading, STL, table-driven code, patterns, etc.). When the technique is new, that is probably a valid comment. But, as the community develops experience with the technique, we discover both its weak points and strong points. When the strengths outweigh the weaknesses, we should really begin developing our own experience with a technique.

A well-written piece of advanced code can also convey more information in a small space. Some really good examples of this are the algorithms from the C++ standard library. Which would you rather maintain in code:


    if(vec.end() != std::find( vec.begin(), vec.end(), 42 ))
    {
        std::cout << "Found" << std::endl;
    }


    for(std::vector<int>::iterator it = vec.begin();
         it != vec.end();
         ++it)
    {
        if(42 == *it)
        {
            std::cout << "Found" << std::endl;
            break;
        }
    }


    for(int i = 0;i < vec.size();++i)
    {
        if(42 == vec[i])
        {
            std::cout << "Found" << std::endl;
            break;
        }
    }

Although the last is easiest to understand for someone who has never seen the STL algorithms, you still need to read quite a bit of code to figure out what it is doing. The condition is buried inside the loop. There are also several places where a coding error is possible. Accessing the element is slower (because we have to index and dereference each time). The second drops the speed penalty for indexing, but is even more complex to read. Once you get used to the first approach, it is smaller, faster, and harder to get wrong.
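For anyone who wants to try the comparison, here is a minimal sketch that wraps the first and third approaches in functions so they can be compiled and checked against each other. The function names are hypothetical, and `std::vector<int>` is an assumed element type, since the snippets above only show the value 42.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: the std::find approach from the first snippet.
bool contains_with_find(const std::vector<int>& vec, int value)
{
    return vec.end() != std::find(vec.begin(), vec.end(), value);
}

// Hypothetical helper: the index-based loop from the third snippet.
// std::size_t avoids comparing a signed index against vec.size().
bool contains_with_index(const std::vector<int>& vec, int value)
{
    for (std::size_t i = 0; i < vec.size(); ++i)
    {
        if (value == vec[i])
        {
            return true;
        }
    }
    return false;
}
```

Both functions should agree for any input; the difference is only in how much code a maintainer has to read to see that.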

Should we avoid the first approach because an entry-level programmer might not understand it?

Some people will think this is a silly example. After all, the first is officially supported by the C++ standard and STL algorithms are no longer considered advanced by most C++ programmers. But I still remember when this was considered beyond the cutting edge. Following a mandate to not use this approach would result in doubling the size of this code and requiring more reading for everyone who maintains the code from now on.

Although I am advocating the use of advanced techniques, I am not suggesting that we use them indiscriminately. If the argument that entry-level programmers won't understand it isn't enough to prevent the use of a technique, the argument that it is new or advanced isn't enough to make you use it. Every new technique requires some learning time. A technique or idiom has to prove that it gives enough benefit to overcome this learning cost. The important point that many people seem to forget is that the more of these techniques you learn, the easier it becomes to learn the next one.

Posted by GWade at 10:12 PM. Email comments

August 11, 2006

Review of Software Estimation

Software Estimation
Steve McConnell
Microsoft Press, 2006

Software Estimation is the latest book on the craft of developing software from Steve McConnell. If you have read any of his earlier books (Code Complete, Rapid Development, or Software Project Survival Guide), you might suspect that the book will contain lots of useful information that is well researched and well presented. You won't be disappointed.

Early in chapter 2, McConnell gives an exercise to find out how good an estimator you are. I did not expect to do very well on this, but was surprised how badly I did. Then, I saw the results from other people who took the test. If you think you don't need to read this book, the results of this test will probably convince you otherwise.

McConnell expands on the concept of the Cone of Uncertainty that he described in earlier books. He also distinguishes between the science and art of estimation. Some of the book's best information is the cataloging of various methods of estimating, including information about their effectiveness. I am amused that almost every approach I have seen was cataloged as a bad method.

The methods specified by McConnell are quite explicit and easy to follow. Despite reading information on software estimation in many books over the years, this is the first description that has actually improved my estimating almost immediately. I find myself applying his methods consistently on small tasks, with an eye towards improving my accuracy for larger estimating tasks.

The techniques are explained very well, with both strengths and limitations clearly defined. He also gives much more convincing explanations of why certain techniques work better with larger projects and longer time frames, while others work with small tasks.

One area that he devotes a fair amount of time to is the problem of getting estimates accepted. McConnell clearly defines three concepts that are often confused when talking about estimates: estimate, target, and commitment. The book makes a strong argument that much of the problem we have concerning estimates really boils down to a misunderstanding of which of these is asked for or which is being presented. The first chapter is devoted to explaining this problem and defining terms. Chapters 22 and 23 describe techniques for getting estimates accepted and for answering the real questions, which usually have more to do with commitments and targets.

If you are working on software, you need to read this book. I suspect it would also be somewhat useful to people in some related fields. This is also one of those rare software books that I would recommend to almost any programmer. A really junior programmer could benefit from learning these skills early. Intermediate and senior programmers will also find suggestions for improving their skills.

Posted by GWade at 09:32 PM. Email comments

August 03, 2006

Misunderstanding XML

Like many developers, I've been working with XML for many years now. My first XML-based program dates back to within six months of the publication of the XML 1.0 Recommendation by the W3C. I've already gone through the phases of:

XML looks promising,
XML is cool,
everything in XML, and
XML is just another tool

XML is widely enough used that most people doing active development should have some concept of the ground rules. I realize that some people have not worked with XML as extensively as others, but there are a few things that have really begun to get on my nerves. I apologize in advance for what will mostly be a rant.

XML is powerful. There are a large number of tools that work with XML. But, not every text file with angle brackets is XML. If you say you are working in XML, you have to follow the rules specified in the recommendation or it's not XML.

Well-formed XML

If the data is not well-formed, it is not XML. There is no such thing as "lenient XML". If you call it "XML, except we have a few extensions...", it's not XML. If you have start tags with no end tags, it's not XML. If you have more than one root element, it's not XML. If you have attribute values that are not surrounded by quotes, it's not XML. This does not mean that the format may not be useful. But, if your data is not XML, don't expect an XML tool to handle it. This makes as much sense as expecting your MP3 player to read an MS Word document aloud.
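To make two of these rules concrete, here is a rough sketch of a check for unmatched tags and multiple root elements. It is a toy under loud assumptions, not a real XML parser: the function name `looks_well_formed` is hypothetical, and the code ignores comments, CDATA sections, entity references, attribute syntax, and any `>` inside an attribute value. For real work, use a conforming parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy check, NOT a conforming XML parser. It only verifies that tags
// nest properly and that there is exactly one root element.
bool looks_well_formed(const std::string& text)
{
    const char* const ws = " \t\r\n";
    std::vector<std::string> open;   // stack of currently open element names
    int roots = 0;                   // number of top-level elements seen
    std::size_t pos = 0;

    while ((pos = text.find('<', pos)) != std::string::npos)
    {
        const std::size_t close = text.find('>', pos);
        if (close == std::string::npos)
        {
            return false;            // '<' with no matching '>'
        }
        std::string tag = text.substr(pos + 1, close - pos - 1);
        pos = close + 1;
        if (tag.empty())
        {
            return false;
        }
        if (tag[0] == '?' || tag[0] == '!')
        {
            continue;                // skip declarations; assumed harmless here
        }
        if (tag[0] == '/')           // end tag must match the innermost start tag
        {
            std::string end_name = tag.substr(1);
            end_name = end_name.substr(0, end_name.find_first_of(ws));
            if (open.empty() || open.back() != end_name)
            {
                return false;
            }
            open.pop_back();
            continue;
        }

        const bool self_closing = tag[tag.size() - 1] == '/';
        if (self_closing)
        {
            tag.erase(tag.size() - 1);
        }
        // The element name ends at the first whitespace character.
        const std::string name = tag.substr(0, tag.find_first_of(ws));
        if (name.empty())
        {
            return false;
        }
        if (open.empty() && ++roots > 1)
        {
            return false;            // a second root element
        }
        if (!self_closing)
        {
            open.push_back(name);
        }
    }
    return roots == 1 && open.empty();
}
```

With this sketch, `<a><b/></a>` passes, while `<a><b></a>` (a start tag with no end tag) and `<a/><b/>` (two root elements) are rejected.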

Over and over again through the years, someone has reported that one XML tool or another is broken because it can't read their XML. Most of the time, the document turns out to be ill-formed. Sometimes, the questioner will say "thanks" and fix their XML. Most of the time, they begin suggesting that the tool be modified just a little to handle this case. Or they may complain that the authors of the tool should have made it capable of dealing with this case, since it is obviously useful.

These people are missing the point. They may have a useful format, but it is not XML. No XML tool should be expected to deal with it, any more than you should expect an XML tool to process a JPEG image. If you have questions about the definition of well-formed XML, there are many sources available. The definitive description, of course, is the W3C Recommendation, Extensible Markup Language (XML) 1.0.

Valid XML

If an XML application is described by a schema of some type (DTD, XML Schema, RELAX NG, etc.), only documents matching that schema are valid. This is the definition of a valid XML document. If there is no schema, the document is neither valid nor invalid. However, if there is a defined schema for a particular XML application, documents that don't match the schema are invalid.

If your document does not contain required elements or attributes, it is invalid. If your document contains extra elements or attributes (unless they are in a different namespace), your document is invalid. Even elements and attributes from a different namespace are only valid if the definition of the XML application supports it. Your document is not "almost valid" or "mostly valid". It is invalid.

I work a bit with SVG. This XML application is specified by schemas in several formats and processors are supposed to ignore elements and attributes from unknown namespaces. This makes SVG extremely flexible for many applications.

Unfortunately, people regularly complain on the SVG mailing lists that one viewer or another fails on their SVG when it reaches some non-standard element they have defined. How do these people expect that the viewer will render their <chair/> element or whatever it happens to be? (Usually they want the element to be ignored.) If you have non-standard elements in the SVG namespace (often the default namespace in SVG images), the document is not valid SVG. As such, a viewer should error out. The document is not SVG, but is pretending to be. The viewer cannot tell the difference between an invalid <chair/> element that you want it to ignore and a rectangle that you misspelled <Rect/>. Surely, you would want the viewer to report the second case so that you can correct the error.
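The decision a viewer faces can be sketched in a few lines. This is only an illustration of the rule described above, not how any particular viewer is implemented: `Action`, `classify_element`, and the `known` set are hypothetical names, and `known` stands in for whatever list of element names the SVG schema defines. The namespace URI is the standard SVG one.

```cpp
#include <set>
#include <string>

// The SVG namespace URI.
const char* const SVG_NS = "http://www.w3.org/2000/svg";

// Hypothetical outcomes for a single element.
enum Action { ActionRender, ActionIgnore, ActionReportError };

// Hypothetical sketch: elements outside the SVG namespace are ignored,
// while unknown names inside the SVG namespace are errors.
Action classify_element(const std::string& ns,
                        const std::string& name,
                        const std::set<std::string>& known)
{
    if (ns != SVG_NS)
    {
        return ActionIgnore;         // foreign namespace: not the viewer's business
    }
    // XML names are case-sensitive, so "Rect" does not match "rect".
    return known.count(name) ? ActionRender : ActionReportError;
}
```

Note that a `chair` element in the SVG namespace and a misspelled `Rect` take exactly the same path through this function, which is why the viewer cannot quietly ignore one and report the other. Putting `chair` in its own namespace is what makes it safe to skip.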

Conclusion

One purpose of XML is to provide a standardized, yet flexible, file format that can be processed in a standard way. Many tools have grown around the XML recommendation that allow consistent processing, independent of platform or programming language. These are major benefits. Those who have not spent time troubleshooting data in unusual, proprietary, binary formats may not fully appreciate the benefits of XML. With those benefits come a few restrictions.

XML must be well-formed.
If you are using a standardized XML application, your XML document must be valid.

These are not really horrible restrictions when you consider the benefits.

Posted by GWade at 07:12 AM. Email comments