Programmer Musings: The Importance of the Correct Notation


June 25, 2006

The Importance of the Correct Notation

Several of the past few essays have focused on the notation aspects of programming languages. The major benefit of a notation is that it makes what you want to say concise without losing any of its power in the process. To use an example from an earlier essay, supporting normal algebraic notation for matrices lets the programmer work at a high level, without spending too much time on the details of the operations. Unfortunately, this conciseness of expression is both an advantage and a disadvantage of any particular notation. A reader who is not familiar with the notation can easily be confused by what is happening. As I said earlier, a powerful notation is useful for the expert, but harder for the novice.

Another interesting issue with notations is that the best notations are limited in scope. Trying to use a notation outside its scope leads to more confusion. A good bad example for me is the use of + by some languages to concatenate two strings. This notation seems relatively intuitive for most programmers and works rather well in many languages. Unfortunately, in some scripting languages, this notation causes a problem. I first ran into this in ECMAScript. When adding a number to a string that contains a number, what should be the outcome?
var str = "10"; alert( str + 0 );
Should the output be "100" or "10"? Since the + notation is used in two overlapping contexts, you can see a real cause for confusion. This problem would have been solved by using a different notation for concatenation. Then + would always mean addition: it would interpret the string as the number 10 and add 0 to it.
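For reference, ECMAScript itself resolves the ambiguity in favor of concatenation. The following is a minimal sketch of that behavior, not part of the original example: the variable names are mine, and console.log stands in for alert so the code runs outside a browser. It also shows how an explicit conversion states the intent when addition is what you want.

```javascript
// What + does when one operand is a string and the other is a number.
const str = "10";

// With a string operand, + concatenates: the number 0 becomes "0".
const concatenated = str + 0;      // "100"

// An explicit conversion removes the ambiguity: this is numeric addition.
const added = Number(str) + 0;     // 10

// - has no string meaning, so the string is converted to a number.
const subtracted = str - 0;        // 10

console.log(concatenated, added, subtracted);
```

The subtraction case is what makes the overlap so easy to trip over: two operators that look like a matched pair treat the same operands in different ways.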

Obviously, using the same notation for adding two integers and for adding two floating-point numbers simplifies the use of the language. But using a different notation for different concepts often makes the language clearer. By the same token, having different sets of notations for operations that are different is very handy. If the operation isn't in some way the same, the notation should look different.

In the early '90s, C++ was becoming popular in the PC world. Borland C++ was a popular compiler in this market. Borland produced a set of container classes that used many features of C++ that were considered advanced at the time. Unfortunately, like most of us, the compiler writers did not have much experience with these features and they made some really bad notational mistakes. One of them was to use operator overloading to add items to their containers.

This turned out to be confusing because not everyone agreed on what was meant by adding an item to the container. Did it add at the front, the back, or somewhere else? When using + to add to the container, the operator modified its left-hand argument. This is not how + works for integers. In a later version of the compiler, Borland did a very brave thing by completely changing the interface based on these experiences. They realized that they were using the wrong notation, and broke a significant amount of code by changing the interface to one that was more effective.
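The mismatch can be illustrated in ECMAScript, with two caveats: the language has no operator overloading, and this is not Borland's code. In this sketch, plain arrays stand in for the containers. The built-in concat behaves the way + does for numbers, while the built-in push behaves the way the overloaded + reportedly did.

```javascript
// Numeric +: neither operand is changed by the operation.
const a = 1;
const sum = a + 2;                        // a is still 1

// Array.prototype.concat follows the same rule: it returns a new array.
const original = [1, 2];
const combined = original.concat([3]);    // original is still [1, 2]

// Array.prototype.push changes the array it is called on, which is
// the behavior the overloaded + had on its left-hand argument.
const mutated = [1, 2];
mutated.push(3);                          // mutated is now [1, 2, 3]

console.log(a, sum, original, combined, mutated);
```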

This is a good example of a bad notation making a feature harder to use than necessary. (I know some people will suggest that this proves operator overloading is a bad thing. I'll come back to that subject in another essay.) By changing the notation to be more precise, Borland made the next version of their containers easier to use correctly.

The important point here is that notations are like interfaces. Good ones make saying what you want easier and more correct. Bad ones are easy to misuse. More importantly, in both interfaces and notations, what is good or bad is determined partially by the context in which it will be used. In the + example above, the problem with using + for string concatenation was that ECMAScript also converts strings to numbers and back automatically. This generates an ambiguity in the notation. This ambiguity is caused by the context of the operation. In the containers example, replacing + with insert and append (I don't remember the exact method names, but they were something like this) allowed more precision at the cost of somewhat more verbosity.
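As a rough illustration of that trade, here is a small container sketch. The class ItemList and its methods append, prepend and insertAt are hypothetical names chosen for this example, not Borland's actual interface. Each name says where the item goes, which a bare + never could.

```javascript
// Hypothetical container: each method name states where the item lands.
class ItemList {
  constructor() {
    this.items = [];
  }

  // Add at the back.
  append(item) {
    this.items.push(item);
    return this;
  }

  // Add at the front.
  prepend(item) {
    this.items.unshift(item);
    return this;
  }

  // Add before the element currently at the given index.
  insertAt(index, item) {
    this.items.splice(index, 0, item);
    return this;
  }
}

const list = new ItemList();
list.append("b").prepend("a").insertAt(1, "between");
console.log(list.items.join(", "));
```

The calls are longer than a + would be, but nobody reading them has to guess at the position or wonder whether the container itself is being changed.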

The important point is to use the right notation for the job.

Posted by GWade at June 25, 2006 05:03 PM.