In LCDC: Fundamental Knowledge, I explained how hard it is to specify a minimum level of knowledge or experience for all programmers. That minimum level would be needed to determine what is allowable in Lowest Common Denominator Code (LCDC). Anyone who has been programming for any length of time is probably shouting at the screen, calling me an idiot, because programmers don't really need to know the internals of some of this stuff. We can rely on well-written libraries to handle the hard parts.
I'm going to look at this from two different directions.
The problem with assuming that a library hides all of the hard bits is that no library is a perfect abstraction. In some cases, you can ignore the internals. In others, the fundamental properties of the library are more evident.
In recent years, I've been doing quite a bit of Perl programming. In Perl, as in most of the dynamic languages, one of the fundamental data types is a hash, which is implemented as a hash table. To make sure we are on the same page (because I can't know your background), the following is a list of the important characteristics of a Perl hash.
I have regularly seen a pattern in code where a programmer wants to see if a string exists in a large array of strings. So, they use the following approach:
They reason that looking up a string in a hash is fast, so this seems like a good idea. Unfortunately, for a single lookup this is actually slower than a straightforward linear search of the array, because building the hash requires touching every element first. If the programmer understood the way hashes work (and a little bit about algorithmic complexity), they never would have made this mistake.
In multiple languages, I've seen people try to randomize an array by passing the standard rand function as the comparison function to the standard library's sort. Without knowing how sort works under the hood, you may not realize that this can result in anything from a mostly unsorted array to a run that never terminates. (In really unusual cases, it could even modify memory outside the array.)
A large number of security holes have been caused by misuse of the C standard library functions strcat and strcpy. Some people blame the language for not being robust. Another way to look at it is that people are using the library without understanding how it works.
One last example dates from early in my programming career. I found the following line in a C program.
str[strlen(str)] = '\0';
In fact, this same idiom was repeated in many places in the code. It turns out that the programmer had come to C from another language. When learning C, he had read that every C string must be terminated with a nul character. He intended this line to set the character after the end of the string to nul. Unfortunately, he didn't realize that strlen works by scanning for the nul. This makes the line an expensive no-op.
The more complex the library, the more likely that some programmer will not understand it. This means that hiding complicated code by putting it in libraries may not solve your problem.
Let's say that somehow we could argue that the library solution really would make complicated algorithms and data structures usable for everyone. Shouldn't that same argument apply to your project's code? Shouldn't your programmers be able to write a library that wraps up complicated logic and makes it usable to the entry-level people?
If the library is well designed, with good abstractions and thorough documentation, it can definitely abstract away some of the complex problems in the code. This approach makes the code easier for junior programmers to understand and maintain.
The problem, of course, is that you can't use a library to encapsulate knowledge and still write the internals of that library without the knowledge. The critical functionality of the code is usually entrusted to the more senior people, who must understand the internals in order to write the library code. So, at a minimum, the library itself cannot be LCDC.
Libraries are not a panacea for the LCDC problem. Programmers can find ways to misuse libraries if they don't understand the algorithms and assumptions used by the library. Moreover, if libraries could solve the problem, then your project should be able to use the same approach by hiding knowledge in libraries. But, that violates the LCDC assumption because the library cannot be written without that knowledge.
In the next post, we'll start looking at a way to get rid of the LCDC assumption.
For the rest of the posts in this series, check out The Myth of Code Anyone Can Read.
In The Myth of Code Anyone Can Read, I introduced the idea that least common denominator code (LCDC) is not a good approach to writing software. One reason is the knowledge base of your average programmer.
Programming is still a relatively new field. It's also a pretty broad field. A person claiming to be a programmer or software engineer could have learned their craft in any of several ways:
Each of these can result in either really good or not-so-good programming skills. In addition, the terms programming and software development can also be applied in very different areas.
Each of these different areas has a very different idea of what knowledge and skills are fundamental. You can't necessarily take a website developer and have them be productive on an embedded systems project. You might not want a game developer working on software for pacemakers.
Given different backgrounds, specifying a minimum level of knowledge becomes much harder.
Let's start simple. If we want to write LCDC, we can't use any data structures that aren't understood by everyone. So, we can probably guess that most people would understand arrays. That is pretty fundamental. What about others[1]:
Most programmers in my experience are not familiar with many of the data structures above, much less all of them. Some of these data structures underlie programming tools we use every day. Others are more specialized. Some are extremely well known in one industry or company and virtually unknown in others.
If we really want LCDC, these data structures and the advantages they provide would be unavailable to us. After all, most programmers don't know how a red-black tree or a hash table works, so how can we write code that uses them?
Data structures aren't the only fundamentals that we can't rely on everyone understanding. Many of the algorithms that we depend on are opaque to the average developer.[2]
In some fields, each of these algorithms is commonly used. In others, each is completely unknown. Even in the fields where a particular algorithm is used, most developers probably don't understand all of the algorithms used in that field. According to the LCDC premise, we cannot use any algorithm that isn't understood by everyone.
Because of the breadth of the programming field and the many different ways that individuals came to work in the field, it is very hard to describe a subset of knowledge that we can claim is known by everyone.
Not all of these apply to every business, but most programs end up touching one or more of these areas somewhere. Our code would be slower, less correct, and harder to maintain without being able to take advantage of well-known and well-tested algorithms, even if they are beyond the grasp of your most junior people.
In the next post, I'll explore using libraries to solve this problem. We'll also see how they would be affected by the LCDC idea.
I got into a conversation recently coming out of the Houston.pm user group meeting. As usual, we wandered over numerous technical topics, but one stuck out in my mind: whether or not to use more advanced or more complicated language idioms.
I've written about programming idioms and advanced code many times in the past (see below). Part of the reason for revisiting this topic repeatedly is a mindset that I have seen throughout my career. The idea is to write the code so that anyone can read it. Although this sounds reasonable at first, lowest common denominator code (LCDC) almost always results in a hard-to-maintain code base.
There are a number of reasons why this simple idea falls apart. The most obvious comes from your experience of reading text in a human language. If we wanted to keep text at a level that anyone could read, everything would need to be written at a first-grade level. That's about the lowest level at which you can claim everyone can read.
In human languages, text is written at different levels depending on the context and expected audience. Why would you expect programming to be different? Over the next few entries, I plan to cover some of the contexts that change the way code should be written, and to show different places where writing lowest common denominator code (LCDC) would harm the project and, possibly, your business.