Programmer Musings: LCDC: Library Code


June 24, 2015

LCDC: Library Code

In LCDC: Fundamental Knowledge, I explained how hard it is to specify a minimum level of knowledge or experience for all programmers. This minimum level would be needed to determine what is allowable for Lowest Common Denominator Code (LCDC). Anyone who has been programming for any time is probably shouting at the screen, calling me an idiot, because programmers don't really need to know the internals of some of this stuff. We can rely on well-written libraries to handle the hard parts.

I'm going to look at this from two different directions.

Libraries Without Understanding

The problem with assuming that a library hides all of the hard bits is that no library is a perfect abstraction. In some cases, you can ignore the internals. In others, the fundamental properties of the library's implementation show through, and using it correctly requires understanding them.

Misuse of Hashes

In recent years, I've been doing quite a bit of Perl programming. In Perl, as in most dynamic languages, one of the fundamental data types is a hash, which is implemented as a hash table. To make sure we are on the same page (because I can't know your background), the following is a list of the important characteristics of a Perl hash.

Consists of strings as keys, each with an associated scalar which is the value
Given a string, access to the associated value does not depend on the size of the hash (constant time access, on average)
Checking for the existence of a key in the hash is also a constant time operation

I have regularly seen a pattern in code where a programmer wants to see if a string exists in a large array of strings. So, they use the following approach:

build a hash from the array of strings
check for existence of the string in the hash
discard the hash

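They reason that looking up a string in a hash is fast, so this is a good idea. Unfortunately, this is actually slower than a straightforward linear search of the array. Building the hash means hashing every string in the array and allocating storage for all of them, which is more work than simply comparing each string against the target, and a linear search can stop as soon as it finds a match. The fast lookup only pays off when the hash is built once and reused for many lookups. If the programmer understood the way hashes work (and a little bit about algorithmic complexity), they never would have made this mistake.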

Random Sorting

In multiple languages, I've seen people try to randomize an array by calling the standard library's sort function with a comparison function that returns the result of the standard rand function. Without knowing how sort works under the hood, you may not realize that this can result in anything from a mostly unsorted array to a run that doesn't terminate. (In really unusual cases, it could even result in modifying memory outside the array.)

C String Functions

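A large number of security holes have been caused by misuse of the C standard library functions strcat and strcpy. Neither function knows the size of the destination buffer; each keeps copying until it reaches the nul at the end of the source string, so a source that is too long overwrites whatever memory follows the destination. Some people blame the language for not being robust. Another way to look at it is that people are using the library without understanding how it works.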

Terminated C Strings

One last example dates from early in my programming career. I found the following line in a C program.


     str[strlen(str)] = '\0';

In fact, this same idiom was repeated in many places in the code. It turns out that the programmer had come to C from another language. When learning C, he had read that every C string must be terminated with a nul character. He intended this to set the character after the end of the string to nul. Unfortunately, he didn't realize that strlen works by looking for the nul. This makes the line an expensive no-op.

The more complex the library, the more likely that some programmer will not understand it. This means that hiding complicated code by putting it in libraries may not solve your problem.

Project Libraries

Let's say that somehow we could argue that the library solution would actually make complicated algorithms and data structures usable by everyone. Shouldn't that same argument apply to your project's code? Shouldn't your programmers be able to write code that wraps up complicated logic and makes it usable by entry-level people?

If a library is well designed, with good abstractions and very good documentation, it can definitely abstract away some of the complex problems in the code. This approach makes the code easier for junior programmers to understand and maintain.

The problem, of course, is that you can't use a library to encapsulate knowledge and still write the internals of the library without that knowledge. The critical functionality of the code is usually entrusted to the more senior people. They must understand the internals in order to write the library code. So, at a minimum, the library itself cannot be LCDC.

Summary

Libraries are not a panacea for the LCDC problem. Programmers can find ways to misuse libraries if they don't understand the algorithms and assumptions used by the library. Moreover, if libraries could solve the problem, then your project should be able to use the same approach by hiding knowledge in libraries. But that violates the LCDC assumption, because the library cannot be written without that knowledge.

In the next post, we'll start looking at a way to get rid of the LCDC assumption.

For the rest of the posts in this series, check out The Myth of Code Anyone Can Read.

Posted by GWade at June 24, 2015 08:25 AM.