Programmer Musings: LCDC: Fundamental Knowledge


June 19, 2015

LCDC: Fundamental Knowledge

In The Myth of Code Anyone Can Read, I introduced the idea that least common denominator code (LCDC) is not a good approach to writing software. One reason is the uneven knowledge base of the average programmer.

Different Programmers Have Different Backgrounds

Programming is still a relatively new field. It's also a pretty broad field. A person claiming to be a programmer or software engineer could have learned their craft in any of several ways:

Self-taught: online tutorials, books, ongoing self-study
Computer science degree
Management Information Systems degree
Programming course in a different degree program
Programming boot camp
Internship at a programming shop

Each of these can result in either really good or not-so-good programming skills. In addition, the terms programming and software development are applied in very different areas:

Embedded systems
Hardware driver development
Scientific software
Website development
SCADA software
Financial software
Game development
Graphics programming
Smart phone app development
Automotive software
High availability software
... and many more

Each of these areas has very different ideas of what knowledge and skills are fundamental. You can't necessarily take a website developer and have them be productive on an embedded systems project. You might not want a game developer working on software for pacemakers.

Given different backgrounds, specifying a minimum level of knowledge becomes much harder.

Data Structures

Let's start simple. If we want to write LCDC, we can't use any data structures that aren't understood by everyone. So, we can probably guess that most people would understand arrays. That is pretty fundamental. What about others[1]:

Linked lists
Binary trees: basic binary, AVL, or red-black trees
Generalized trees: tries, suffix trees, octrees, B-trees, R-trees
Graphs: DAGs, spanning trees
Stacks
Queues: FIFO, deques, priority queues
Hash tables, associative arrays, dictionaries
Heaps

Most programmers, in my experience, are not familiar with many of the data structures above, much less all of them. Some of these data structures underlie programming tools we use every day. Others are more specialized. Some are extremely well-known in one industry or company and virtually unknown in others.

If we really want LCDC, these data structures and the advantages they give would be unavailable. After all, most programmers don't know how a red-black tree or hash table works, so how can we write code that uses them?
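As a rough illustration of what would be given up, here is a minimal Python sketch contrasting an array-only lookup with a hash table lookup. The names `USERS`, `find_in_list`, and `build_index` and the sample data are hypothetical, made up for this example. Python's built-in `dict` is a hash table, so the second approach is exactly the kind of code LCDC would rule out.

```python
# Hypothetical sample data: (user_id, name) pairs.
USERS = [(101, "ana"), (205, "bo"), (309, "cy")]


def find_in_list(users, user_id):
    """Array-only approach: scan entries until one matches (linear time)."""
    for uid, name in users:
        if uid == user_id:
            return name
    return None


def build_index(users):
    """Hash table approach: build a dict once; each later lookup
    takes constant time on average."""
    return {uid: name for uid, name in users}


index = build_index(USERS)
print(find_in_list(USERS, 205))  # walks the list
print(index.get(205))            # single hash lookup
```

Both calls return the same name. The difference only shows up as the data grows, which is the advantage an array-only rule throws away.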

Fundamental Algorithms

Data structures aren't the only fundamentals that we can't rely on everyone understanding. Many of the algorithms that we depend on are opaque to the average developer.[2]

Sorting: quicksort, insertion sort, heap sort, merge sort
Security: SHA-256, AES, Diffie-Hellman key exchange, cipher block chaining, HMAC
Graphics: JPEG compression, ray tracing, Bézier curves
Databases: SQL, document databases, object databases, hierarchical databases
Randomness: Fisher-Yates shuffle, Mersenne Twister, entropy pools
String manipulation: regular expressions, longest common subsequence, Hamming distance, Levenshtein distance, KMP algorithm
Graphs: Dijkstra's algorithm, alpha-beta pruning, topological sort

In some fields, each of these algorithms is commonly used. In others, each is completely unknown. Even in the fields where a particular algorithm is used, most developers probably don't understand all of the algorithms used in that field. According to the LCDC premise, we cannot use any algorithm that not everyone understands.
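To make this concrete, here is a minimal Python sketch of one algorithm from the list above, the Fisher-Yates shuffle. The function name `fisher_yates_shuffle` and its optional `rng` parameter are my own choices for illustration. The algorithm is only a few lines long, but it is easy to get subtly wrong, which is a good reason to rely on a known, tested version instead of reinventing it.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Shuffle items in place so that every ordering is equally likely."""
    # Walk from the last index down to 1.
    for i in range(len(items) - 1, 0, -1):
        # Pick j from 0..i inclusive. Picking from the whole list on
        # every step is a common mistake that biases the result.
        j = rng.randint(0, i)
        items[i], items[j] = items[j], items[i]
    return items


deck = list(range(10))
fisher_yates_shuffle(deck)
print(deck)  # same ten numbers, in a random order
```

In practice you would call Python's `random.shuffle` rather than write this yourself. The point is that someone had to know the algorithm for that library function to exist.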

Summary

Because of the breadth of the programming field and the many different ways that individuals came to work in the field, it is very hard to describe a subset of knowledge that we can claim is known by everyone.

Not all of these data structures and algorithms apply to every business, but most programs end up touching one or more of these areas somewhere. Our code would be slower, less correct, and harder to maintain without being able to take advantage of well-known and well-tested algorithms, even if they are beyond the grasp of your most junior people.

In the next post, I'll explore using libraries to solve this problem. We'll also see how they would be impacted by the LCDC idea.

Notes

[1] Apologies if I've left out your favorite data structure. I just wanted a list big enough to get the point across.
[2] Since there are even more algorithms than data structures, this is an even more incomplete list. On the other hand, I suspect that more of these will be unknown to more programmers.

Posted by GWade at June 19, 2015 08:25 AM.