This is the second entry in the Best Practices Gone Bad series of posts.
One principle that a lot of developers have been holding to lately is DRY (Don't Repeat Yourself). In The Pragmatic Programmer, Hunt and Thomas state the DRY principle as:
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
This principle was stated as an attempt to push back against a tendency for people to duplicate information and processes either on purpose or accidentally.
I originally ran across this concept several years earlier in P.J. Plauger's book Programming on Purpose: Essays on Software Design. Plauger used the phrase the one, right place to describe what Hunt and Thomas call DRY.
It seems that DRY normally goes bad when the developer forgets that the definition makes two important points: the thing that must not be duplicated is a piece of knowledge, and that knowledge should have a single, authoritative representation.
Often, less experienced developers make DRY mistakes by focusing on the second point and forgetting the first. These developers become zealots, working to remove any piece of duplicated code in the program.
This leads to one of the most common misuses of DRY that I've seen: trying to remove duplication of implementation that is not really duplication of knowledge. For example, if we have the same literal value in two places in the code but the values have different meanings, they are not a single piece of knowledge; they just happen to have the same value. Likewise, two subroutines that have superficially similar structure but implement different concepts should not be combined into one representation.
Let's assume that we have two constants for telling which way to display time: TWELVE_HOUR_TIME (12) and TWENTY_FOUR_HOUR_TIME (24). It would not make sense to use TWELVE_HOUR_TIME when defining an array of month names. Despite the fact that the literal 12 appears in two places, it is not duplication of a single piece of knowledge.
I've seen people try to avoid the obvious problem of using TWELVE_HOUR_TIME in a place it doesn't make sense by defining a constant TWELVE and using it in both places. As a result, we now have a generic name that tells us no more than the literal did, and we've still coupled together two concepts that should be separate. In a very important sense, this doesn't reduce duplication of knowledge at all.
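To sketch the distinction in code (the class and constant names here are invented for illustration, not taken from any real codebase), the two 12s below are different pieces of knowledge that merely share a value:

final class TimeFormat {
    static final int TWELVE_HOUR_TIME = 12;      // hours on a 12-hour clock
    static final int TWENTY_FOUR_HOUR_TIME = 24; // hours on a 24-hour clock
}

final class CalendarData {
    // A different piece of knowledge that happens to share the value 12.
    // Replacing it with TWELVE_HOUR_TIME (or a generic TWELVE) would couple
    // unrelated concepts without removing any knowledge duplication.
    static final int MONTHS_PER_YEAR = 12;

    static final String[] MONTH_NAMES = {
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"
    };
}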
In one instance I recall, a somewhat junior programmer decided that two methods weren't DRY enough because they both contained a loop over the same internal array. His solution was to create one method containing the loop and both sets of processing code. His method expected a flag passed in to control an if-else construct inside the loop to determine which operation to perform.
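The shape of that change looked roughly like the following sketch (the names are invented; this is not the actual code):

class Order {
    record Item(double price, int weight) {}

    private final java.util.List<Item> items = new java.util.ArrayList<>();

    // Before: two small, separate methods; the "duplication" is only structural.
    double totalPrice() {
        double total = 0;
        for (Item item : items) { total += item.price(); }
        return total;
    }

    int totalWeight() {
        int total = 0;
        for (Item item : items) { total += item.weight(); }
        return total;
    }

    // After the merge: one loop, one flag, and a call site that explains nothing.
    double total(boolean byWeight) {
        double total = 0;
        for (Item item : items) {
            total += byWeight ? item.weight() : item.price();
        }
        return total;
    }
}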
This increased the overhead for each loop iteration. It also made the overall code much harder to understand. In addition, any call to this method was now completely opaque, because its behavior depended on a flag that couldn't really tell the next maintainer anything.
Yes, the new version reduced duplication of implementation. But, the duplication it had reduced was not worth the maintenance nightmare that resulted. More importantly, the supposed duplication was not conceptual. No knowledge duplication was involved. The upshot of this is that if one of the concepts ever needs a different implementation, we will either add more conditional logic, making the subroutine harder to understand, or we'll need to break them apart once again.
If the data structure to be traversed had been more complicated than a simple array, I could understand an apply or visit method that took a code reference to apply to each element in the data structure. This would have actually removed some conceptual duplication for a complicated data structure, but for an array this is overkill.
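In Java terms, the "code reference" would be something like a Consumer, and such a traversal might look like this rough sketch (the Tree class is invented for illustration):

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class Tree<T> {
    private final T value;
    private final List<Tree<T>> children = new ArrayList<>();

    Tree(T value) { this.value = value; }

    // The traversal knowledge lives in one place; callers supply the operation.
    void apply(Consumer<T> operation) {
        operation.accept(value);
        for (Tree<T> child : children) {
            child.apply(operation);
        }
    }
}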
Another example that seems to be common among people new to object-oriented programming is the desire to create a base class to promote code reuse. This normally results in a base class with a generic, unhelpful name and multiple unrelated subclasses.
This may be caused by someone recognizing what the AOP aficionados call a cross-cutting concern. For example, we need to support logging, so every class in the system derives from a Loggable class.
Some languages have solved this kind of cross-cutting concern through the mechanism of roles (also known as traits or mixins), which let a class take on a capability without deriving from an unrelated base class.
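Java doesn't have roles proper, but an interface with a default method gives a rough approximation of the idea (the names below are invented for illustration):

interface Loggable {
    default void log(String message) {
        // A role-like mix-in: behavior without claiming a spot in the class hierarchy.
        System.out.println(getClass().getSimpleName() + ": " + message);
    }
}

// Invoice keeps whatever parent actually makes sense for it and mixes in logging.
class Invoice implements Loggable {
    void post() {
        log("posting invoice");
    }
}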
We all know that duplication in code is a bad thing. So, the DRY principle seems like it should be applied universally. Unfortunately, even DRY requires some thought when it is applied. The original DRY principle referred to a piece of knowledge as the unit of duplication we want to reduce. Most bad examples of DRY seem to be caused by ignoring the concepts and focusing on implementation duplication.
As a first post in the Best Practices Gone Bad series, I figured I would take a simple example from a previous job.
In that position, there were two best practices that intersected to generate unreadable code. The first practice was expressive names. All names of variables, classes, and methods were required to be long enough to be clear and readable. In any moderately complex system, this will result in pretty long names.
The second practice was to limit all lines to 80 characters. Research into both software and web page design has shown that longer lines are harder to read. Many programmers have settled on 80 characters as the perfect line length, despite the fact that we no longer program on punch cards.
As this was a Java project, method chaining was a pretty common practice. This resulted in code that looks like:
SpecialResultClass resultOfInterestingAction
    = businessObjectCollection.getItemByIndex(
        theIndexOfTheObjectOfInterest
    ).performActionReturningInternalObject()
     .performInterestingAction();
Obviously, these are not the actual method and class names, but the length is consistent with actual code. The result is that a relatively simple action has stretched to five lines of code. This is not in any way clearer than code that uses shorter names and/or a slightly longer line length:
Result result = businessObjs.get( i ).getFoo().act();
Obviously, this is only clear if you know what the objects are. But, that observation also applies to the longer case.
So, how did two best practices end up with code that was mostly unreadable? The problem is that the two best practices take opposite approaches to the readability problem. Since there is no one, true solution to any problem, a compromise was needed between these two approaches. Unfortunately, the people making the rules refused to compromise. The result was code that was painful to read.
This is a really good example of how best practices go bad. The intent was to apply best practices to improve the software. Unfortunately, the people making the rules did not foresee how these practices would interact. The result was that even simple methods became long because almost every statement took up 3-10 lines.
Most professional fields develop a set of best practices that help people in the field produce consistent solutions to similar problems. Any given problem has an infinite number of potential solutions; only some of those will actually work, and fewer still will work well.
Best practices tend to prune some of the bad solutions by pointing out approaches that have been shown to work consistently. Software development has been around long enough for a number of best practices to become available.
If someone applies a practice without understanding it, there is the possibility that they will misuse that practice. This post begins an intermittent series that will touch on best practices that I've seen misused.
Many of the best practice problems I will describe fall victim to similar problems. As time goes on, you should recognize some themes.
Hopefully, this series will be interesting, thought-provoking, and amusing (in a train-wreck sort of way). The posts on this subject can be found at Best Practices Gone Bad Series.
In the last post, we had begun streamlining novice Ned's problem. Our current solution has some problems. The most annoying is the need to type a password for each server that we are going to write to.
We are using scp to do the copy, so the login is controlled by the SSH system. SSH supports using public keys to log in to a system. Ned will need to check with the appropriate people to be sure there is no policy preventing the use of public keys for access. The process requires several steps: generating a key pair, adding the public key to each remote machine's authorized_keys file, and loading the key into an agent.
Before accessing the remote machines, you will need to execute ssh-agent and add your key to the agent. Now, every attempt to access a remote host will attempt to use your key to log you in.
You generate keys for SSH using the program ssh-keygen. You may want to check with your sysadmins or security group to learn whether there is a company standard for keys; otherwise, just running the command will pick reasonable defaults. Make certain to provide a good passphrase when asked. Anyone who gets your private key and knows this passphrase can impersonate you on any system that you can log into.
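For example, accepting the defaults (your site may require a particular key type or size):

ssh-keygen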
The ssh-keygen program should generate a pair of files in the .ssh directory under your home directory. One of those files will have a .pub extension. That is the public key. The same filename without the extension is your private key. You should make certain that the private key stays on your machine and that you don't give it to anyone.
You should also make certain that the directory and any files in it are only readable and writable by you.
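On most Unix-like systems, that means something like:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/*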
Depending on how your current machine is set up, ssh-agent may or may not already be running.
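One quick way to check is to ask the agent to list its keys; if no agent is reachable, ssh-add will report that it cannot connect to the authentication agent:

ssh-add -l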
If it's already running, you can skip to the next step. If you need to run ssh-agent, you still need to do a bit more. The ssh-agent program sets some environment variables needed to do its job. The easiest way to get it set up is to execute:
eval `ssh-agent`
This adds the correct variables to the current shell instance.
You tell ssh-agent about the keys you want to manage with the ssh-add command. If you kept the default file name in _Creating a Key_ above, you can just execute ssh-add. If you changed the name, you will need to execute:
ssh-add ~/.ssh/{private_key_name}
The ssh-add program will request your passphrase at this point to add the key to the agent. Afterwards (until the next time ssh-agent is restarted), ssh will get the keys from ssh-agent without asking for your passphrase.
In order to use the public key login, you need to add your public key to the .ssh/authorized_keys file on the remote machine. The simplest approach for this is:
ssh-copy-id -i ~/.ssh/{private_key_name} user@remote_machine
If your system does not have the ssh-copy-id script, you will need to modify the authorized_keys file on the remote server. Instructions for that are available in many places online.
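One common manual approach (shown as a sketch, using the same placeholders as above) is to append the public key over ssh yourself:

cat ~/.ssh/{private_key_name}.pub | ssh user@remote_machine 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'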
It's probably a good idea to make certain that the .ssh directory and the authorized_keys file on the remote server are only readable and writable by the remote account.
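For example, once logged in to the remote account:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys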
Attempt to SSH into the remote server and you'll find that no password or passphrase is needed.
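Using the same placeholder as above:

ssh user@remote_machine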
Modify the authorized_keys file as described above for each of the remote machines you need to access. Although it seems like a lot of work, this will be the last time you need to make this change for each of those servers.
If a new server is added to your responsibility, add your key to the authorized_keys file for the new server and you are ready to go.
When you are finished, you can run the script from the last post and all of the scp commands automatically log in to the remote servers.
There is more that could be done to improve the script from before, but that is left as an exercise to the reader (for now).