This site will look much better in a browser that supports web standards, but is accessible to any browser or Internet device.

Anomaly ~ G. Wade Johnson Anomaly Home G. Wade Home

September 26, 2014

BPGB: Unnecessarily DRY

This is the second entry in the Best Practices Gone Bad series of posts.

One principle that a lot of developers have been holding to lately is DRY. In The Pragmatic Programmer, Hunt and Thomas state the DRY principle as:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

This principle was stated as an attempt to push back against a tendency for people to duplicate information and processes either on purpose or accidentally.

I originally ran across this concept several years earlier in P.J. Plauger's book Programming on Purpose: Essays on Software Design. Plauger used the phrase the one, right place to describe what Hunt and Thomas call DRY.

DRY Gone Bad

It seems that the point where DRY normally goes bad is when the developer forgets that there are two important points in the DRY definition.

  1. piece of knowledge
  2. single ... representation

Often less experienced developers make DRY mistakes by focusing on the second point and forgetting the first. These developers become zealots working to remove any piece of duplicated code in the program.

This leads to one of the most common misuses of DRY that I've seen by trying to remove duplication of implementation that is not really duplication of knowledge. For example, if we have the same literal value in two places in the code but they have different meanings, they are not a single piece of knowledge. They just happen to have the same value. Likewise, two subroutines that have superficially similar structure, but implement different concepts also should not be combined to make one representation.

Overly DRY Constants

Let's assume that we have two constants for telling which way to display time: TWELVE_HOUR_TIME (12) and TWENTY_FOUR_HOUR_TIME (24). It would not make sense to use TWELVE_HOUR_TIME when defining an array of month names, despite the fact that the literal 12 is in two places, it is not duplication of a single piece of knowledge.

I've seen people trying to avoid the obvious problem of using TWELVE_HOUR_TIME in a place it doesn't make sense, by defining a constant TWELVE and using it in both places. As a result, we now have a generic name that tells us no more than the literal did and we've still coupled together two concepts that should be separate. In a very important sense, this wouldn't reduce duplication of knowledge.

Overly DRY Code

In one instance I recall, a somewhat junior programmer decided that two methods weren't DRY enough because they both contained a loop over the same internal array. His solution was to create one method containing the loop and both sets of processing code. His method expected a flag passed in to control an if-else construct inside the loop to determine which operation to perform.

This increased the overhead for each loop iteration. It also made the overall code much harder to understand. In addition, any call to this method was now completely obtuse because its behavior depended on a flag which couldn't really tell the next maintainer anything.

Yes, the new version reduced duplication of implementation. But, the duplication it had reduced was not worth the maintenance nightmare that resulted. More importantly, the supposed duplication was not conceptual. No knowledge duplication was involved. The upshot of this is that if one of the concepts ever needs a different implementation, we will either add more conditional logic, making the subroutine harder to understand, or we'll need to break them apart once again.

If the data structure to be traversed had been more complicated than a simple array, I could understand an apply or visit method that took a code reference to apply to each element in the data structure. This would have actually removed some conceptual duplication for a complicated data structure, but for an array this is overkill.

Base Class Abomination

Another example that seems to be common among people new to object oriented programming is the desire to create a base class to promote code reuse. This normally results in a base class with a generic, unhelpful name and multiple unrelated subclasses.

This is may be caused by someone recognizing what the AOP aficionados call a cross-cutting concern. For example, we need to support logging, so every class in the system derives from a Loggable class.

This has been solved by languages through the mechanism of Roles.

Conclusion

We all know that duplication in code is a bad thing. So, the DRY principle seems like it should be applied universally. Unfortunately, even DRY requires some thought when it is applied. The original DRY principle referred to a piece of knowledge as the unit of duplication we want to reduce. Most bad examples of DRY seem to be caused by ignoring the concepts and focusing on implementation duplication.

Posted by GWade at 07:07 AM. Email comments | Comments (1)

September 22, 2014

BPGB: A Tale of Two Lengths

As a first post in the Best Practices Gone Bad series, I figured I would take an simple example from a previous job.

An Example

In that position, there were two best practices that intersected to generate unreadable code. The first practice was expressive names. All names of variables, classes, and methods were required to be long enough to be clear and readable. In any moderately complex system, this will result in pretty long names.

The second practice was to limit all lines to 80 characters. Research into both software and web page design has shown that longer lines are harder to read. Many programmers have settled on 80 characters as the perfect line length, despite the fact that we no longer program on punch cards.

As this was a Java project, method chaining was a pretty common practice. This resulted in code that looks like:


    SpecialResultClass resultOfInterestingAction
        = businessObjectCollection.getItemByIndex(
            theIndexOfTheObjectOfInterest
          ).performActionReturningInternalObject()
           .performInterestingAction();

Obviously, these are not the actual method and class names, but the length is consistent with actual code. The result is a relatively simple action has extended to five lines of code. This is not in any way clearer than code used shorter names and/or a slightly longer line length.


    Result result = businessObjs.get( i ).getFoo().act();

Obviously, this is only clear if you know what the objects are. But, that observation also applies to the longer case.

The Intersection of Two Lengths

So, how did two best practices end up with code that was mostly unreadable? The problem is that the two best practices take opposite approaches to the readability problem. Since there is no one, true solution to any problem, a compromise was needed between these two approaches. Unfortunately, the people making the rules refused to compromise. The result was code that was painful to read.

Conclusion

This is a really good example of how best practices go bad. The intent was to apply best practices to improve the software. Unfortunately, the people making the rules did not foresee how these practices would interact. The result was that even simple methods became long because almost every statement took up 3-10 lines.

Posted by GWade at 07:48 AM. Email comments | Comments (2)

September 19, 2014

Best Practices Gone Bad

Most professional fields develop a set of best practices that help people in the field produce consistent solutions to similar problems. Any given problem has an infinite number of potential solutions, some of those will actually work. Fewer still will work well.

Best practices tend to prune some of the bad solutions by pointing out approaches that have been shown to work consistently. Software development has been around long enough for a number of best practices to become available.

If someone applies a practice without understanding it, there is the possibility that they will misuse that practice. This post begins an intermittent series that will touch on best practices that I've seen misused.

Themes

Many of the best practice problems I will describe fall victim to similar problems. As time goes on, you should recognize some themes.

Unexpected Interactions
No code exists in a vacuum. Some best practices fail miserably when applied in a code base with certain other practices. Sometimes best practices from two different fields collide. Sometimes the side effects of one practice impedes the use of another.

Unintended Consequences

The Law of Unintended Consequences applies to every action or process people have every devised. Best practices are no different. The only way to avoid the problems is to be aware of any consequences. You should be prepared to revert changes where the consequences are worse than what the practice was designed to prevent.

Overuse of a Practice

Any practice can be overused. Less experienced individuals have a tendency to pick their favorite hammer ... uh, best practice and apply it indiscriminately until it cannot be applied any more.

Fanatical Application

The previous two problems can be combined into the worst case where the best practice becomes more important than any possible consequence. At this point, the coder becomes a fanatic.

Conclusion

Hopefully, this series will be interesting, thought-provoking, and amusing (in a train-wreck sort of way). The posts on this subject can be found at Best Practices Gone Bad Series.

Posted by GWade at 05:40 PM. Email comments

September 18, 2014

Novice Example: Public Key Access

In the last post, we had begun streamlining novice Ned's problem. Our current solution has some problems. The most annoying is the need to type a password for each server that we are going to write to.

High-level Description

We are using scp to do the copy, so the login is controlled by the SSH system. SSH supports using public keys to log in to a system. Ned will need to check with the appropriate people to be sure there is no policy preventing using public keys for access. This process requires several steps.

  • Create a public/private key pair on your local system.
  • Copy the public key to the remote system's authorized_keys file.
  • Verify and correct file permissions.

Before accessing the remote machines, you will need to execute ssh-agent and add your key to the agent. Now, every attempt to access a remote host will attempt to use your key to log you in.

Creating a Key

You generate keys for SSH using the program ssh-keygen. You may want to check with your sysadmins or security group to learn if there is any company standard for keys, otherwise just running the command will pick reasonable defaults. Make certain to provide a good passphrase when asked. Anyone with this passphrase can use your key to impersonate you on any system that you can log into.

The ssh-keygen program should generate a pair of files in the .ssh directory under your home directory. One of those files will have a .pub extension. That is the public key. The same filename without the extension is your private key. You should make certain that the private key stays on your machine and that you don't give it to anyone.

You should make certain that the directory and any files in it are only readable and writable by you.

Using ssh-agent

Depending on the how your current machine is set up, ssh-agent may or may not already be running.

If it's already running, you can skip to the next step. If you need to run ssh-agent, you still need to do a bit more. The ssh-agent program sets some environment variables needed to do its job. The easiest way to get it set up
is to execute:


     eval `ssh-agent`

This adds the correct variables to the current shell instance.

Adding Keys

You tell ssh-agent about the keys you want to manage with the ssh-add command. If you kept the default file name in _Creating a Key_ above, you can just execute ssh-add. If you changed the name, you will need to execute:


    ssh-add ~/.ssh/{private_key_name}

The ssh-add program will request your passphrase at this point to add the key to the agent. Afterwards (until the next time ssh-agent is restarted), ssh will get the keys from ssh-agent without asking for your passphrase.

Authorized Keys

In order to use the public key login, you need to add your public key to the .ssh/authorized_keys file on the remote machine. The simplest approach for this is


    ssh-copy-id -i ~/.ssh/{private_key_name} user@remote_machine

If your system does not have ssh-copy-id script, you will need to modify the authorized_keys file on the remote server. Instructions for that are available in many places on-line.

It's probably a good idea to make certain that the .ssh directory and the authorized_keys file on the remote server are only readable and writable by the remote account.

Attempt to SSH into the remote server and you'll find that no password or passphrase is needed.

Lather, Rinse, Repeat

Modify the authorized_keys as described above for each of the remote machines you need to access. Although it seems like a lot of work, this will be the last time you need to make this change for each of the remote servers.

If a new server is added to your responsibility, add you key to the authorized_keys file for the new server and you are ready to go.

Conclusion

When you are finished, you can run the script from the last post and all of the scp commands automatically log in to the remote servers.

There is more that could be done to improve the script from before, but that is left as an exercise to the reader (for now).

Posted by GWade at 03:25 PM. Email comments