This site will look much better in a browser that supports web standards, but is accessible to any browser or Internet device.

Anomaly ~ G. Wade Johnson Anomaly Home G. Wade Home

September 26, 2014

BPGB: Unnecessarily DRY

This is the second entry in the Best Practices Gone Bad series of posts.

One principle that a lot of developers have been holding to lately is DRY. In The Pragmatic Programmer, Hunt and Thomas state the DRY principle as:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

This principle was stated as an attempt to push back against a tendency for people to duplicate information and processes either on purpose or accidentally.

I originally ran across this concept several years earlier in P.J. Plauger's book Programming on Purpose: Essays on Software Design. Plauger used the phrase the one, right place to describe what Hunt and Thomas call DRY.

DRY Gone Bad

It seems that the point where DRY normally goes bad is when the developer forgets that there are two important points in the DRY definition.

  1. piece of knowledge
  2. single ... representation

Often less experienced developers make DRY mistakes by focusing on the second point and forgetting the first. These developers become zealots working to remove any piece of duplicated code in the program.

This leads to one of the most common misuses of DRY that I've seen by trying to remove duplication of implementation that is not really duplication of knowledge. For example, if we have the same literal value in two places in the code but they have different meanings, they are not a single piece of knowledge. They just happen to have the same value. Likewise, two subroutines that have superficially similar structure, but implement different concepts also should not be combined to make one representation.

Overly DRY Constants

Let's assume that we have two constants for telling which way to display time: TWELVE_HOUR_TIME (12) and TWENTY_FOUR_HOUR_TIME (24). It would not make sense to use TWELVE_HOUR_TIME when defining an array of month names, despite the fact that the literal 12 is in two places, it is not duplication of a single piece of knowledge.

I've seen people trying to avoid the obvious problem of using TWELVE_HOUR_TIME in a place it doesn't make sense, by defining a constant TWELVE and using it in both places. As a result, we now have a generic name that tells us no more than the literal did and we've still coupled together two concepts that should be separate. In a very important sense, this wouldn't reduce duplication of knowledge.

Overly DRY Code

In one instance I recall, a somewhat junior programmer decided that two methods weren't DRY enough because they both contained a loop over the same internal array. His solution was to create one method containing the loop and both sets of processing code. His method expected a flag passed in to control an if-else construct inside the loop to determine which operation to perform.

This increased the overhead for each loop iteration. It also made the overall code much harder to understand. In addition, any call to this method was now completely obtuse because its behavior depended on a flag which couldn't really tell the next maintainer anything.

Yes, the new version reduced duplication of implementation. But, the duplication it had reduced was not worth the maintenance nightmare that resulted. More importantly, the supposed duplication was not conceptual. No knowledge duplication was involved. The upshot of this is that if one of the concepts ever needs a different implementation, we will either add more conditional logic, making the subroutine harder to understand, or we'll need to break them apart once again.

If the data structure to be traversed had been more complicated than a simple array, I could understand an apply or visit method that took a code reference to apply to each element in the data structure. This would have actually removed some conceptual duplication for a complicated data structure, but for an array this is overkill.

Base Class Abomination

Another example that seems to be common among people new to object oriented programming is the desire to create a base class to promote code reuse. This normally results in a base class with a generic, unhelpful name and multiple unrelated subclasses.

This is may be caused by someone recognizing what the AOP aficionados call a cross-cutting concern. For example, we need to support logging, so every class in the system derives from a Loggable class.

This has been solved by languages through the mechanism of Roles.

Conclusion

We all know that duplication in code is a bad thing. So, the DRY principle seems like it should be applied universally. Unfortunately, even DRY requires some thought when it is applied. The original DRY principle referred to a piece of knowledge as the unit of duplication we want to reduce. Most bad examples of DRY seem to be caused by ignoring the concepts and focusing on implementation duplication.

Posted by GWade at September 26, 2014 07:07 AM. Email comments
Comments

The overly dry code example also violates the software engineering concept of coupling. He now has a function that does two different things depending on a flag parameter controlling the logic: Control Coupling. See also, Logical Cohesion.

Posted by: David McReynolds at November 13, 2015 11:45 AM