
Anomaly ~ G. Wade Johnson

October 14, 2014

BPGP: How Best Practices Go Bad

There are a few reasons that Best Practices get specified in a profession.

  1. Provide standards for judging proposed solutions.
  2. Provide cover for unforeseen circumstances.
  3. Provide a starting point for discussions to improve the state of the art even more.
  4. Help more junior people make better decisions before they fully understand the field.

Applied to software, the first becomes part of code reviews. The second is often useful to show due diligence when a system fails or is hacked. The third fuels ongoing arguments that hopefully result in better practices eventually.

The final item impacts this series of posts. There are always more junior people in programming than senior people. By definition, these junior people don't have enough experience to have really good judgement. In order to keep them from making horrible decisions, we introduce best practices to reduce the number of uninformed, inexperienced decisions they will make. As long as you have senior people around, these junior people can follow the best practices and the senior people can slowly mentor them to understand why these practices exist. The experienced hands can also help the juniors to understand the edges of the best practice.

Edge Cases

Every best practice is actually a heuristic or rule of thumb that helps someone make a reasonable decision without having to dig into the details. This allows you to focus on more important issues instead of wasting brain cells on the easy decisions. Unfortunately, every heuristic has edge cases where it falls down. People with more experience tend to recognize when they have reached an edge, but juniors may keep following a best practice even where it doesn't apply.

As developers gain experience, they see more examples of the cases where a best practice succeeds and where it fails. Each of those cases improves the developer's understanding of the best practice, but only if the developer recognizes the problem when it happens. Junior developers can gain this same understanding with the help of more senior people who point out the edges before the juniors feel the full pain of failure.

Unfortunately, if junior developers do not have more senior people to learn from, they will blunder right through the edge cases, and it may be some time before they realize any mistake was made. Worse, the more inexperienced someone is, the harder it is for them to realize that they don't know what they need to know. This is called the Dunning-Kruger effect.

Conclusion

So, best practices often go bad when inexperienced programmers use them without oversight. And one of the reasons that we define best practices is to help inexperienced programmers do a better job. Put these two observations together and we get a nice bit of irony.

The only way I have seen to fix this is to make certain that developers at all skill levels are constantly learning, and that they have access to mentors who can help them recognize when they don't understand what they think they know.

Posted by GWade at 10:40 AM.

October 03, 2014

Value Objects

A few years ago, I first became aware of the concept of Value Objects. Unlike most objects you might use in a design, the identity of a value object is completely unimportant. Instead, the value of the object is all that is important. This leads to a couple of interesting design decisions:

  • Value objects tend to be simple
  • Value objects tend to be immutable

When we say that a Value Object is simple, we mean that it tends to have a single piece of state and very limited behavior. Often the only behavior is an accessor for the internal state. Making the object immutable means that it gets its value when it is constructed or initialized and that value never changes. Depending on the kind of Value, the behavior may also include the ability to combine this value with other data to make a new Value Object (think addition for a Money object).
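
As a minimal sketch of these two properties, consider a hypothetical Distance value in Python (the class name and units are just for illustration): the state is set once at construction, the only behavior is an accessor, and combining two values produces a new object rather than changing either one.

    class Distance:
        """A hypothetical Value object: one piece of immutable state."""
        def __init__(self, meters):
            if meters < 0:
                raise ValueError("distance cannot be negative")
            self._meters = meters      # set once, never modified

        @property
        def meters(self):              # the only real behavior: an accessor
            return self._meters

        def plus(self, other):
            # Combining values yields a *new* Distance; neither operand changes.
            return Distance(self._meters + other.meters)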

Some programmers would focus on the fact that we are providing a new class as a thin wrapper around a more primitive data type and complain that this complicates the design for no real benefit. Sticking with the raw primitive, though, can easily become the Primitive Obsession Code Smell. The more I use Value objects, the more I believe they provide significant advantages to a design.

Validation

If you talk to people who deal with security for any sizable body of code, you will regularly run across the problem of input validation. Input from the user (or any outside source) is not validated nearly often enough or thoroughly enough. This has led to numerous security breaches over time. Part of the reason input isn't validated is that the programmer is focused on getting the code to work and not thinking about a malicious user. As the programmer is testing, she normally supplies valid inputs and the code continues to work even without proper validation. A good programmer will also supply a few bad examples to verify that the system rejects them. Very few programmers test deliberately perverse or malicious input.

Part of the problem is that best practices on validation do not help when the programmer is focused on solving the problem in the first place. It's just as much of a problem that the system doesn't fail if you do the wrong thing. This means that a perfectly innocent mistake leaves a real problem in the code.

What normally ends up happening is that the unvalidated input data comes into the program and travels through the code until it is used in one or more unsafe operations. At some point, either someone notices the unsafe usage or a bug occurs. The maintenance programmer then probably applies some validation at the point of use and commits the fix. If this input is used in multiple places, this ad hoc validation will eventually end up in multiple places in the code. More importantly, the validation in those different places will likely end up slightly different, because the maintenance programmer is probably looking only at their immediate problem, not at all of the ways this data could be misused.

User Name

Let's say we have a program that expects to take a user name (matching a Unix account) from the user and use it to perform some operations. Some of these operations involve changing permissions to the account. Some operations involve accessing configuration files or data directories named after the user account. If the user enters the name root, they could possibly gain unintended access to the system. If they entered xyzzy and that failed to be a user, how does the code handle the bad user name? What if the supplied user name was fred/../bianca? Would that allow fred to access files belonging to bianca?

I'm sure you can see that validating these names is important. The subtle problem is that at the point of use a programmer might validate these names completely differently. I have seen cases where the first problem would have been validated by checking for user names that exist on the system, and the second would have been cleaned by removing any / or . characters. Over time, these patches proliferate and soon no one knows the right way to validate the incoming user name.

If instead we had used a UserName Value object from the beginning, validation would have been done as part of the creation of the object. More importantly, all code taking a user name should expect a UserName instead of a raw string. This means that we know the data has been verified because the object could not have been created otherwise. Just as importantly, any time you were tempted to use a raw string, the interface of the code would remind you to do the right thing. Moreover, if we decided at some point that the validation needs to be stronger, there is one right place to make that change.
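
As a rough sketch (in Python), the validation lives in the constructor, so a UserName simply cannot exist in an invalid state. The specific rules below (lowercase letters, digits, underscore and dash, no root) are assumptions for illustration, not a complete policy:

    import re

    class UserName:
        """Value object wrapping a validated Unix-style account name."""
        _VALID = re.compile(r'^[a-z_][a-z0-9_-]{0,31}$')   # assumed naming rule

        def __init__(self, name):
            if name == 'root' or not self._VALID.match(name):
                raise ValueError("invalid user name: %r" % name)
            self._name = name

        @property
        def name(self):
            return self._name

With this in place, something like fred/../bianca never makes it past the constructor, and any function whose signature takes a UserName documents that it expects validated input.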

Week Day

Let's say you have a program that does reporting based on day of the week. If the user enters a weekday, can you assume it is valid? What should we do with the following user input?

  • Monday
  • tue
  • friday
  • thurs
  • Thu
  • yesterday
  • weduary
  • foo

Which of these do we want to consider legal? Should we have a canonical form for each name? Should we require the user to enter that canonical form? Can we use the current context to resolve something like yesterday? How about days from other languages?

If we don't deal with this up front, every place that uses the weekday value will make its own decisions. Even a concept as simple as this one has subtle effects.
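
One way to pin those decisions down is to make them once, in a WeekDay Value object's constructor. The sketch below (Python) uses an assumed policy, accepting a full name or an unambiguous prefix of at least three letters in any case, purely to show where the decision belongs:

    class WeekDay:
        """Value object that canonicalizes a weekday name at construction."""
        _DAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

        def __init__(self, text):
            candidate = text.strip().lower()
            matches = [d for d in self._DAYS
                       if len(candidate) >= 3 and d.lower().startswith(candidate)]
            if len(matches) != 1:
                raise ValueError("unrecognized weekday: %r" % text)
            self._day = matches[0]     # always the canonical form, e.g. 'Thursday'

        @property
        def day(self):
            return self._day

Under that policy, Monday, tue, friday, thurs, and Thu all become canonical names, while yesterday, weduary, and foo are rejected, and the decision is recorded in exactly one place.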

Representation Changes

Another problem can arise from how we represent a value internally. If we use a primitive data type and we realize that our requirements now call for a different representation, how painful will it be to change that representation everywhere? There's also a problem with the primitive representation not being a perfect match to the concept of the value we are working with.

Money

Let's say you are working on a banking application and you need to support deposits and withdrawals from an account. What happens if someone withdraws -1000.00 from their account? Does that increase their balance? Before you scoff at this one, this exact issue has been reported repeatedly in the time I've been a programmer. Once again, the original programmer did not consider the possibility because you couldn't do that at a teller window.

A more subtle problem comes about when you realize that money cannot reasonably be represented by a floating point number. In particular, 0.01 has no exact binary floating-point representation. While we can fix up a single money value, after addition or multiplication we can lose or gain value without meaning to.

This means that validation is not the only problem. We may need to represent money in a particular way to get the benefits of penny-level accuracy. Both people and banks would want this. If we had implemented a Money Value class up front, we could change the representation in that one place and the correct behavior would happen everywhere.
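
A sketch of that idea in Python: the Money value stores whole cents so arithmetic stays exact, and a hypothetical Account class shows the sign check living in one place rather than at every call site.

    class Money:
        """Value object that stores an amount as an integer number of cents."""
        def __init__(self, cents):
            self._cents = int(cents)   # exact; no binary floating-point rounding

        @property
        def cents(self):
            return self._cents

        def plus(self, other):
            # Adding whole cents never loses or gains a fraction of a penny.
            return Money(self._cents + other.cents)

    class Account:
        def __init__(self, balance):
            self._balance = balance    # a Money value

        def withdraw(self, amount):
            if amount.cents <= 0:      # a -1000.00 "withdrawal" is rejected here
                raise ValueError("withdrawal amount must be positive")
            self._balance = self._balance.plus(Money(-amount.cents))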

Design Benefits

If you still aren't convinced of the usefulness of Value objects, there's also a design-level benefit to creating them. The Value you are working with is actually a concept in your design. If it weren't a concept, you wouldn't have a name for it. Defining a Class that represents that concept, even if it's just a Value class, gives context to anyone maintaining your code.

Past experience has also shown that any behavior that wraps the primitive value now has an obvious place to live. You may see code that existed in other classes (but didn't really fit) migrate to the Value class. This may change a simple Value class into a more complete class over time.

So, in some cases, a Value class can help you discover design that was hidden in your implementation. Refactoring the design to move behavior to the new ex-Value class then improves the overall design. But without the Value class in the first place, these refactorings may not be nearly as obvious.

Conclusion

The Value Object pattern is sadly underused. One really good use of Value objects is in validation of input. All input from the outside world should be used to instantiate Value objects. The Value objects themselves validate the input on construction. Anywhere in the code you find a Value Object being represented by a primitive data type is likely a bug waiting to happen. Finally, a Value class is a relatively light-weight way to represent a concept. It may serve as an attractor for behavior and become a full class over time.

Posted by GWade at 09:59 PM.

October 01, 2014

BPGB: Pattern Mania

This is yet another in the Best Practices Gone Bad series of posts.

Around two decades ago, the book Design Patterns (often called the Gang of Four, or GoF, book) was published. This book introduced a large number of programmers to a new approach to thinking about solutions.

For those who are not familiar with Design Patterns, you can think of a pattern as a high-level description of a generalized solution to a standard problem. A pattern has a standard name, a problem, a solution, side effects, and a list of other patterns that work with it.

Pattern Advantages

Many programmers focused on the 23 solutions to real world problems that the book provided. For some programmers, these solutions supplied answers to problems that they had struggled with. Some of the solutions covered problems that the programmers had not even realized they needed to solve.

However, one of the most useful features of patterns is the language and names they define. Most of the patterns were actually solutions many people had used at one point or another. Before the Design Patterns book, two developers from two different projects would likely give almost identical solutions different names. These two developers might not be able to talk intelligently about their solutions if they did not realize that they were implementing the same solution. After Design Patterns, it became easier to recognize similar solutions because we had standard names.

If the developer actually read the Design Patterns book, their solution may even be better because the book lists the consequences of each pattern. This should give the developer more information to improve their design. In particular, the developer can make certain that the defined consequences do not cause problems with the rest of the system design.

Pattern Over-Use

As expected, when people first realized the power of Design Patterns, they began to use them everywhere. This seems to happen pretty much anytime a developer discovers a new technique or paradigm. There was a running joke for a while about novices trying to fit all 23 patterns from the book into a Hello World program.

Right about the same time that Design Patterns was taking off, Java began to become popular. Many Java programmers incorporated Design Patterns into their fundamental understanding of OOP and Java best practice. Thanks to Sun's marketing muscle, Java got a lot of air time around business people, which resulted in a number of University Computer Science programs choosing Java as the first language for CS students. These three items caused many junior programmers to get the idea that a program that uses patterns is a good program and one that doesn't is a bad program.

Unfortunately, this ignores the fact that the original purpose of Design Patterns was to describe common solutions; the book explicitly explains the problem each pattern is intended to solve. Many of these junior programmers assume that their program will need a Factory or a Visitor (for example) before even considering whether the problem calls for any such thing. This leads to a joke I've seen recently on-line:

I had a problem, so I decided to use Java. Now, I have a ProblemFactory.

At this point, I need to make it perfectly clear that the pattern over-use problem is not an inherent problem with the Java language. It's really a side effect of the two becoming popular at the same time and Universities adopting the language while the hype was high. Many senior Java programmers do not make this mistake. They carefully consider patterns just like any other important design decision. But junior programmers sometimes make the mistake of starting their design by deciding which patterns to apply before they have determined what will be needed.

The Singleton

One pattern from the GoF that has been especially problematic is the Singleton. Part of the problem is that the pattern describes a class with two responsibilities: object lifetime and object access. This is probably a valid criticism, since the other patterns in the book are definitely more focused, single-responsibility solutions.

The bigger issue, the one that has raised the ire of many programmers, is that an inexperienced programmer may use the Singleton pattern to legitimize what is nothing more than global data. Global variables have been recognized for decades as an ongoing source of problems in code. If the variable can be changed, you are pretty much guaranteeing action-at-a-distance bugs, where one piece of code changes the global state and another depends on the old value. (Note that this issue only applies if the object's state is mutable. If you can't change the object's internal state or affect its behavior, there's no action-at-a-distance problem.)

Honestly, turning the GlobalKitchenSink object into the GlobalKitchenSinkSingleton does not improve the code. But many inexperienced programmers feel that, since the Singleton was blessed by the GoF, global data expressed as a Singleton is fine. Because these programmers have likely never maintained a system built on global data or dealt with the resulting troubleshooting problems, they misuse this pattern just as they would have used the global data. Without the scars of fighting global data, they embed Singletons (sometimes God Object Singletons) in their code and feel justified, because Singleton is a blessed pattern.

A secondary issue is the coupling between many pieces of code in the system and this Singleton. Testing is harder because classes depend directly on a Singleton object that we can't control. We can't easily subclass the Singleton to provide more control or functionality, because every place it appears, the class is named directly. This introduces a high degree of coupling into the system that may not be necessary.
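
A small sketch (Python, with a hypothetical Config singleton and report classes) shows both halves of the complaint: the class manages its own lifetime and access, and a caller that names the Singleton directly cannot be tested with anything else, while a caller that accepts the dependency can.

    class Config:
        """A classic Singleton: the class controls both lifetime and access."""
        _instance = None

        def __init__(self):
            self._values = {'output_dir': '/tmp/reports'}   # assumed settings

        @classmethod
        def instance(cls):
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

        def get(self, key):
            return self._values[key]

    class ReportWriter:
        def output_dir(self):
            # Hard-wired to the one global instance; a test can't substitute
            # a different Config without monkey-patching the class itself.
            return Config.instance().get('output_dir')

    class InjectedReportWriter:
        def __init__(self, config):
            self._config = config      # any object with a get() method will do

        def output_dir(self):
            return self._config.get('output_dir')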

Conclusion

When the Design Patterns book came out, some developers believed that having good names, terminology, and examples would help junior programmers do much better design. Unfortunately, like all good ideas, patterns turned out not to be the silver bullet we were looking for. Since some people decided that design patterns meant good design, they immediately concluded that more design patterns must mean even better design.

Posted by GWade at 03:00 PM.