

October 03, 2014

Value Objects

A few years ago, I first became aware of the concept of Value Objects. Unlike most objects you might use in a design, the identity of a value object is completely unimportant. Instead, the value of the object is all that is important. This leads to a couple of interesting design decisions:

  • Value objects tend to be simple
  • Value objects tend to be immutable

When we say that a Value Object is simple, we mean that it tends to have a single piece of state and very limited behavior. Often the only behavior is an accessor for the internal state. Making the object immutable means that it gets its value when it is constructed or initialized, and that value never changes. Depending on the kind of Value, the behavior may include the ability to combine this value with other data to make a new Value Object (think addition for a Money object), as in the sketch below.
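As a minimal sketch of these properties in Python (the language and names are mine, purely for illustration), consider a Money value: the state is fixed at construction, the only ordinary behavior is an accessor, and combining two values produces a new object rather than modifying either operand.

    from dataclasses import dataclass

    @dataclass(frozen=True)       # frozen: the state is fixed at construction
    class Money:
        cents: int                # the single piece of internal state

        def amount(self) -> str:
            """The only ordinary behavior: an accessor for the state."""
            return f"{self.cents / 100:.2f}"

        def __add__(self, other: "Money") -> "Money":
            """Combining two values yields a new Value Object."""
            return Money(self.cents + other.cents)

Any attempt to assign to an attribute of a frozen dataclass raises an error, so the value genuinely cannot change after construction.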

Some programmers will focus on the fact that we are providing a new class as a thin wrapper around a more primitive data type and complain that this complicates the design for no real benefit. But leaning on raw primitives instead can easily become the Primitive Obsession code smell. The more I use Value Objects, the more I believe they provide significant advantages to a design.

Validation

If you talk to people who deal with security for any sizable body of code, you will regularly run across the problem of input validation. Input from the user (or any outside source) is not validated nearly often enough or thoroughly enough. This has led to numerous security breaches over time. Part of the reason input isn't validated is that the programmer is focused on getting the code to work, not on thinking about a malicious user. As the programmer tests, she normally supplies valid inputs, and the code continues to work even without proper validation. A good programmer will also supply a few bad examples to verify that the system rejects them. Very few programmers test deliberately perverse or malicious input.

Part of the problem is that best practices on validation do not help when the programmer is focused on solving the problem in the first place. It's just as much of a problem that the system doesn't fail if you do the wrong thing. This means that a perfectly innocent mistake leaves a real problem in the code.

What normally happens is that the unvalidated input comes into the program and travels through the code until it is used in one or more unsafe operations. At some point, either someone notices the unsafe usage or a bug occurs. The maintenance programmer then applies some validation at the point of use and commits the fix. If this input is used in multiple places, this ad hoc validation eventually ends up in multiple places in the code. Worse, the validation in those places will likely differ slightly, because the maintenance programmer is looking only at the problem at hand, not at all of the ways this data could be misused.

User Name

Let's say we have a program that expects to take a user name (matching a Unix account) from the user and use it to perform some operations. Some of these operations involve changing permissions on the account. Some involve accessing configuration files or data directories named after the user account. If the user enters the name root, they could possibly gain unintended access to the system. If they entered xyzzy and that failed to be a user, how does the code handle the bad user name? What if the supplied user name were fred/../bianca? Would that allow fred to access files belonging to bianca?

I'm sure you can see that validating these names is important. The subtle problem is that, at the point of use, a programmer might validate these names completely differently. I have seen cases where the first problem would have been handled by checking for user names that exist on the system, and the second would have been cleaned by removing any / or . characters. Over time, these patches proliferate, and soon no one knows the right way to validate an incoming user name.

If instead we had used a UserName Value object from the beginning, validation would have been done as part of creating the object. More importantly, all code taking a user name should expect a UserName instead of a raw string. This means we know the data has been verified, because the object could not have been created otherwise. Just as importantly, any time you were tempted to use a raw string, the interface of the code would remind you to do the right thing. Moreover, if we decided at some point that the validation needed to be stronger, there would be one right place to make that change.
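A sketch of what this might look like (Python again; the specific policy here, a conservative Unix account-name pattern plus a reserved-name check, is an illustrative assumption, not a complete rule set):

    import re

    class UserName:
        _VALID = re.compile(r"\A[a-z_][a-z0-9_-]{0,31}\Z")
        _RESERVED = frozenset({"root"})

        def __init__(self, raw: str):
            # Validate once, at construction; no UserName exists otherwise.
            if raw in self._RESERVED or not self._VALID.match(raw):
                raise ValueError(f"invalid user name: {raw!r}")
            self._name = raw

        @property
        def name(self) -> str:
            return self._name

    # 'fred/../bianca' and 'root' never survive construction, so code
    # that accepts a UserName can use it without re-checking.
    config_file = f"/etc/myapp/{UserName('fred').name}.conf"

Any function whose signature asks for a UserName instead of a raw string now documents and enforces the validation requirement at the same time.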

Week Day

Let's say you have a program that does reporting based on day of the week. If the user enters a weekday, can you assume it is valid? What should we do with the following user input?

  • Monday
  • tue
  • friday
  • thurs
  • Thu
  • yesterday
  • weduary
  • foo

Which of these do we want to consider legal? Should we have a canonical form for each name? Should we require the user to enter that canonical form? Can we use the current context to resolve something like yesterday? How about days from other languages?

If we don't deal with this up front, every place that uses the weekday value will make its own decisions. Even a concept as simple as this one has subtle effects.
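One way to pin those decisions down is a Weekday Value class that stores a single canonical form (a sketch under one possible policy; whether to accept prefixes, other languages, or relative terms like yesterday is exactly the question this class would answer once):

    class Weekday:
        _DAYS = ("monday", "tuesday", "wednesday", "thursday",
                 "friday", "saturday", "sunday")

        def __init__(self, raw: str):
            # Accept any unambiguous prefix: 'tue', 'thurs', 'Thu', 'Friday'.
            matches = [d for d in self._DAYS
                       if d.startswith(raw.strip().lower())]
            if len(matches) != 1:
                raise ValueError(f"unrecognized weekday: {raw!r}")
            self._day = matches[0]

        @property
        def name(self) -> str:
            """The canonical form, e.g. 'Thursday'."""
            return self._day.capitalize()

Under this particular policy, Monday, tue, friday, thurs, and Thu all construct valid objects, while yesterday, weduary, and foo are rejected, and every consumer sees only the canonical name.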

Representation Changes

Another problem can arise from how we represent a value internally. If we use a primitive data type and later realize that our requirements demand a different representation, how painful will it be to change that representation everywhere? There is also the problem that a primitive representation is rarely a perfect match for the concept of the value we are working with.

Money

Let's say you are working on a banking application and you need to support deposits and withdrawals from an account. What happens if someone withdraws -1000.00 from their account? Does that increase their balance? Before you scoff at this one, this exact issue has been reported repeatedly in the time I've been a programmer. Once again, the original programmer did not consider the possibility, because you couldn't do that at a teller window.

A more subtle problem comes about when you realize that money cannot reasonably be represented by a floating point number. In particular, 0.01 has no exact binary representation. While we can fix up a single money value after the fact, addition or multiplication can lose or gain value without our meaning to.

This means that validation is not the only problem. We may need to represent money in a particular way to get penny-level accuracy. Both people and banks want this. If we had implemented a Money Value class up front, we could change the representation in that one place and the correct behavior would happen everywhere.
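Extending the earlier Money sketch: assuming we choose whole cents as the internal representation (an illustrative choice, not the only reasonable one), 0.01 becomes exactly representable, the negative-amount check has one home, and a later change of representation touches only this class.

    from dataclasses import dataclass
    from decimal import Decimal, InvalidOperation

    @dataclass(frozen=True)
    class Money:
        cents: int   # whole cents, so 0.01 is exact

        @classmethod
        def from_string(cls, amount: str) -> "Money":
            """Validate and convert once, at the system boundary."""
            try:
                cents = Decimal(amount) * 100
            except InvalidOperation:
                raise ValueError(f"not a money amount: {amount!r}")
            if cents != int(cents):
                raise ValueError(f"sub-cent amount: {amount!r}")
            return cls(int(cents))

        def __add__(self, other: "Money") -> "Money":
            return Money(self.cents + other.cents)

        def is_positive(self) -> bool:
            return self.cents > 0

A withdrawal routine can then insist that is_positive() holds before touching a balance, so Money.from_string("-1000.00") can never silently increase an account.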

Design Benefits

If you still aren't convinced of the usefulness of Value objects, there's a design-level benefit to creating Value objects. The Value you are working with is actually a concept in your design. If it weren't a concept, you wouldn't have a name for it. Defining a Class that represents that concept, even if it's just a Value class, gives context to anyone maintaining your code.

Past experience has also shown that any behavior built around the primitive value now has an obvious place to live. You may see code that existed in other classes (but didn't really fit) migrate to the Value class. This may change a simple Value class into a more complete class over time.

So, in some cases, a Value class can help you discover design that was hidden in your implementation. Refactoring the design to move behavior to the new ex-Value class then improves the overall design. But without the Value class in the first place, these refactorings may not be nearly as obvious.

Conclusion

The Value Object pattern is sadly underused. One really good use of Value Objects is validating input. All input from the outside world should be used to instantiate Value Objects, and the Value Objects themselves validate the input on construction. Anywhere in the code that you find a Value represented by a primitive data type is likely a bug waiting to happen. Finally, a Value class is a relatively lightweight way to represent a concept. It may serve as an attractor for behavior and become a full class over time.

Posted by GWade at October 3, 2014 09:59 PM.