Anomaly ~ G. Wade Johnson

February 08, 2009

Serialized Objects and Chronistic Coupling

Many programs need to store program state to disk at various points. A common approach is to serialize the objects representing the program state directly to disk (or to a database). Back in 2004 (XML-Serialized Objects and Coupling), I described a coupling problem caused by automatically serializing objects to XML.

Since that time, I have worked with other systems with similar functionality and have decided the problem is worse than I described five years ago. Serializing an object to disk with the intent of reading it in at a later date couples the structure of the object at a past date to the structure of the object at a future date. If the object never changes form, that is not a problem. If the object structure needs to change, then the serialization process becomes more complicated. It has to take one of three forms:

  1. Convert the object to and from the old format.
  2. Recognize the old object and transform it into the new structure.
  3. Institute a versioning system that allows reading and writing the current format and older formats.
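
As a rough illustration of the second and third options, here is a minimal Python sketch of a loader that recognizes old data and upgrades it step by step. The field names, the version numbers, and the helper names (upgrade_v1_to_v2, load_record) are all made up for the example; it also assumes the data is stored as JSON and that unversioned data is version 1.

```python
import json

CURRENT_VERSION = 2  # hypothetical current format version


def upgrade_v1_to_v2(record):
    # Hypothetical change: version 1 stored a single "name" field,
    # version 2 splits it into "first_name" and "last_name".
    first, _, last = record["name"].partition(" ")
    return {"version": 2, "first_name": first, "last_name": last}


# One upgrade step per old version; each step returns the next version.
UPGRADES = {1: upgrade_v1_to_v2}


def load_record(text):
    record = json.loads(text)
    version = record.get("version", 1)  # assumption: unversioned data is v1
    while version < CURRENT_VERSION:
        record = UPGRADES[version](record)
        version = record["version"]
    if version != CURRENT_VERSION:
        raise ValueError(f"unsupported format version: {version}")
    return record
```

Every structural change adds another entry to the upgrade table, which is exactly the cost being described: the old formats never entirely go away.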

Chronistic Coupling

Recently, I have begun calling this effect Chronistic Coupling. (I like Temporal Coupling better, but that name is already taken.) Although you might think of this as another manifestation of Data Coupling, I think the time element makes Chronistic Coupling stronger (and more subtle) than data coupling. Unlike simple data coupling, object serialization couples the object structure through time. The older object format reaches forward in time to affect how the new program can structure its data.

If we allow saving in old formats, we must be very careful not to introduce an anachronism. This would be an old-style object that is inconsistent with the old program. This can cause problems that are hard to troubleshoot. You have to be able to identify where the old data came from to determine the problem. (In one system I worked on, we augmented the version of the data set with an extra piece of data describing the version of the program that saved this data.)
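
A minimal sketch of that kind of tagging might look like the following. This is not the code from the system I mentioned; the envelope layout, the key names, and the version values are assumptions chosen for the example.

```python
import json

FORMAT_VERSION = 3         # hypothetical data-format version
PROGRAM_VERSION = "2.4.1"  # hypothetical version of the saving program


def save_state(state):
    # Wrap the payload so every saved file records both the format it
    # uses and the program build that wrote it.
    envelope = {
        "format_version": FORMAT_VERSION,
        "saved_by": PROGRAM_VERSION,
        "state": state,
    }
    return json.dumps(envelope)


def describe_origin(text):
    # Return (format version, writer version) for troubleshooting.
    envelope = json.loads(text)
    return envelope.get("format_version"), envelope.get("saved_by", "unknown")
```

With both values recorded, a file that claims an old format but was written by a new program can at least be identified as a possible anachronism.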

Costs of Chronistic Coupling

There is a sort of seductive quality to the idea that we can serialize objects and reinstantiate them at another time. This pattern recurs many times in the field of programming. Although it seems like a really good idea to have the data completely encapsulated by the object by serializing and deserializing the data straight to storage and back, the reality is there are still tradeoffs.

The obvious issue is to be certain that the data we read in is consistent with the design of the object. Most serialization needs to be augmented with some form of validation.
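
For example, a deserializer can check the stored data against the object's invariants before building the object. The Account class and its fields below are hypothetical, and the sketch assumes JSON storage.

```python
import json
from dataclasses import dataclass


@dataclass
class Account:  # hypothetical class, for illustration only
    owner: str
    balance_cents: int


def account_from_json(text):
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    owner = data.get("owner")
    balance = data.get("balance_cents")
    if not isinstance(owner, str) or not owner:
        raise ValueError("owner must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(balance, bool) or not isinstance(balance, int):
        raise ValueError("balance_cents must be an integer")
    return Account(owner=owner, balance_cents=balance)
```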

A separate issue that people often don't notice is that changes in the responsibilities and structure of an object can be hampered by Chronistic Coupling. At the very least, the code needed to deserialize old objects becomes much more complicated. In the worst case, it may be necessary to keep older classes in the design for the sole purpose of allowing us to convert old objects into new objects.

Where things really start to go bad is when a substantial portion of an object hierarchy changes. The object you have serialized may not bear any resemblance to the new classes. If the new object hierarchy is different enough, you would have to parse the old serialized object into a neutral format that can be used to instantiate the new objects. Either that, or you don't make the design improvements, because the work is too great (for this release).

In this way, the old design reaches into the future to prevent changes to the design. Often, the only way to fix the problem is to abandon backwards compatibility. This may result in major problems for clients or the need to provide special utility software to convert old data to a new format.

Conclusion

I am not saying that object serialization should always be avoided. The purpose of coining the term Chronistic Coupling is to give a name to a cost that you may not realize you are paying. In some cases, it might be better to store data in an object-neutral format and build new objects to represent the data rather than store the objects themselves. The unfortunate part of this is that there is no magical way to convert your objects to and from this simpler format.
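
To make the idea concrete, here is one possible sketch in Python. The Circle class, the record layout, and the two conversion functions are invented for the example. The point is only that the stored record is defined separately from the class, so the class can change its internals without changing what is on disk.

```python
import json
import math


class Circle:
    """In-memory object; its internal layout is free to change."""

    def __init__(self, diameter):
        self.diameter = diameter

    def area(self):
        return math.pi * (self.diameter / 2) ** 2


def circle_to_record(circle):
    # The stored form uses "radius" no matter how the class keeps it.
    return {"shape": "circle", "radius": circle.diameter / 2}


def circle_from_record(record):
    if record.get("shape") != "circle":
        raise ValueError("not a circle record")
    return Circle(diameter=2 * record["radius"])


# Round trip through JSON text, as it would be written to disk.
stored = json.dumps(circle_to_record(Circle(4)))
restored = circle_from_record(json.loads(stored))
```

The conversion functions are the hand-written work that automatic serialization appears to save you. The trade is that they are the only code that has to know about the stored format.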

No software can exist without some forms of coupling. However, some of the best minds in our field remind us to reduce coupling where we can. If you decide to use object serialization, remember that you are increasing coupling in the time dimension. It is important to consider whether or not this increased coupling is worth the cost.

Posted by GWade at February 8, 2009 10:10 PM.
Comments

Interesting post GW – “Chronistic Coupling” is an excellent name for a specific recurring problem. One that targets a problem I am currently grappling with. Specifically, the original programmer serialized his data to xml. Upon close inspection of the data points, errors were found that required a few generations of modifications to fix the fundamental problems in the application. Eventually, the old format had to be abandoned entirely to fix one aspect of the problem – accuracy. Now a new dimension of the problem rears its head – speed and memory consumption. So the problem evolves and so must the solution.

In your article, you state “it might be better to store data in an object-neutral format”. The question this raises is “how do you define an object-neutral format?”. In the particular case I am dealing with, the reason the solution must evolve is that the original assumptions of what data should be tracked were flawed. I would assert that the design of an object-neutral format is elusive at best. At its worst, everything about the known data would need to be present in the object-neutral version of the data. The obvious path to this is giving the data consumer the phone book, when all he wants is the pizza page.

If I follow this problem/solution path far enough, I come to a scenario where there is a very large set of data, and an API between the consumer and “all that is known” -- But that would escape the scope of your article.

I believe your point was just to describe the phenomenon of “Chronistic Coupling” and to that end I believe you have succeeded.


- Ian

Posted by: Ian at February 11, 2009 12:16 PM

As you know, I've been thinking about this article for the last couple of weeks, and I believe you are right that this type of coupling is something to watch for. But I think this problem extends past object serialization into schema change. When you make a database schema change, your true model objects may not change but your ORM may. And there is the same version problem when you have multiple versions of a product that use slightly different database schemas. Even CSV files may have a version problem when someone changes the name field to fname and lname fields. I wonder if there are already patterns that deal with this type of coupling?

Posted by: rlb3 at March 8, 2009 04:38 PM