
Anomaly ~ G. Wade Johnson

March 14, 2009

Chronistic Coupling, Communications

The comments from Ian and rlb3 have made me think a bit more on what I said last time about Chronistic Coupling. One thing I didn't make perfectly clear is that I'm not advocating avoiding Chronistic Coupling at all costs.

Any real system will require some amount of Chronistic coupling. The key design point is to decide how much. Choosing the wrong level of coupling will certainly impact how your system evolves in the future. Over the next few posts, I'm going to explore some of these levels of Chronistic coupling with some examples.

Communications Protocols

Once upon a time, people doing communication between two processes (or computers) regularly debated how the data should be transferred: ASCII or binary. (This was pre-Unicode.) The advocates of the binary approach argued that it was more efficient for two reasons:

  • Fewer bytes sent over the network
  • No time spent converting to a network format and back

When we transferred data at 1200 or 2400 bps, these arguments were pretty convincing, especially when communicating between processes on the same machine.

However, there were problems when communicating between machines that were not the same architecture. When crossing the architecture boundary, you had to do conversions anyway. Some places where the binary format might change include:

  • Byte order
  • Size of primitive data types
  • Format of floating point data storage
  • Padding in larger binary structures (structs, etc.)
  • Encoding of strings (nul-terminated, length, etc.)
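A small Python sketch of the byte-order item above, using the standard `struct` module. The value 1000 is arbitrary; the point is that the same four bytes decode to very different numbers depending on which byte order the reader assumes.

```python
import struct

# A 32-bit unsigned int as a big-endian machine would write it
# ('>' = big-endian, 'I' = 4-byte unsigned int with standard sizes).
value = 1000
wire_bytes = struct.pack(">I", value)

# A reader that assumes little-endian order sees a very different number.
misread = struct.unpack("<I", wire_bytes)[0]

# Reading with the agreed-upon order recovers the original value.
correct = struct.unpack(">I", wire_bytes)[0]

print(wire_bytes.hex())  # 000003e8
print(misread)           # 3892510720
print(correct)           # 1000
```

Nothing in the four bytes says which interpretation is right; that knowledge has to be shared, out of band, by both ends.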

Soon, a sizable amount of effort was going into converting binary data from other machines to the native format. The worst part was the lack of information in the data stream to help troubleshoot problems. Normally, you found out that your decoding logic was wrong when some portion of the binary data stream gave ridiculous results, or when you reached the end of the stream and found you had too little or too much data.

Meanwhile, text-based protocols sent more data over the wire (which became less of a problem as networks got faster). But where a text-based protocol really shines is in debugging the data stream. If the next number in the stream is 1000000 and you expected a 16-bit short int, it's easy to see there's a problem. In a binary stream, the first two bytes of a long int look just like a valid short int, so there's no way to tell (at the protocol level) that something is wrong.
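A minimal Python sketch of that contrast, assuming a sender that writes the 32-bit value 1000000 while the receiver expects a 16-bit short. The helper `parse_short` is hypothetical, written only for this illustration.

```python
import struct

# Binary: the sender writes a little-endian 32-bit int...
binary_stream = struct.pack("<i", 1000000)
# ...but the receiver reads a 16-bit short and gets a perfectly plausible value.
(as_short,) = struct.unpack_from("<h", binary_stream, 0)
print(as_short)  # 16960

def parse_short(token: str) -> int:
    """Parse a decimal token that is supposed to be a 16-bit signed int."""
    n = int(token)
    if not -32768 <= n <= 32767:
        raise ValueError(f"{n} does not fit in a 16-bit short")
    return n

# Text: the same mistake is detectable because the full number is visible.
text_stream = "1000000 42\n"
try:
    parse_short(text_stream.split()[0])
    text_error = None
except ValueError as exc:
    text_error = str(exc)
print(text_error)  # 1000000 does not fit in a 16-bit short
```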

There were still problems. There was the EBCDIC vs. ASCII issue, which has mostly gone away. There is also the line-ending problem (LF vs. CRLF vs. CR).
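One common receiver-side workaround for the line-ending problem is to normalize all three conventions to LF before parsing. A small Python sketch; `normalize_newlines` is a hypothetical helper, and whether a receiver should be this lenient depends on the protocol.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF (DOS/Windows) and lone CR (classic Mac OS) endings to LF."""
    # Replace CRLF first so its CR is not turned into a second LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")

samples = ["one\ntwo\n", "one\r\ntwo\r\n", "one\rtwo\r"]
normalized = [normalize_newlines(s) for s in samples]
print(all(n == "one\ntwo\n" for n in normalized))  # True
```

Python's built-in `str.splitlines()` already treats all three conventions as line breaks, which is often enough when you only need the lines themselves.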

The biggest win for text-based protocols was the success of the TCP/IP protocols on the network. Many of the protocols that run the Internet are text-based. For example, HTTP, SMTP, FTP, Telnet, and more are essentially a series of text strings sent between the client and server.
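As an illustration, here is a minimal HTTP/1.0 GET request assembled as plain text in Python. The host, path, and `User-Agent` value are placeholders; the lines are joined with CRLF, which is the line ending HTTP specifies, and a blank line ends the header section.

```python
request_lines = [
    "GET /index.html HTTP/1.0",
    "Host: example.com",        # placeholder host
    "User-Agent: sketch/0.1",   # hypothetical client name
    "",                         # blank line: end of headers
    "",
]
request = "\r\n".join(request_lines)
raw = request.encode("ascii")   # what actually goes over the socket
print(request)
```

Anyone watching the connection can read this request directly, regardless of the architecture or language on either end.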

The major solutions to the size issue are relatively straightforward. First, networks got faster, so the problem became less of an issue. Second, where bandwidth is still a problem, we can compress the text stream (gzip) to reduce the number of bytes. Since compression can be used by everyone, it has been heavily optimized over the years, giving more benefit to everyone.
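A quick way to see the effect is Python's standard `gzip` module on a repetitive, text-heavy payload. The HTML-like data below is made up for the example, and the exact compressed size will vary with the zlib version.

```python
import gzip

# Hypothetical, highly repetitive text payload (typical of markup).
text = ("<tr><td>item</td><td>42</td></tr>\n" * 200).encode("ascii")

compressed = gzip.compress(text)
restored = gzip.decompress(compressed)

print(len(text), "->", len(compressed), "bytes")
```

The text stays human-readable at each end; only the bytes in transit are compressed.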

The Present

As a result of the (possibly compressed) text-based protocols used on the net today, machines with very different architectures can communicate easily. Text protocols have a lower chronistic coupling than binary protocols. An email client written to work on 16-bit Windows 3.1 could send messages to a client on a 32-bit Windows XP system. A web page served from a 64-bit Linux box can be viewed comfortably on Mac OS X, Windows Vista, or a mobile phone. More importantly, these clients don't need to know if the web page was generated from a C++ program, Ruby, Java, Lisp, or even Forth. It just doesn't matter.

Our video and audio formats are still binary because of the large amount of data being transferred. We still have chronistic coupling issues there. If you don't have the right codec for the file, you are basically out of luck. Many of these codecs are tied directly to the architecture where they were written.

In this case, the trade-off for reduced size is still more important than the ease of porting to multiple architectures.

Posted by GWade at 11:29 PM.

February 08, 2009

Serialized Objects and Chronistic Coupling

Many programs have a need to store program state to disk at various points. An approach used by many of these programs is to serialize the objects representing the program state directly to disk (or a database). Back in 2004 (XML-Serialized Objects and Coupling), I described a coupling problem caused by automatically serializing objects to XML.

Since that time, I have worked with other systems with similar functionality and have concluded that the problem is worse than I described five years ago. Serializing an object to disk with the intent of reading it in at a later date couples the structure of the object from a past date to the structure of the object at a future date. If the object never changes form, that is not a problem. If the object structure needs to change, then the serialization process becomes more complicated. It has to take one of three forms:

  1. Convert the object to and from the old format.
  2. Recognize the old object and transform it into the new structure.
  3. Institute a versioning system that allows reading and writing the current format and older formats.
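A minimal Python sketch combining forms 2 and 3, assuming JSON storage and a hypothetical `Person` class whose version 1 stored a single `name` field and whose version 2 splits it into `first` and `last`. Treating unversioned data as version 1 is also an assumption of the sketch.

```python
import json

CURRENT_VERSION = 2

class Person:
    """Hypothetical class; v1 stored 'name', v2 stores 'first' and 'last'."""
    def __init__(self, first: str, last: str):
        self.first = first
        self.last = last

def dump(person: Person) -> str:
    # Always write the current format, tagged with its version.
    return json.dumps({"version": CURRENT_VERSION,
                       "first": person.first, "last": person.last})

def _upgrade_v1(data: dict) -> dict:
    # Form 2: recognize the old structure and transform it.
    first, _, last = data["name"].partition(" ")
    return {"version": 2, "first": first, "last": last}

def load(text: str) -> Person:
    data = json.loads(text)
    version = data.get("version", 1)  # assumption: untagged data is v1
    if version == 1:
        data = _upgrade_v1(data)
    elif version != CURRENT_VERSION:
        raise ValueError(f"unsupported format version {version}")
    return Person(data["first"], data["last"])

old = load('{"name": "Ada Lovelace"}')
new = load(dump(Person("Grace", "Hopper")))
```

Every past format adds a branch like `_upgrade_v1` that has to be kept working for as long as old files might still exist.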

Chronistic Coupling

Recently, I have begun calling this effect Chronistic Coupling. (I like Temporal Coupling better, but that name is already taken.) Although you might think of this as another manifestation of Data Coupling, I think the time element makes Chronistic Coupling stronger (and more subtle) than data coupling. Unlike simple data coupling, object serialization couples the object structure through time. The older object format reaches forward in time to affect how the new program can structure its data.

If we allow saving in old formats, we must be very careful not to introduce an anachronism. This would be an old-style object that is inconsistent with the old program. This can cause problems that are hard to troubleshoot. You have to be able to identify where the old data came from to determine the problem. (In one system I worked on, we augmented the version of the data set with an extra piece of data describing the version of the program that saved this data.)
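A sketch of that idea in Python, assuming a JSON envelope. The field names `format_version` and `written_by` and the release string are hypothetical; the point is only that the file records both its data format and the exact program that produced it, so a suspicious file can be traced back to its source.

```python
import json
from typing import Tuple

FORMAT_VERSION = 1
PROGRAM_VERSION = "3.2.0"  # hypothetical release string of the saving program

def save_state(state: dict) -> str:
    # Record the data format *and* the program that wrote it.
    envelope = {"format_version": FORMAT_VERSION,
                "written_by": PROGRAM_VERSION,
                "state": state}
    return json.dumps(envelope)

def load_state(text: str) -> Tuple[dict, str]:
    envelope = json.loads(text)
    if envelope["format_version"] != FORMAT_VERSION:
        raise ValueError("unexpected format version")
    # Return the writer too, so inconsistent data can be blamed on a release.
    return envelope["state"], envelope["written_by"]

state, writer = load_state(save_state({"count": 3}))
```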

Costs of Chronistic Coupling

There is a sort of seductive quality to the idea that we can serialize objects and reinstantiate them at another time. This pattern recurs many times in the field of programming. Although keeping the data completely encapsulated in the object, by serializing and deserializing it straight to storage and back, seems like a really good idea, the reality is that there are still tradeoffs.

The obvious issue is to be certain that the data we read in is consistent with the design of the object. Most serialization needs to be augmented with some form of validation.
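A minimal Python sketch of deserialization followed by validation, assuming a hypothetical `Account` class whose design requires a non-empty owner and a non-negative integer balance. Parsing the JSON alone would accept any values; the checks enforce the object's own invariants.

```python
import json
from dataclasses import dataclass

@dataclass
class Account:
    """Hypothetical object: owner must be non-empty, balance non-negative."""
    owner: str
    balance: int

def load_account(text: str) -> Account:
    data = json.loads(text)
    owner = data.get("owner")
    balance = data.get("balance")
    # Validate before constructing, so no inconsistent object ever exists.
    if not isinstance(owner, str) or not owner:
        raise ValueError("owner must be a non-empty string")
    if type(balance) is not int or balance < 0:  # excludes bool and floats
        raise ValueError("balance must be a non-negative integer")
    return Account(owner, balance)

acct = load_account('{"owner": "ann", "balance": 5}')
```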

A separate issue that people often don't notice is that changes in the responsibilities and structure of an object can be hampered by Chronistic Coupling. At the very least, the code needed to deserialize old objects becomes much more complicated. In the worst case, it may be necessary to keep older classes in the design for the sole purpose of converting old objects into new ones.

Where things really start to go bad is when a substantial portion of an object hierarchy changes. The object you serialized may not bear any resemblance to the new classes. If the new object hierarchy is different enough, you would have to parse the old serialized object into a neutral format that can be used to instantiate the new objects. Either that, or you don't make the design improvements, because the work is too great (for this release).

In this way, the old design reaches into the future to prevent changes to the design. Often, the only way to fix the problem is to abandon backwards compatibility. This may result in major problems for clients or the need to provide special utility software to convert old data to a new format.

Conclusion

I am not saying that object serialization should always be avoided. The purpose of coining the term Chronistic Coupling is to give a name to a cost that you may not realize you are paying. In some cases, it might be better to store data in an object-neutral format and build new objects to represent the data rather than store the objects themselves. The unfortunate part of this is that there is no magical way to convert your objects to and from this simpler format.
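A small Python sketch of the object-neutral approach, using a hypothetical `Circle` class. The stored record is a plain, documented dictionary that deliberately does not mirror the class internals (`_center`, `_radius`), so those internals can change as long as `to_record`/`from_record` keep the mapping. As noted above, that mapping has to be written by hand.

```python
import json

class Circle:
    """Hypothetical class whose internal layout is free to change."""
    def __init__(self, cx: float, cy: float, radius: float):
        self._center = (cx, cy)
        self._radius = radius

    def to_record(self) -> dict:
        # Hand-written mapping from the object to a neutral record.
        cx, cy = self._center
        return {"shape": "circle", "x": cx, "y": cy, "r": self._radius}

    @classmethod
    def from_record(cls, record: dict) -> "Circle":
        # Build a *new* object from the neutral data.
        return cls(record["x"], record["y"], record["r"])

stored = json.dumps(Circle(1.0, 2.0, 3.0).to_record())
restored = Circle.from_record(json.loads(stored))
```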

No software can exist without some forms of coupling. However, some of the best minds in our field remind us to reduce coupling where we can. If you decide to use object serialization, remember that you are increasing coupling in the time dimension. It is important to consider whether or not this increased coupling is worth the cost.

Posted by GWade at 10:10 PM.

August 31, 2008

"Shortcomings" of the Web

I was reading The JavaScript Phrasebook on Safari and ran across one of my favorite pet peeves when people write about Web technologies.

Cookies are not a specific browser technology, but a mechanism in the client/server model to overcome one major shortcoming in the HTTP protocol. HTTP is stateless, which means that the protocol does not have a memory.

There are many people writing about web-based technologies that really appear not to understand the design of the web. Statelessness is not a shortcoming, it is a feature. Contrary to what these people apparently think, the designers of HTTP did not forget to include state in the protocol. Most client/server protocols before that point either kept state as part of the protocol or relied on a persistent connection for the session (which is just state at the TCP level). The default approach for HTTP would have been to do the same. The designers of HTTP explicitly chose not to keep state in the protocol.

One of the reasons that the web exploded was the fact that the HTTP protocol is stateless. The average web server does not need to retain connection information on every browser request that it receives. This made developing early web applications and specialized servers much easier. No need for storage to track browser sessions. No need to keep open connections or determine when to expire old sessions. No need for complexity that has nothing to do with serving individual pages to a browser.

Part of the genius of the minimal approach taken by HTTP is the fact that stateful applications can be built on top of it through mechanisms like cookies. (There were more approaches at the beginning. Cookies eventually won out.) The great thing is that systems that don't need stateful connections don't pay for them in complexity and bandwidth. For systems that need the extra state, the complexity and extra bandwidth are part of doing business.
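A toy Python sketch of state layered on top of stateless requests with a cookie. It models only the `Cookie` and `Set-Cookie` header values as strings, not a real server; `handle_request`, the `session_id` name, and the in-memory `sessions` dictionary are hypothetical. `SimpleCookie` comes from the standard `http.cookies` module. Each request carries everything the server needs; only this application pays for the session store.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple

sessions: Dict[str, dict] = {}  # application-level state, not HTTP state

def handle_request(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (body, Set-Cookie value or None) for one independent request."""
    jar = SimpleCookie(cookie_header or "")
    morsel = jar.get("session_id")
    if morsel is not None and morsel.value in sessions:
        session = sessions[morsel.value]
        session["visits"] += 1
        return f"visit #{session['visits']}", None
    # No valid cookie: create state and hand the client a key to it.
    session_id = secrets.token_hex(16)
    sessions[session_id] = {"visits": 1}
    out = SimpleCookie()
    out["session_id"] = session_id
    return "visit #1", out["session_id"].OutputString()

body1, set_cookie = handle_request(None)   # first visit, no cookie yet
body2, _ = handle_request(set_cookie)      # browser echoes the cookie back
```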

HTTP is a relatively minimal protocol with MIME headers as an extension mechanism. This design has proven extremely robust and stable. The original version was implemented around 1990. RFC 1945 described version 1.0 in 1996. Since that time, there has been only one new version, 1.1. The protocol has been stable for almost 20 years and underlies every request on the web.
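A simplified Python sketch of why headers work as an extension mechanism: a parser splits the request head into name/value pairs, and a server simply never looks up headers it doesn't understand. `parse_request` is a hypothetical helper that ignores folded lines and repeated headers, and `X-Experimental-Feature` is a made-up extension header.

```python
from typing import Dict, Tuple

def parse_request(raw: str) -> Tuple[str, Dict[str, str]]:
    """Split a raw HTTP/1.x request head into its request line and headers."""
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; normalize to lower case.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers

raw = ("GET / HTTP/1.1\r\n"
       "Host: example.com\r\n"
       "X-Experimental-Feature: on\r\n"  # hypothetical extension header
       "\r\n")
req_line, headers = parse_request(raw)

# A server that knows nothing about the extension just ignores it.
host = headers["host"]
```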

Despite the success of this design, some people still find the need to point out shortcomings in a design they don't understand.

Because of all of this, I remain amused when someone who obviously doesn't understand the concept of a minimal design refers to that design as falling short.

Posted by GWade at 11:10 AM.