The comments from Ian and rlb3 have made me think a bit more on what I said last time about Chronistic Coupling. One thing I didn't make perfectly clear is that I'm not advocating avoiding Chronistic Coupling at all costs.
Any real system will require some amount of Chronistic coupling. The key design point is to decide how much. Choosing the wrong level of coupling will certainly impact how your system evolves in the future. Over the next few posts, I'm going to explore some of these levels of Chronistic coupling with some examples.
Once upon a time, people doing communication between two processes (or computers) regularly debated how the data should be transferred: ASCII or binary. (This was pre-Unicode.) The advocates of the binary approach argued that it was more efficient for two reasons: less data had to be sent, and no time was spent converting the data to and from text.
When we transferred data at 1200 or 2400 bps these arguments were pretty convincing. Especially when communicating between processes on the same machine.
However, there were problems when communicating between machines that were not the same architecture. When crossing the architecture boundary, you had to do conversions anyway. Some places where the binary format might change include:
Soon, a sizable amount of effort could be applied to converting binary data from other machines to the native format. The worst part about this was the lack of information in the data stream to help troubleshoot problems. Normally, you found out that your decoding logic was wrong when some portion of the binary data stream gave ridiculous results, or when you got to the end of the stream and found you had too little or too much data.
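As a rough illustration of the kind of conversion problem described above, here is a minimal Python sketch using the standard `struct` module. The function name `misread_as_little_endian` is made up for this example; it shows one possible mismatch (byte order), not every way a binary format can differ between machines.

```python
import struct


def misread_as_little_endian(value: int) -> int:
    """Encode a 32-bit unsigned int big-endian, then decode it little-endian."""
    wire_bytes = struct.pack(">I", value)          # sender writes big-endian
    (decoded,) = struct.unpack("<I", wire_bytes)   # receiver assumes little-endian
    return decoded


# The sender meant 1; the receiver sees a very different number,
# and nothing in the byte stream itself says which side is wrong.
print(misread_as_little_endian(1))
```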
Meanwhile, text-based protocols sent more data over the wire (which became less of a problem as networks became faster). But, where a text-based protocol really shines is in debugging the data stream. If the next number in the stream is 1000000 and you expected a 16-bit short int, it's easy to see there's a problem. In a binary stream, the first two bytes of a long int look the same as an actual short int; there's no way to tell (at the protocol level) that something is wrong.
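The difference can be sketched in a few lines of Python. The helper names below (`read_short_from_text`, `read_short_from_binary`) are hypothetical, and the binary side assumes a little-endian stream purely for the sake of the example.

```python
import struct

SHORT_MIN, SHORT_MAX = -32768, 32767


def read_short_from_text(field: str) -> int:
    """Parse a decimal text field and check it fits in a 16-bit short."""
    value = int(field)
    if not SHORT_MIN <= value <= SHORT_MAX:
        raise ValueError(f"{value} does not fit in a 16-bit short")
    return value


def read_short_from_binary(stream: bytes) -> int:
    """Read the next two bytes as a little-endian 16-bit short."""
    (value,) = struct.unpack("<h", stream[:2])
    return value


# The sender actually wrote a 32-bit int, but the receiver expects a short.
long_on_the_wire = struct.pack("<i", 1000000)

# The binary reader happily returns a plausible-looking short.
print(read_short_from_binary(long_on_the_wire))

# The text reader can see that the value is out of range.
try:
    read_short_from_text("1000000")
except ValueError as error:
    print("text protocol caught it:", error)
```

The binary reader has no way to notice the mistake here; the error only shows up later, when the remaining bytes are misinterpreted.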
There were still problems. There was the EBCDIC vs. ASCII issue, which has mostly gone away. There is also the line-ending problem (LF vs. CRLF vs. CR).
The biggest win for the text-based protocols was the success of TCP/IP protocols on the network. A large number of the protocols that run the Internet are basically text. For example, HTTP, SMTP, FTP, Telnet, and more are basically a series of text strings sent between the client and server.
The major solutions to the size issue are relatively straightforward. First, the networks got faster, so the problem is less of an issue. Second, in places where bandwidth is still a problem, we can compress the text stream (gzip) to reduce the number of bytes. Since the compression is something that can be used by everyone, it has been greatly optimized over the years, giving more benefit to everyone.
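A quick way to see the effect is with Python's standard `gzip` module. This is only a sketch: the function names and the sample message are invented, and the sample is deliberately repetitive, so real traffic will compress less dramatically.

```python
import gzip


def compress_text(message: str) -> bytes:
    """Encode the text as UTF-8 and gzip it."""
    return gzip.compress(message.encode("utf-8"))


def decompress_text(payload: bytes) -> str:
    """Reverse compress_text."""
    return gzip.decompress(payload).decode("utf-8")


# A made-up, highly repetitive text message.
sample = "temperature=21.5\r\n" * 200
packed = compress_text(sample)

print(len(sample.encode("utf-8")), "bytes as text")
print(len(packed), "bytes compressed")
```

The stream is still plain text once decompressed, so it keeps the debugging advantages described earlier.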
As a result of the (possibly compressed) text-based protocols used on the net today, machines with very different architectures can communicate easily. Text protocols have a lower chronistic coupling than binary protocols. An email client written to work on 16-bit Windows 3.1 could send messages to a client on a 32-bit Windows XP system. A web page served from a 64-bit Linux box can be viewed comfortably on Mac OS X, Windows Vista, or a mobile phone. More importantly, these clients don't need to know if the web page was generated from a C++ program, Ruby, Java, Lisp, or even Forth. It just doesn't matter.
Our video and audio formats are still binary because of the large amount of data being transferred. We still have chronistic coupling issues there. If you don't have the right codec for the file, you are basically out of luck. Many of these codecs are tied directly to the architecture where they were written.
In this case, the trade-off for reduced size is still more important than the ease of porting to multiple architectures.
Many programs have a need to store program state to disk at various points. An approach used by many of these programs is to serialize the objects representing the program state directly to disk (or a database). Back in 2004 (XML-Serialized Objects and Coupling), I described a coupling problem caused by automatically serializing objects to XML.
Since that time, I have worked with other systems with similar functionality and have decided the problem was worse than I described five years ago. Serializing an object to disk with the intent of reading it in at a later date couples the structure of the object from a past date to the structure of the object at a future date. If the object never changes form, that is not a problem. If the object structure needs to change, then the serialization process becomes more complicated. It has to take one of three forms:
Recently, I have begun calling this effect Chronistic Coupling. (I like Temporal Coupling better, but that name is already taken.) Although you might think of this as another manifestation of Data Coupling, I think the time element makes Chronistic Coupling stronger (and more subtle) than data coupling. Unlike simple data coupling, object serialization couples the object structure through time. The older object format reaches forward in time to affect how the new program can structure its data.
If we allow saving in old formats, we must be very careful not to introduce an anachronism. This would be an old-style object that is inconsistent with the old program. This can cause problems that are hard to troubleshoot. You have to be able to identify where the old data came from to determine the problem. (In one system I worked on, we augmented the version of the data set with an extra piece of data describing the version of the program that saved this data.)
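One way to picture the version tagging mentioned above is the following Python sketch. Everything in it is hypothetical: the field names (`format_version`, `saved_by`), the program version string, and the old flat layout are invented for illustration and are not taken from the system described.

```python
import json

CURRENT_FORMAT = 2
PROGRAM_VERSION = "3.1.0"  # hypothetical version of the saving program


def save_settings(name: str, width: int, height: int) -> str:
    """Write the current layout, tagged with format and program versions."""
    return json.dumps({
        "format_version": CURRENT_FORMAT,
        "saved_by": PROGRAM_VERSION,
        "name": name,
        "size": {"width": width, "height": height},
    })


def load_settings(text: str) -> dict:
    """Read a record, upgrading the (hypothetical) version 1 layout if needed."""
    record = json.loads(text)
    version = record.get("format_version")
    if version == 1:
        # Assumed old layout: width and height stored as flat fields.
        record = {
            "format_version": CURRENT_FORMAT,
            "saved_by": record.get("saved_by", "unknown"),
            "name": record["name"],
            "size": {"width": record["width"], "height": record["height"]},
        }
    elif version != CURRENT_FORMAT:
        raise ValueError(f"unsupported format version: {version!r}")
    return record
```

Keeping `saved_by` alongside the format version is what makes it possible to trace an inconsistent record back to the program that wrote it.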
There is a sort of seductive quality to the idea that we can serialize objects and reinstantiate them at another time. This pattern recurs many times in the field of programming. Although it seems like a really good idea to have the data completely encapsulated by the object by serializing and deserializing the data straight to storage and back, the reality is there are still tradeoffs.
The obvious issue is to be certain that the data we read in is consistent with the design of the object. Most serialization needs to be augmented with some form of validation.
A separate issue that people often don't notice is that changes in the responsibilities and structure of an object can be hampered by Chronistic Coupling. At the very least, the code needed to deserialize old objects becomes much more complicated. In the worst case, it may be necessary to keep older classes in the design for the sole purpose of allowing us to convert old objects into new objects.
Where things really start to go bad is when a substantial portion of an object hierarchy changes. The object you have serialized may not bear any resemblance to the new classes. If the new object hierarchy is different enough, you would have to parse the old serialized object into a neutral format that can be used to instantiate the new objects. Either that, or you don't make the design improvements, because the work is too great (for this release).
In this way, the old design reaches into the future to prevent changes to the design. Often, the only way to fix the problem is to abandon backwards compatibility. This may result in major problems for clients or the need to provide special utility software to convert old data to a new format.
I am not saying that object serialization should always be avoided. The purpose of coining the term Chronistic Coupling is to give name to a cost that you may not realize that you are paying. In some cases, it might be better to store data in an object-neutral format and build new objects to represent the data rather than store the objects themselves. The unfortunate part of this is that there is no magical way to convert your objects to and from this simpler format.
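As a sketch of what storing data in an object-neutral format might look like, the Python below writes a plain record and rebuilds (and validates) an object from it. The `Customer` class and its fields are invented for the example, and the conversion methods are written by hand, which is exactly the cost mentioned above.

```python
import json
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    email: str

    def to_record(self) -> dict:
        """Produce a plain dictionary that does not depend on this class."""
        return {"name": self.name, "email": self.email}

    @classmethod
    def from_record(cls, record: dict) -> "Customer":
        """Build a new object from stored data, validating it on the way in."""
        name = record.get("name")
        email = record.get("email")
        if not isinstance(name, str) or not name:
            raise ValueError("record has no usable name")
        if not isinstance(email, str) or "@" not in email:
            raise ValueError("record has no usable email")
        return cls(name=name, email=email)


stored = json.dumps(Customer("Ada", "ada@example.com").to_record())
restored = Customer.from_record(json.loads(stored))
print(restored)
```

Because only the record is stored, the class is free to change shape later as long as `from_record` can still make sense of the stored fields.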
No software can exist without some forms of coupling. However, some of the best minds in our field remind us to reduce coupling where we can. If you decide to use object serialization, remember that you are increasing coupling in the time dimension. It is important to consider whether or not this increased coupling is worth the cost.
Long ago, I was trying to convince a friend of mine that Object Oriented programming was not all just snake oil when he asked me a fundamental question.
What's the difference between an object and a thingie?
In some ways, this question has guided my understanding of objects ever since. Fundamentally, what makes one collection of member (instance) data and member functions (methods) an object and another nothing more than a collection of data and code? What is the fundamental nature of an object?
In one sense, the answer can be summed up with my favorite quote from Ruminations on C++:
use classes to represent concepts
In a broader sense, objects are all about abstraction. Most of programming, and OO programming in particular, is an exercise in abstraction. We want to separate what you need to know to perform some action from the details you don't need to know. Abstraction is the name we give for selectively hiding or ignoring the details we don't care about so we can focus on what really matters. Abstraction is what allows us to work with files instead of magnetic domains arranged in tracks on spinning platters on a hard drive.
Any time you give a simple name to a complex collection of behaviors, you have created an abstraction. But, not all abstractions are created equal. A collection of random pieces of data and methods in a FooLib class is not a particularly good abstraction. Yes, it collects together information under a single name. Unfortunately, the simplest translation of that name is the source code. In order to understand any piece of the functionality, you need to go look at how it's implemented.
A simple, good abstraction is a stack class. There is an independent concept in software of a stack. You don't need to understand the actual implementation details and internal data. All you need is to know about the push and pop methods. A few other methods might be added for looking at the top of the stack without removing an item and for determining the number of items in the stack. However, calling the class Stack brings along a bunch of expected behavior without need of explanation.
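For concreteness, a stack along those lines might look like this in Python. It is a minimal sketch; the method names `top` and the use of `len()` for the item count are choices made for this example.

```python
class Stack:
    """A last-in, first-out container."""

    def __init__(self):
        self._items = []

    def push(self, item):
        """Put an item on top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top item."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def top(self):
        """Return the top item without removing it."""
        if not self._items:
            raise IndexError("top of empty stack")
        return self._items[-1]

    def __len__(self):
        """Number of items currently on the stack."""
        return len(self._items)
```

A client only needs the name and these four operations; whether the items live in a list, a linked structure, or something else is invisible.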
One of the greatest benefits of the whole design patterns movement was good names and definitions that can be used as high-level abstractions. You don't need to know about the implementation to know that an Iterator allows traversal of a container, or that a Factory creates other objects. In fact, by giving a complicated concept a simple name, we have performed a kind of compression.
When I call an object an adapter, you immediately know that its purpose is to convert the interface of a class into a different interface. You also know something about expected costs of this delegation and that the adapter itself doesn't need to provide any major functionality of its own. You also know that it is likely that the adapted class either cannot be changed, or that changing it would affect too many other systems. It is also likely that we are using this older class in a new interface.
But, I don't need to say all of that; I just say the class is an adapter. That is a fair amount of compression, reducing a whole paragraph into one word.
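To make the compressed paragraph concrete, here is a small Python sketch of an adapter. The classes `LegacyThermometer` and `CelsiusSensorAdapter`, and the `read_celsius` interface, are hypothetical names invented for this example.

```python
class LegacyThermometer:
    """Hypothetical older class that we cannot change; it reports Fahrenheit."""

    def __init__(self, fahrenheit: float):
        self._fahrenheit = fahrenheit

    def read_fahrenheit(self) -> float:
        return self._fahrenheit


class CelsiusSensorAdapter:
    """Presents the read_celsius() interface that newer code expects."""

    def __init__(self, legacy: LegacyThermometer):
        self._legacy = legacy

    def read_celsius(self) -> float:
        # Delegate to the adapted object and convert the result.
        return (self._legacy.read_fahrenheit() - 32) * 5 / 9


def describe(sensor) -> str:
    """New-style client code: it only knows about read_celsius()."""
    return f"{sensor.read_celsius():.1f} C"


print(describe(CelsiusSensorAdapter(LegacyThermometer(212.0))))
```

Note how little the adapter does on its own: one delegation and one conversion, which is what the name leads a reader to expect.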
A good abstraction provides compression of a lot of information into a single concept. Part of the compression involves the amount of work or added information needed to decompress the information. As a friend of mine once pointed out: ISBN is a really strong compression algorithm. Any book can be compressed into a 10-character string; but decompression is a bummer.
Decompressing a good abstraction to gain understanding requires some amount of additional information. If this information is general (like design patterns), you can reuse the explanation many times, reducing the cost of the decompression for each use of that pattern. If the only explanation for what the class does is the source of the class itself, there is not much abstraction. This is more like the ISBN example. To understand what ISBN: 0-596-51004-7 expands to, you need to get and read the book (Beautiful Code).
One way to recognize a good abstraction is to examine the level of compression (including the amount of information needed to decompress). If the only way to understand the abstraction is to read the source (and re-read the source, ...), odds are the abstraction is not very good.
If understanding a particular class requires a bunch of extra information that happens to be part of the business domain, we may still have a good abstraction. In that case, the extra information may be able to be amortized across several other classes.
Abstraction as information compression may be a useful concept for determining if any of your classes are actually thingies.
Continuing the line of thought from last time (Sharp Tools vs. Frameworks), another issue I see in quite a few frameworks and some systems is a code anti-pattern I'll call The Modular Monolith.
We all know that modularity is a good thing to have in a system. Modular code, in general, reduces coupling between components, allows easier reuse, and simplifies understanding. When done correctly, each module can be analyzed, tested, and understood independently of most of the rest of the system. In object oriented programming, the smallest module we work with is the class. Usually a group of classes work together as a subsystem (or package). In other paradigms, the smallest module might be a library.
In any case, we are all pretty much familiar with the benefits and concepts of modularity. At the present time, it would probably be hard to find anyone that does not accept that modularity is a good design principle.
In some systems or frameworks, you may run into a problem using a single class. When you try to include the class, you find dependencies on other classes. In some cases, this is perfectly reasonable. If the class you are including requires some low-level utility classes to do its work, that's understandable. But sometimes, you find that the class depends on other classes at the same level of complexity. If some of those classes depend on other classes that depend on other classes, you can eventually reach the point of needing the entire framework (or system) to use any part of it.
In this case, we no longer have modular code, we have a monolith. The oxymoron modular monolith refers to the fact that the code is modular in the sense that there are modules. The problem is that the modules are so tightly coupled together that they might as well be a single monolithic stone. No piece can be used without bringing the whole structure along for the ride.
One mechanism that can cause this problem is the over-use of Singletons. The Singleton, by its nature, can cause hidden coupling between the classes that use the Singleton and the classes the Singleton uses. This is one reason why the test-infected are usually against the use of the Singleton pattern. A system with multiple Singletons can result in connections that are almost impossible to unravel.
Another cause of this increased coupling is low-level classes that depend on high-level classes. This violation of layers is almost guaranteed to generate blobs of classes that must always be used as a unit. Many systems cause this problem through trying to connect error-reporting to low-level code. As the error reporting becomes more advanced, it brings in subsystems unrelated to the purpose of the utility class.
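One common way to keep a utility from dragging in an error-reporting subsystem is to let the caller hand in a reporting function. The Python sketch below assumes that approach; `parse_ints` and `ignore_error` are hypothetical names, and this is one possible remedy rather than the only one.

```python
from typing import Callable, Iterable, List


def ignore_error(message: str) -> None:
    """Default reporter: do nothing."""


def parse_ints(
    lines: Iterable[str],
    report_error: Callable[[str], None] = ignore_error,
) -> List[int]:
    """Parse one integer per line, reporting bad lines through the callback.

    The utility knows nothing about logging frameworks, dialogs, or any
    other high-level subsystem; the caller decides where errors go.
    """
    values = []
    for number, line in enumerate(lines, start=1):
        try:
            values.append(int(line.strip()))
        except ValueError:
            report_error(f"line {number}: not an integer: {line!r}")
    return values
```

A caller inside a large system can pass its own reporter, while a small script can pass `print` or nothing at all, so the utility stays usable on its own.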
Frameworks aren't the only place you can see this anti-pattern. I've also run into this problem with systems written by people who have only worked in one system. The idea that code could be used outside the system never occurs to them. You can often recognize this situation with people who load up the multi-megabyte widget processing system as part of a piece of code to count the lines in a text file. Since they have always worked on this system, they treat it as the whole programming universe. All code must exist in the system.
The bad news is that these systems seem to be built on the modular monolith and the modular monolith further reinforces the attitude that everything must be done as part of the one system. This positive feedback makes stopping the behavior in either case almost impossible.
Over the last few years, I have spent a lot of time talking to lots of different people about object-oriented programming (OOP). I have spent a fair portion of the last three years interviewing and screening people over the phone for development positions. This activity has caused me to spend some time re-evaluating what I know about objects.
Some of the people I talk to say that OOP is about reuse. They say the main thing we get out of objects is reuse. We can reuse the data and functionality from a class by deriving from it. We get more reusable code by packaging it up in classes. Unfortunately, reuse is not confined to OO. Back when I was doing structured programming, we captured reusable code in functions. We used libraries and modules for larger granularity reuse. Obviously, OO is not the only way to reuse code. So, it's kind of hard to claim that the main reason for OO is code reuse.
So, if reuse is not the purpose for objects, what is? One of the maintenance benefits of OO has to do with the concept of encapsulation. If a class is defined reasonably, the only way to access its data is through its member functions (methods). The practical result of this is the amount of code that can change a piece of data is limited. In a large system, this drastically reduces the amount of code that must be examined to troubleshoot a data problem.
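A small Python sketch of that idea follows. The `Account` class is invented for illustration, and Python enforces privacy only by convention (the leading underscore), so treat this as a picture of the design rather than a guarantee from the language.

```python
class Account:
    """All changes to the balance go through deposit() and withdraw()."""

    def __init__(self, opening_balance: int = 0):
        if opening_balance < 0:
            raise ValueError("opening balance cannot be negative")
        self._balance = opening_balance

    @property
    def balance(self) -> int:
        """Read-only view of the balance."""
        return self._balance

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount: int) -> None:
        if amount <= 0 or amount > self._balance:
            raise ValueError("invalid withdrawal amount")
        self._balance -= amount
```

If the balance ever turns out to be wrong, there are only three places to look: the constructor and the two methods that modify it.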
OO provides good support for abstraction. It is possible to make an abstract interface without OO, but it requires more discipline on the part of the programmer and the clients of the library. With classes, it is easier to specify an abstraction and encourage clients of your class to use it.
These two concepts are different aspects of the same issue: controlling complexity. By reducing the number of methods that can touch a given piece of data, you are reducing the communication paths in your code. This reduces complexity by introducing constraints in the way data is accessed and modified. Abstraction also reduces complexity by encouraging the client programmer to focus on the concept of the class rather than on its implementation.
In some cases, the complexity isn't actually removed, but only quarantined inside the class. This helps keep complexity inside the class from leaking out, and complexity from the surrounding system from leaking into the class. In many cases, reducing the way complex things interact is the best tool we have for managing complexity in our systems.
By providing interfaces that hide implementation details behind some form of abstraction, we reduce the complexity that the programmer needs to be aware of at any given point in time.
This also highlights one of the points where OO has failed in its promises. Without careful thought about your designs, an OO system can add to the complexity in a system. Several kinds of OO design decisions can add complexity to a system:
Another dark side of objects is that hiding the complexity of the system allows us to develop even more complex software because we don't have to deal with all of the complexity all of the time. This has caused many people to ignore the complexity cost of their designs.
People who really understand that OO is about reducing complexity avoid adding one more method to a class to make it more reusable. Classes with only one responsibility abstract the implementation details while providing a lower complexity interface to the functionality. Combine two classes inappropriately and the complexity increases again, because you need to know more to use the two halves of this split-personality class.
Keeping the importance of reducing complexity in mind should help in the creation of cohesive classes. Using classes to provide coherent interfaces that hide complex implementation details should help to reduce the complexity in our software.
Several times in the last few years, I have written about the subject of memory management, garbage collection, and object lifetime. Some of the essays I've written on this subject include:
Recently, I was thinking about this issue again and had a slightly different insight into my favorite argument about garbage collection. I have often said that one of the things I don't like about the standard mark-and-sweep approach to GC is that it usually disregards object lifetime. I've normally seen the loss of defined object lifetime as a problem with GC. I've argued before that this loss removes a very important tool from the programmer's toolbox.
However, I recently began to consider another aspect of object lifetime that I had previously overlooked. Even in a language that supports GC, being aware of object lifetime is still worthwhile.
In languages like C++ that support defined construction and destruction of objects, the lifetime of an object is directly connected to the points when the constructor and destructor are called by the runtime system. A good C++ programmer should think about how an object should be created and destroyed. The programmer uses object lifetime to determine what needs to happen at the beginning and end of an object's life.
Defining the length of time the object lives also requires a little thought. If the object only needs to live a short period of time in one function, a stack object is appropriate. In other cases, a smart pointer can be used to constrain the lifetime of an object in well-defined ways.
A good C++ programmer thinks carefully about object lifetime and as a consequence has little problem with memory leaks. If the C++ programmer does not think about object lifetime issues, memory leaks abound.
Memory leaks are not supposed to be possible in GCed languages. However, I have seen several cases of Java programs that leaked memory. In one sense, these are not really leaks, since the memory is referenced somewhere. But, in another sense, they are leaks because the memory usage of the program is growing unexpectedly. In many of these cases, the problem is that the programmer has forgotten about the object and did not tell the language to forget the object.
Even in a GCed language like Java, memory leaks are caused by not thinking about object lifetime. If the programmer attaches an object to a longer-lived object or container and forgets to remove it when the object is no longer needed, the object has effectively leaked. By not properly discarding references when the program is finished with them, the programmer has generated memory leaks just as surely as if the language did not support GC. It's not an issue for short-lived objects created and discarded within a short stretch of code. But, then leaks aren't possible with stack-based objects either.
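The same effect is easy to reproduce in other garbage-collected languages. The sketch below uses Python rather than Java, with invented `Session` and `SessionRegistry` classes, and uses a weak reference only as a probe to observe whether the object is still alive. It relies on CPython's collector behavior, so other implementations may differ in timing.

```python
import gc
import weakref


class Session:
    def __init__(self, user: str):
        self.user = user


class SessionRegistry:
    """A long-lived object that holds references to sessions."""

    def __init__(self):
        self._sessions = []

    def register(self, session: Session) -> None:
        self._sessions.append(session)

    def unregister(self, session: Session) -> None:
        self._sessions.remove(session)


def is_collected(ref) -> bool:
    """Run the collector and report whether the referent is gone."""
    gc.collect()
    return ref() is None


registry = SessionRegistry()
session = Session("alice")
registry.register(session)
probe = weakref.ref(session)

# The program is "finished" with the session, but the registry is not.
del session
leaked = not is_collected(probe)

# Only removing it from the long-lived container lets it die.
registry.unregister(probe())
freed = is_collected(probe)

print("still alive after del:", leaked)
print("collected after unregister:", freed)
```

The collector is doing its job correctly in both cases; the difference is entirely in whether the programmer thought about when the session's life should end.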
Whether your language of choice supports GC or not, thinking about object lifetime will help you avoid memory problems. It may also help you avoid leaking other resources by making certain you are clear about when the resources are needed.
Several newer languages (as in less than 10 years old) have been designed around the idea that everything is an object. Since I started programming professionally almost two decades ago, I was programming before the object craze became mainstream. I have done the OO thing for over a decade and am not as impressed as I once was.
Although not everyone will agree with me, I suspect that OO is not always the best choice. In fact, in some applications, a simple, straight-forward procedural approach is better.
Many OO advocates will point out that the procedural approach doesn't scale. They tend to forget that a large fraction of the code written in the world doesn't need to deal with terabytes or petabytes of data. Much code isn't called thousands of times a second. Even more code is not mission critical. And some applications run on systems without the resources for the overhead of OO.
I also find that the object approach appears to be harder for people to really master. Many people claim to have learned OO programming in a few weeks, but it seems to take at least two years for people to really get it. Before that point, new OO programmers end up creating collections of data and methods with no cohesion or multiple classes with intimate knowledge of each other's inner workings. It usually even takes a few months before they stop inheriting everything just because they can. I won't even discuss the objects that serve only as a holder of a handful of procedural programming methods that show no signs of abstraction.
On the other hand, I've seen non-programmers pick up a little bit of procedural programming relatively quickly. Just enough to automate some portion of their computer usage and save a little time. This is some of the appeal of many of the dynamic languages. It's easier to whip out a little code to get a job done. Would I recommend this kind of programming for life- or mission-critical applications? No, but that doesn't make it useless.
I suspect that objects have become the latest Golden Hammer in our field. OO has also been around long enough, and people have worked in the paradigm long enough, that they have begun to believe that there is nothing else.
As such, people tend to forget that there is overhead involved with using OO. I'm not just talking about memory and CPU overhead. While it is true those forms of overhead exist, both are getting cheaper fast enough that they are not the major problem (unless you are doing embedded systems). The more important overhead is conceptual. Everyone starts with the obvious (and incorrect) premise that programming objects are just like objects in the real world. This view of objects is easy to understand, but very limited. Amusingly enough, in the real world it is obvious that everything is not an object. Energy, thoughts, emotion, and probability are fundamentally different than a table or cup. Forcing them into the same framework would not make much sense.
In most of the OO programming I have done or seen, only a tiny fraction of the classes have anything to do with real physical objects. I think the best breakthrough for me was the comment "concepts are classes" in the book Ruminations on C++ by Koenig and Moo. This finally solidified for me the non-real-world things that need to be classes. I had done this many times without really having a reason. Wrapping your mind around these concepts and learning to work with them seems to take months to years for every programmer I've ever worked with, taught, or mentored. Along the way, they normally reach several Aha! moments that move them toward real ability with OO.
This does not sound like the best approach for people who are just beginning to learn to program or for people who only need to program as a sideline to get their real work done.
Even in the most fanatical of OO programming languages, we eventually reach the level of individual statements. These are not objects, and making them objects would increase the difficulty in understanding. However, most people overlook that.
I have come to believe that the "everything is an object" meme has nothing to do with ease of learning or use of the language. I suspect that the approach only simplifies things for the language designer. After all, as a user of the language, how does having an integer be an object really make your coding or design easier? In order to make the code usable, these integer objects either need special syntactic support in order for us to write mathematical expressions somewhat naturally, or the language needs support for operators built into the classes. Both complicate the language for the (marginal) benefit of being able to say that all of my numbers are objects, just like everything else.
As a rule, I find this assertion to be a quirk of a given language rather than a particular bonus; kind of like Lisp's parentheses, C++'s templates, or Perl's punctuation variables. They are part of the language and are only good or bad inasmuch as they help or hinder my ability to learn the language and read and write useful code.
What does it mean for an object to die? In C++, there are several distinct and well-defined stages in the death of an object. Other languages do this a little differently, but the general concepts remain the same.
This is the basic chain of events for an item on the stack.
Once an object goes out of scope it begins the process of dying. The first step in that process is calling the object's destructor. (To simplify the discussion, we will ignore the destructors of any ancestor classes.) The destructor should undo anything done by the object's constructor. Finally, after all of the destruction of the object is completed, the system gets an opportunity to recover the memory taken by the object.
In some other languages, a garbage collection system handles recovering memory. Some systems guarantee destruction when the object leaves scope, even with automatic garbage collection. However, some of them focus so hard on memory recovery that they provide no guarantees about when, or even if, destruction of the object will occur.
Although many people pay a lot of attention to the memory recovery part of this process, it seems to be the least interesting part of the process to me. The destruction of the object often plays a vital role in the lifetime of the object. This destruction often involves releasing resources acquired by the object. Sometimes, memory is the only thing to be cleaned up, but many times other resources must be released. Some examples include
These are all issues that we would like to take care of as soon as possible. Also, they result in some consequence if the cleanup step is forgotten or missed.
Anytime I have a resource that must be initialized or acquired and shut down or released, I immediately think of a class that wraps that functionality in the constructor and destructor. This pattern is often known as resource acquisition is initialization. Following this pattern gives you an easy way to tell when the resource is yours. Your ownership of the resource corresponds to the lifetime of the object. You can't forget to clean up; it is done automatically by the destruction of the object. Most importantly, the resource is even cleaned up in the face of exceptions.
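The pattern itself belongs to C++, where the destructor runs when the object leaves scope. Python gives no such guarantee for destructors in general, so the sketch below swaps in the closest analog, a context manager used with a `with` block. The `TrackedLock` class is a stand-in invented for the example; it only records whether the "resource" is held.

```python
class TrackedLock:
    """Stand-in for a real resource; records whether it is currently held."""

    def __init__(self):
        self.held = False

    def __enter__(self):
        # Acquire the resource at the start of the block.
        self.held = True
        return self

    def __exit__(self, exc_type, exc, traceback):
        # Release the resource on the way out, exception or not.
        self.held = False
        return False  # do not swallow the exception


def work_that_fails(lock: TrackedLock) -> None:
    """Hold the lock, then fail partway through the operation."""
    with lock:
        raise RuntimeError("something went wrong mid-operation")


lock = TrackedLock()
try:
    work_that_fails(lock)
except RuntimeError:
    pass

print("lock still held:", lock.held)
```

As with the C++ version, ownership of the resource lines up with a clearly bounded stretch of code, and the release cannot be forgotten on the error path.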
In the systems where destruction may be postponed indefinitely, this very useful concept of object death and the related concept of object lifetime is discarded.
When talking about Object Oriented Programming, there are several principles that are normally associated with the paradigm: polymorphism, inheritance, encapsulation, etc.
I feel that people tend to forget the first, most important principle of OO: object lifetime. One of the first things that struck me when I was learning OO programming in C++ over a decade ago was something very simple. Constructors build objects and destructors clean them up. This seems obvious, but like many obvious concepts, it has subtleties that make it worth studying.
In a class with well-done constructors, you can rely on something very important. If the object is constructed, it is valid. This means that you generally don't have to do a lot of grunt work to make sure the object is set up properly before you start using it. If you've only worked with well-done objects, this point may not be obvious. Those of us who programmed before OO got popular remember the redundant validation code that needed to go in a lot of places to make certain that our data structures were set up properly.
Since that time, I have seen many systems where the programmers forgot this basic guarantee. Every time this guarantee is violated in the class, all of the client programmers who use this class have a lot more work on their hands.
I'm talking about the kind of class where you must call an initialise method or a series of set methods on the object immediately after construction, otherwise you aren't guaranteed useful or reliable results. Among other things, these kinds of objects are very hard for new programmers to understand. After all, what is actually required to be set up before the object is valid? There's almost no way to tell, short of reading all of the source of the class and many of the places where it is used.
What tends to happen in these cases is the new client programmer copies code from somewhere else that works and tweaks it to do what he/she needs it to do. This form of voodoo programming is one of the things that OO was supposed to protect us from. Where this really begins to hurt is when a change must be made to the class to add some form of initialisation. How are you going to fix all of the client code written with it? Granted, modern IDEs can make some of this a little easier, but the point is that I, as the client of the class, will need to change the usage of the object possibly many times if the class implementation changes.
That being said, it is still possible to do some forms of lazy initialisation that save time at construction time. But, the guarantee must still apply for a good class. After construction, the object must be valid and usable. If it's not, you don't have an object, you have a mass of data and behavior.
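Here is a minimal Python sketch of a class that keeps the guarantee while still deferring some work. The `DateRange` class is invented for the example: the constructor refuses to build an invalid object, and the day count is computed lazily the first time it is asked for.

```python
from datetime import date


class DateRange:
    """An inclusive range of dates that is valid as soon as it exists."""

    def __init__(self, start: date, end: date):
        if end < start:
            raise ValueError("end date is before start date")
        self._start = start
        self._end = end
        self._days = None  # computed on first use

    @property
    def days(self) -> int:
        """Number of days in the range, counting both endpoints."""
        if self._days is None:
            self._days = (self._end - self._start).days + 1
        return self._days


january = DateRange(date(2024, 1, 1), date(2024, 1, 31))
print(january.days)
```

There is no separate initialise step for a client to forget, and the lazy part is an internal detail that callers never have to know about.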
The other end of the object's lifetime is handled by a destructor. When an object reaches the end of its life, the destructor is called, undoing any work done by the constructor. In the case of objects that hold resources, the destructor returns those resources to the system. Usually, the resource is memory. But, sometimes there are other resources, such as files, database handles, semaphores, mutexes, etc.
If the object is not properly destroyed, then the object may not be accessible, but it doesn't really die. Instead, it becomes kind of an undead object. It haunts the memory and resource space of the process until recovered by the death of the whole process. I know, it's a little corny. But, I kind of like the imagery.
This concept also explains one of the problems I have with some forms of garbage collection. Garbage collection tends to assume that the only thing associated with an object is memory. And, as long as the memory is returned before you need it again, it doesn't really matter when the object dies. This means that we will have many of these undead objects in the system at any time. They are not really alive, but not yet fully dead. In some cases, you are not even guaranteed that the destructor, or finalizer, will be called. As a result, the client programmer has to do all of the end-of-object cleanup explicitly. This once again encourages voodoo programming as we have to copy the shutdown code from usage to usage throughout the system.
So keep in mind the importance of the lifetime of your objects. This is a fundamental feature of object oriented programming that simplifies the use of your classes, and increases their usefulness.