Programmer Musings: Abstraction as Compression

April 26, 2008

Abstraction as Compression

Long ago, I was trying to convince a friend of mine that Object Oriented programming was not all just snake oil when he asked me a fundamental question.

What's the difference between an object and a thingie?

In some ways, this question has guided my understanding of objects ever since. Fundamentally, what makes one collection of member (instance) data and member functions (methods) an object and another nothing more than a collection of data and code? What is the fundamental nature of an object?

In one sense, the answer can be summed up with my favorite quote from Ruminations on C++:

use classes to represent concepts

In a broader sense, objects are all about abstraction. Most of programming, and OO programming in particular, is an exercise in abstraction. We want to separate what you need to know to perform some action from the details you don't need to know. Abstraction is the name we give to selectively hiding or ignoring the details we don't care about so we can focus on what really matters. Abstraction is what allows us to work with files instead of magnetic domains arranged in tracks on spinning platters on a hard drive.

Any time you give a simple name to a complex collection of behaviors, you have created an abstraction. But not all abstractions are created equal. A collection of random pieces of data and methods in a FooLib class is not a particularly good abstraction. Yes, it collects information together under a single name. Unfortunately, the simplest translation of that name is the source code. To understand any piece of the functionality, you need to go look at how it's implemented.
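As a rough illustration, here is what such a thingie might look like in Python. Everything in it is invented for this sketch (the members, the methods, even the idea that FooLib holds these particular things); the point is only that the name predicts none of it.

```python
class FooLib:
    """A hypothetical grab bag: nothing in the name predicts the contents."""

    def __init__(self):
        self.retries = 3   # unrelated to every method below
        self.cache = {}

    def parse_date(self, text):
        # Splits a string like "2008-04-26" into three integers.
        year, month, day = (int(part) for part in text.split("-"))
        return year, month, day

    def celsius_to_fahrenheit(self, degrees):
        return degrees * 9 / 5 + 32

    def remember(self, key, value):
        self.cache[key] = value
```

Each method may be fine on its own, but the only way to learn what a FooLib offers is to read it.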

Good Abstractions

A simple, good abstraction is a stack class. The stack exists as an independent concept in software. You don't need to understand the actual implementation details and internal data. All you need is to know about the push and pop methods. A few other methods might be added for looking at the top of the stack without removing an item and for determining the number of items in the stack. However, calling the class Stack brings along a bunch of expected behavior without need of explanation.
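A minimal sketch of such a class in Python follows. The method name peek and the use of a list for storage are assumptions made for this example; any implementation with the same push and pop behavior would serve.

```python
class Stack:
    """Last-in, first-out container; the list inside is an implementation detail."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        # Removes and returns the most recently pushed item.
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        # Returns the top item without removing it.
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def __len__(self):
        return len(self._items)


stack = Stack()
stack.push("a")
stack.push("b")
top = stack.peek()       # "b", still on the stack
last = stack.pop()       # "b", now removed
remaining = len(stack)   # 1
```

A caller who knows what a stack is can use this class correctly without ever looking at _items.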

One of the greatest benefits of the whole design patterns movement was good names and definitions that can be used as high-level abstractions. You don't need to know about the implementation to know that an Iterator allows traversal of a container, or that a Factory creates other objects. In fact, by giving a complicated concept a simple name, we have performed a kind of compression.

Information Compression

When I call an object an adapter, you immediately know that its purpose is to convert the interface of a class into a different interface. You also know something about the expected costs of this delegation and that the adapter itself doesn't need to provide any major functionality of its own. You also know that it is likely that the adapted class either cannot be changed, or that changing it would affect too many other systems. It is also likely that we are using this older class through a new interface.

But I don't need to say all of that; I just say the class is an adapter. That is a fair amount of compression, reducing a whole paragraph into one word.
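To make the adapter description concrete, here is a small Python sketch. Both classes and the report function are hypothetical names chosen for illustration; the legacy sensor stands in for any older class that cannot be changed.

```python
class LegacyTemperatureSensor:
    """Stands in for an older class that cannot be changed (hypothetical)."""

    def read_fahrenheit(self):
        return 212.0


class CelsiusSensorAdapter:
    """Presents the interface new code expects: read_celsius()."""

    def __init__(self, legacy_sensor):
        self._legacy = legacy_sensor

    def read_celsius(self):
        # Delegation plus a unit conversion; no major functionality of its own.
        return (self._legacy.read_fahrenheit() - 32) * 5 / 9


def report(sensor):
    # New code, written against the new interface only.
    return f"{sensor.read_celsius():.1f} C"


line = report(CelsiusSensorAdapter(LegacyTemperatureSensor()))
```

The word Adapter in the class name tells a reader nearly everything the comments do, which is the compression at work.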

A good abstraction compresses a lot of information into a single concept. How good the compression is depends partly on the amount of work or added information needed to decompress it. As a friend of mine once pointed out: ISBN is a really strong compression algorithm. Any book can be compressed into a 10-character string; but decompression is a bummer.

Decompressing a good abstraction to gain understanding requires some amount of additional information. If this information is general (like design patterns), you can reuse the explanation many times, reducing the cost of the decompression for each use of that pattern. If the only explanation for what the class does is the source of the class itself, there is not much abstraction. This is more like the ISBN example. To understand what ISBN: 0-596-51004-7 expands to, you need to get and read the book (Beautiful Code).
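The ISBN case can be sketched in a few lines of Python. The one-entry catalog and the decompress function are assumptions made for this example; a real lookup would have to query a library or bookseller database.

```python
# Hypothetical one-entry catalog standing in for an external database.
CATALOG = {"0-596-51004-7": "Beautiful Code"}


def decompress(isbn):
    """Expands an ISBN to a title; returns None when the catalog has no entry."""
    return CATALOG.get(isbn)


title = decompress("0-596-51004-7")
unknown = decompress("0-000-00000-0")
```

Nothing in the string itself yields the title, let alone the contents. All of the information lives outside the name, and even with the title in hand you still have to read the book.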

Recognizing Good Abstractions

One way to recognize a good abstraction is to examine the level of compression (including the amount of information needed to decompress). If the only way to understand the abstraction is to read the source (and re-read the source, ...), odds are the abstraction is not very good.

If understanding a particular class requires a bunch of extra information that happens to be part of the business domain, we may still have a good abstraction. In that case, the cost of the extra information can be amortized across several other classes.

Abstraction as information compression may be a useful concept for determining if any of your classes are actually thingies.

Posted by GWade at April 26, 2008 12:39 PM.