Programmer Musings: The Semantics of Garbage Collection

This site will look much better in a browser that supports web standards, but is accessible to any browser or Internet device.

June 10, 2004

The Semantics of Garbage Collection

I have a confession to make. I've never really been comfortable with garbage collection.

Now, before you dismiss me as a Luddite who has obviously never worked with an advanced programming language, let me explain.

I'm familiar with many different forms of garbage collection. I have worked in languages with and without garbage collection. I've worked with Java, Lisp, and Perl; each with its own concept and implementation of garbage collection. I have seen well-done garbage collection free programmers from the drudgery of dealing with their own memory management.

But, despite all of that, I don't really like garbage collection. Strangely enough, when I finally examined why I didn't like garbage collection, I realized that my problem was more semantic than anything else. I don't like the name and what it implies. I don't think of used memory as garbage to be collected. I think of it as a resource to be recovered. The funny thing is that this shift in viewpoint results in a whole new set of assumptions about how resource recovery should work.

Timeliness of Recovery

The first consequence of this change in terminology is a change in timeliness.

Garbage is something we don't want to think about. Taking out the garbage is something we tend to put off until we have to deal with it. (The trash can is full, it's starting to smell, others in the house are complaining, etc.) This philosophy shows itself in the current crop of garbage collection systems. Most of the garbage collectors I've seen follow some variation of the mark and sweep algorithm. In this algorithm, a background task walks through memory marking all referenced objects. Then, it sweeps out the unused objects, freeing up their memory.^*

Unfortunately, running this algorithm uses CPU cycles that we obviously want to make use of elsewhere in our code. So most systems put it on a low priority thread; one that only gets called when there is nothing else to do (or when we begin running low on memory). Since this thread is just picking up our garbage, it doesn't need to be high priority. As long as the garbage is picked up before it gets in our way, we don't really care.

If you think of memory as a resource, things shift subtly. Resources are not things I'm ignoring, they are something I want. In fact, if it is a scarce (but renewable) resource, I want to make sure it is recycled or recovered as soon as possible. That way it will be available the next time I need it. It is no longer an issue of trying to put off this nasty chore off as long as possible. Now I want to try to handle it as soon as possible, without adversely impacting my other work.

Other Resources

Another advantage of thinking of memory as a resource, is that this puts it in the same category as other resources that we need to deal with. Most operating systems have limits on the numbers of certain resources. If a program tries to use too many without returning them to the system, it risks failure. Some kinds of scarce resources include:

File handles
Database handles
TCP/IP sockets
Process slots
Address space
Mutexes and semaphores
Window handles
Communications bandwidth

But, for some reason, we tend to treat memory as fundamentally different than these other resources. We tend to try to close/release each of the items in the list above as soon as we can. Because each is scarce and we know we will need it later. But, the current wisdom in garbage collection is to recover memory only when we have no choice or nothing better to do.

I find this approach disagrees with me on a gut-level.

*Okay, it usually involves copying the live objects around and doing some other maintenance work. But the simple description is enough for the purposes of the discussion.

Posted by GWade at June 10, 2004 09:00 PM. Email comments