Anomaly ~ G. Wade Johnson

January 31, 2004

XML-Serialized Objects and Coupling

Although the debate continues to rage between the "XML as documents" camp and the "XML as data" camp, it seems reasonable to believe that both styles are here to stay. I have noticed one trend in XML data that strikes me as déjà vu all over again. There seem to be a large number of tools for automatically generating XML from objects. These tools give you the ability to serialize an object, send it somewhere, and possibly reconstitute it elsewhere.

These tools seem to make an annoying chore much easier. But, I have to wonder. These tools simplify applying a particular solution. But, is it the right solution?

I started working in XML long before most of these tools were available. In the early days (a few years ago <grin/>), we worked out the serialization by hand. You converted what needed conversion to XML and left the rest alone.

One problem I see with the current approach is an increased coupling between the initial object and the XML stream that comes from it. If you are guaranteed to have the same kind of object, in the same language, on both sides of your transfer, that might be an appropriate solution. But what if you don't have that guarantee? What if you are providing a service to someone else? What if you are providing an API over a network? (I didn't say a Web Service, because nowadays that implies a particular architecture.)

What happens as your service changes over time? Do you really want to change the whole interface because the object that generates it has been refactored? If not, then you either have to leave those objects alone, or drop the nice tool that helps you generate the XML.

Many years ago, before Web programming and before the mainstream use of OO languages, there was a simple concept in programming to describe this problem: coupling. Long ago we learned that inappropriate coupling was bad. The higher the coupling between two pieces of code, the harder it is to change one of them without changing the other. The whole concept of interfaces is based around the idea of reducing coupling.

My problem with these tools and the approach they simplify is that they may be increasing coupling unnecessarily. If both ends of the system must have identical object layouts in order to use the tool, then you are locking clients of the service into your way of looking at things. This makes it much more difficult for other people to use the service in ways you hadn't planned for. In fact, it even makes it harder for you to use the service a year from now in ways you can't foresee today.

I built a Web Service a few years ago for use inside a company. This was before the proliferation of WSDL and UDDI. SOAP was still pretty new. We defined the system using XML over HTTP. We defined the XML to fit the data we needed to send, not the objects we generated it with. It was not perfect, and we learned many things along the way. One of the more interesting things that came out of it was that the generic XML could be consumed relatively easily by code written with technologies ranging from ASP to Perl to Flash.
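To make the distinction concrete, here is a minimal Java sketch (not the code from that project) of XML written by hand to fit the data rather than generated from an object's fields. The `Customer` class, its fields, and the `<customer>`, `<name>`, and `<email>` element names are all hypothetical; the point is that the class internals can be refactored without changing the document that clients consume.

```java
// A minimal sketch: the XML format is a documented data contract written by
// hand, not a mirror of whatever fields the class happens to have today.
public class CustomerXmlDemo {

    // Hypothetical domain class. Its internals can be refactored freely
    // (e.g. splitting fullName into first/last) without changing the XML.
    static final class Customer {
        private final String fullName;
        private final String email;

        Customer(String fullName, String email) {
            this.fullName = fullName;
            this.email = email;
        }

        String getFullName() { return fullName; }
        String getEmail()    { return email; }
    }

    // Escape the five characters that are special in XML.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Element names come from the agreed interface, not from field names.
    static String toXml(Customer c) {
        return "<customer>"
             + "<name>" + escape(c.getFullName()) + "</name>"
             + "<email>" + escape(c.getEmail()) + "</email>"
             + "</customer>";
    }

    public static void main(String[] args) {
        System.out.println(toXml(new Customer("Ann & Bob", "ab@example.com")));
    }
}
```

Because the element names belong to the interface rather than to the class, a Perl or Flash client only has to understand the three elements, not the Java object layout.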

I think the next time I build something like this, I will definitely do a few things differently. But the serialized-objects approach is one thing I probably won't do. I don't think the increased coupling is worth the temporary gain.

Posted by GWade at 11:32 AM.

January 30, 2004

Review of Book Darts

Although it's not a book and not directly related to programming, I am going to review one of the best reading tools I've ever seen, Book Darts.

If you read or deal with a lot of reference books, you need to look at these little wonders.

Several years ago, my wife bought me a pack of these little page markers on a lark. My reading has not been the same since. Basically, a Book Dart is just a thin piece of folded-over bronze with one side shaped like a pointer. It slides onto a page in a book and points to an exact location on the page.

I have used them in two different modes. I keep a few for bookmarks in whatever book I'm currently reading. The main advantage over normal bookmarks is that I don't lose them. But, that is not the use that really makes them shine.

When you find a passage in a book that you want to come back to, place a book dart in position pointing exactly to the information you are interested in. Later, the passage is easy to find, and the book dart is thin enough that you can close the book without marring the paper. When I was in college, people used Post-it notes with arrows drawn on them to point to an exact passage. But, the Post-its looked ratty, didn't stay where you wanted them, and were not appropriate for long-term marking.

Book darts solve all of these problems. They are also inexpensive (ten cents each in bulk). One warning: they are addictive. I have a large number of programming reference books with dozens of useful passages marked. (They also now come in different colors.)

Posted by GWade at 11:46 PM.

Review of SVG Essentials

SVG Essentials
J. David Eisenberg
O'Reilly, 2002

A little over a year ago I bought and read SVG Essentials by J. David Eisenberg. At the time, I was moving from dabbling in SVG to beginning a contract which required the use of SVG.

This book provides a good working overview of SVG. I have seen several articles that showed how to produce specific effects or that explored a piece of SVG functionality. I've also read the W3C specifications. This book provides the practical information you need to actually use SVG.

One interesting point about the book is its lack of spectacular graphics. The author states that this is intentional. Most of the pictures illustrate only one feature, or a small number of features, at a time. He also states that he doesn't want the pictures to overwhelm novice SVG users, who might be discouraged by not being able to produce work equal to the art in some books.

If you need a refresher on vector graphics or if you want to explore this new XML application, this book is definitely recommended.

Posted by GWade at 11:10 PM.

More Thoughts About Debugging

Earlier in this weblog, I listed some of my basic troubleshooting rules. I thought it was probably time to come back and spend a little more time on this topic.

I plan to expand on the points I made earlier and add a few more thoughts along the way.

I left off the most important rule last time.

0. Reproduce the symptom

If you can't reproduce the symptom, you will never fix it. (Or, at least you will never be sure it's fixed.) This one is pretty basic, but I notice people forgetting it all of the time.

I also try to make the distinction between symptom and bug/problem at this point, because the bug itself is almost never visible. What you can see is a symptom of the actual problem. This distinction becomes much more important later.

1. Divide and conquer (always)

Almost every problem can be solved fastest through a divide and conquer approach. Try to "trap" the symptom in a smaller and smaller box so that you can see it. Finding a bug in a 10 line function is much easier than finding it in a 100,000 line program.

Different approaches to this technique include finding a smaller set of actions that still reproduces the problem, and adding logging statements or breakpoints to find code that runs before and after the bug. Any code that you can eliminate reduces the space you have to search for the bug.

2. 50/50 tests are best

A friend of mine who has a lot of experience in information theory pointed this out. One of the good things about a 50/50 test is that whichever way the test goes, you learn the same amount. If a test can divide the program in half, and you can tell which half the bug is in, you now have half the code to check. If you can repeat this, a 1,000,000-line program can be reduced to 1 line in about 20 tests, because 2^20 (a number all programmers should recognize) is just over a million. (Gives a whole new aspect to the game Twenty Questions, doesn't it?) So each 50/50 test gives you about 1 bit of information.
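The halving process can be sketched as a binary search. In this hypothetical Java example, the "program" is just line numbers 0 through n-1, and the predicate passed in stands for whatever real experiment (a log statement, a breakpoint, a shorter reproduction) tells you whether the bug is at or before a given point. It assumes that experiment always answers correctly.

```java
import java.util.function.IntPredicate;

// A minimal sketch of divide and conquer with 50/50 tests.
public class BisectDemo {

    // Narrows [0, n-1] down to one line. Returns {foundLine, testsUsed}.
    static int[] bisect(int n, IntPredicate bugAtOrBefore) {
        int lo = 0, hi = n - 1, tests = 0;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            tests++;                      // one experiment per halving
            if (bugAtOrBefore.test(mid)) {
                hi = mid;                 // bug is in [lo, mid]
            } else {
                lo = mid + 1;             // bug is in [mid + 1, hi]
            }
        }
        return new int[] { lo, tests };
    }

    public static void main(String[] args) {
        int bugLine = 765_432; // hypothetical location of the bug
        int[] result = bisect(1_000_000, mid -> bugLine <= mid);
        // Never more than 20 tests for a million lines, since 2^20 > 1,000,000.
        System.out.println("bug at line " + result[0] + " after " + result[1] + " tests");
    }
}
```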

Often when debugging, it's easy to fall into the trap of testing to see if the bug is in one small piece of code. This turns out to be a really bad idea. If your test would limit the bug to 5% of the program if it succeeds, how much will you learn (on average)? Well, if the test succeeds, you eliminate 95% of the code. If the test fails, you eliminate only 5% of the code. Assuming the bug is equally likely to be anywhere, such a test succeeds only 5% of the time, so in information theory terms it gains you only about 0.29 bits (on average), well under a third of a bit. If the test succeeds, it localizes the bug about as well as 4 simple 50/50 tests. But, if it fails, you don't really know much more than you did to start with.
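Those numbers come from the binary entropy function. This small Java calculation assumes the bug is equally likely to be anywhere, so a test that "succeeds" with probability p carries H(p) = -p*log2(p) - (1-p)*log2(1-p) bits on average; it compares a 50/50 test with a 5% test.

```java
// Worked calculation of the average information gained by a yes/no test.
public class TestInformation {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Binary entropy: H(p) = -p*log2(p) - (1-p)*log2(1-p)
    static double bits(double p) {
        if (p <= 0.0 || p >= 1.0) {
            return 0.0; // a test whose outcome is certain teaches nothing
        }
        return -p * log2(p) - (1 - p) * log2(1 - p);
    }

    public static void main(String[] args) {
        System.out.printf("50/50 test: %.3f bits%n", bits(0.5));
        System.out.printf(" 5/95 test: %.3f bits%n", bits(0.05));
        // If the 5% test succeeds, the search space shrinks by a factor of 20.
        System.out.printf("lucky 5%% test: %.2f halvings%n", log2(20));
    }
}
```

It should report about 1 bit for the 50/50 test, about 0.29 bits for the 5% test, and about 4.3 halvings for the lucky case where the 5% test succeeds.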

3. Verify "can't happen" cases with a test

Some of the most effective tests I know are tests for things that can't happen. The classic hardware test is to see if the device is turned on/plugged in.

At one point, I was told that some code had a problem because it took a lot longer to complete a remote process with one machine than it did with another. I suggested that we ping the two servers to make sure the network acted the same to both machines. I was told that the network was the same so it couldn't possibly be the problem. We tested it anyway. It turned out there was a significant difference in the ping times. The network group was then able to find and fix the configuration problem.

Many times, the "can't happen" case is actually the "I don't believe it could do that" case. This may mean that it really can't happen. But, it may also indicate a blind spot for that problem. Blind spots are good places to check for bugs. If you can't see it now, it's possible the original programmer didn't see it when coding.
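The same habit carries over into code. In this minimal Java sketch, with a hypothetical `ConnectionState` enum, a value that "can't" be null or "can't" be some other state is checked explicitly and fails loudly, so a blind spot shows up as an exception with a message rather than as a mysterious symptom later.

```java
// A minimal sketch of turning "can't happen" into a check that fails loudly.
public class CantHappenDemo {

    enum ConnectionState { OPEN, CLOSED }

    static String describe(ConnectionState state) {
        if (state == null) {
            // "Nobody ever passes null here." Verify it instead of assuming it.
            throw new IllegalArgumentException("can't happen: state is null");
        }
        switch (state) {
            case OPEN:   return "open";
            case CLOSED: return "closed";
            default:
                // Reached only if a new state is added and this method is forgotten.
                throw new IllegalStateException("can't happen: " + state);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(ConnectionState.OPEN));
        try {
            describe(null);
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```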

4. Steady progress is better than random guessing

One mistake that many new debuggers make is to try to guess at the bug, fix what they think they've found, and hope it works. A few years ago, I realized that there seem to be three main causes for this behavior:

  • False laziness
  • A desire to appear to understand what's happening
  • Imitation of more senior personnel

Not all programmers are driven by all of these causes. Some programmers are affected more strongly by some of these. The first is the easiest to spot. The programmer does not want to waste time finding and fixing bugs. "There's too much real work to do." Anyone who ever gets very good at programming eventually learns that debugging is part of the business.

Part of what makes a good programmer is ego. Larry Wall describes this as the great programmer virtue of Hubris. This is basically the belief that you can do it, in spite of evidence to the contrary. If it weren't for this kind of ego, no code would ever get written. Most systems are so complex that if we ever really thought about what we are getting ourselves into, we would run screaming into the night. Fortunately, programmers do suffer from hubris.

However, debugging a system you don't understand is frustrating and humbling. You have no idea where in thousands of lines of code the problem lies. To keep from bruising the ego, the programmer will sometimes guess in order to appear to reach a swift conclusion. This approach usually fails and ends up making that programmer look worse.

If we go back to the divide and conquer approach with 50/50 tests, you can reduce the size of the problem to 1/1024 of its original size in about 10 tests. In 10 more tests, the area to search could be less than 1 in a million.

In practice, you don't usually get exact halving like that in a real problem. And you can usually spot the problem well before you get it down to one machine instruction. But the truth is that this kind of steady progress will narrow down to the bug more consistently than any other method. Moreover, it's more useful to be able to find any bug than to guess right on a few.

5. If you guess and fail, go back to #1.

Sometimes, when debugging, you recognize some symptoms or get a hunch about what the problem could be. If you can test it simply, go ahead. But, if your guess doesn't pan out, don't try to keep guessing. This is where many people go wrong in debugging. They spend a lot of time chasing spurious hunches, when they should be whittling down the problem space.

The fact that your hunch didn't find it probably means that you don't understand the problem as well as you thought. Don't despair; a few more simple tests may give you the information you need to make better guesses later. More likely, the tests will help you find the bug in a more systematic manner.

You've located the symptom, now what?

Finding the code that generates the symptom is not the hardest part of the problem. Now is the time to identify the real problem. How do you find the root cause? For example, say you have a program written in C++ that slowly increases its memory consumption. This is an obvious symptom of a classic memory leak. After a significant amount of effort, you find the area where the memory is allocated and not freed.

The quick fix is to slap in a delete at the right place and pat yourself on the back. But, you haven't really found the root of the problem. Why wasn't the memory freed at that time? Possibly, under some circumstance, that object is used elsewhere. Even if it isn't, the patch is not exception-safe. If an exception occurs before the delete, the memory leak is back. A better idea would be to use some form of smart pointer, like auto_ptr. (Before you get up and scream "garbage collection would have fixed that", I've seen Java programs that leaked memory like crazy. Garbage collection doesn't fix all memory problems.)
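Java has no `delete`, but the same exception-safety problem applies to anything the garbage collector does not clean up for you, such as files, sockets, and locks. As a rough Java analogue of the smart-pointer fix (a swapped-in technique, not the C++ one above), try-with-resources (Java 7 and later) runs `close()` on every exit path. `TrackedResource` is a hypothetical class that counts open instances so the leak is visible.

```java
// A minimal sketch comparing manual cleanup with try-with-resources.
public class ExceptionSafetyDemo {

    static int openCount = 0; // how many TrackedResource instances are still open

    static final class TrackedResource implements AutoCloseable {
        TrackedResource() { openCount++; }

        void use(boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        }

        @Override
        public void close() { openCount--; }
    }

    // Manual cleanup, like a bare delete: skipped if use() throws.
    static void leaky(boolean fail) {
        TrackedResource r = new TrackedResource();
        r.use(fail);
        r.close();
    }

    // try-with-resources calls close() on every exit path, including exceptions.
    static void safe(boolean fail) {
        try (TrackedResource r = new TrackedResource()) {
            r.use(fail);
        }
    }

    public static void main(String[] args) {
        try { leaky(true); } catch (IllegalStateException e) { /* expected */ }
        System.out.println("after leaky: " + openCount + " still open");
        openCount = 0;
        try { safe(true); } catch (IllegalStateException e) { /* expected */ }
        System.out.println("after safe:  " + openCount + " still open");
    }
}
```

The first call leaves one resource open because the exception skips the manual `close()`; the second leaves none.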

Well, you've fixed the bug. Now is a good time to look in the immediate area for similar mistakes. For many reasons, some pieces of code seem to collect more than their fair share of bugs. If you've found one, there are likely others nearby.

Posted by GWade at 10:59 PM.

January 23, 2004

SVG and CSS

In most of the SVG I've seen, people either use the style attribute or set the individual style attributes. I don't see much use of CSS classes, and I wonder why.

Most of the criticisms I've seen of the use of CSS fall into four categories:

  1. It's not XML.
  2. It's too verbose to use CSS on all elements.
  3. It's not as easily scripted or animated.
  4. It's inconsistently supported.

It's not XML

Let's take these one at a time. The first is one of my favorite non-arguments. I don't use XML for every piece of data in my life. For example, most of the words I type do not use characters that are explicitly marked up. Even more amazing, the individual digits of most numbers I use aren't marked up in XML, either.

All sarcasm aside, XML is a good format for some things and a lousy one for others. Sometimes raw text is better. Sometimes a comma-separated values (CSV) file is better. And, sometimes XML is best. So I don't consider this to be a useful argument.

It's too verbose to use CSS on all elements.

Well, that was true of CSS and HTML as well. In fact, I remember people using that as a reason to go ahead and use <font> tags in HTML. (Which were even more verbose.)

People with a bit more experience, or people that are too lazy to style everything explicitly (like me), often use CSS classes to solve that problem. Instead of a large number of individual style properties, you only need one class attribute. This also simplifies changing the look of many elements by modifying the class they are associated with.

It's not as easily scripted or animated.

This is often true. I believe there are viewers and libraries that allow you to modify CSS on the fly, but I don't know that the methods are consistent across tools. The key is to not use CSS for things that you want to be dynamic. In many cases, a large number of the elements in the display don't change (or some of the styling is static even on elements that do change). Style those with CSS classes or with direct styles, and put the things you plan to change in attributes.

In my experience, most of the things I animate or script, I do by changing either individual XML attributes or whole classes of styling. But, I don't usually directly modify styling information. In fact, changing one class attribute can result in a drastic change to an element, effectively modifying a large number of style properties all in one call.

The one downside, of course, is that you need to set up the CSS classes in advance.

It's inconsistently supported.

This is definitely true. I have run across lax CSS handling in ASV3 (the Adobe SVG Viewer, version 3) which causes difficulty displaying SVG written for ASV3 on other viewers. But, from what I've heard and seen of newer viewers, that situation seems to be improving. Of course, if we use that excuse, we need to stop working on the web. Support and rendering of HTML is still inconsistent. And, in the past, I remember people using invalid HTML to get the visual effects they wanted on certain browsers.

My experience

I've tried to use CSS classes in SVG for most of my own work, and I find it works quite well. I also use the style attribute to override the class values for a few elements.

Finally, if I need to do a lot of scripting or animation on an element I tend to rely on the styling attributes.

In fact, each of these approaches has its own strengths and weaknesses. Playing with them allows you to develop a sense for when each could be the right tool for your particular job.

Posted by GWade at 11:31 PM.

January 20, 2004

More on Magic Constants

I've been thinking more on the issue of Magic Constants. Have you ever noticed that when some people first understand the idea of symbolic constants, they want to collect all of the constants they've defined together in one spot?

I remember a C project I worked on over ten years ago where the programmer had a single header file with every constant in the system defined in it. He also had a second header with every struct in the system declared in it. This was so "he would always know where they were."

Of course, this meant that any change to either of those headers meant the entire system needed to be recompiled. As a result, creation of new constants (which had to be in the constants header file) was strongly discouraged. This obviously encouraged the misuse of constants for new, unrelated purposes.

When I came to the project, there were already a large number of places where any given constant was used for two or more unrelated purposes, because it happened to have the right value. Some arrays and strings could not be resized because changing a constant would have broken unrelated code. Fixing most of those problems took months.

The funny thing is, I have continued to see this same pattern in almost every language I've worked in since. Why do people think that it is a good idea to build a single constants file? I've done it. Others have done it. Sometimes we do it even when we know better. I wonder if this is similar to the mental quirk that causes us to make kitchen junk drawers and Miscellaneous folders in filing cabinets.

Posted by GWade at 09:10 PM.

January 18, 2004

Magic Constants are bad

A truly bad code example in a book on Java Servlets got me thinking about the idea of Magic Constants. Of course, we are all aware of the problem of magic numbers or magic literals in code. That's why we use symbolic constants instead to hide that implementation detail (and to simplify maintenance later).

However, I'm not talking about literals, I'm talking about symbolic constants that are used in a way as bad, or worse, than the original literals.

The example in the book was looking at the response code from an HTTP request. If you are familiar with HTTP, you know that the values from 200 to 299 are the success codes. Now obviously, we don't want to put those raw literals in our code. So we should use symbolic constants instead.

The book contained the following code fragment:


if (status >= HttpURLConnection.HTTP_OK ||
    status < HttpURLConnection.HTTP_MULT_CHOICE) {
...

One look at this code and I finally had a name for a bad practice I'd seen many times in my career. I decided on Magic Constants. In this case, the constants are used exactly the way the original literals would have been. HttpURLConnection.HTTP_OK has the value 200 and HttpURLConnection.HTTP_MULT_CHOICE has the value 300. To understand the code, you need to know that the first value above the successful codes is HttpURLConnection.HTTP_MULT_CHOICE.

This code relies on the relative ordering of the two constants in a way that is dependent on their current implementation. If the IETF ever decided to change the range of successful values or move the Multiple Choices response code, code like this could suddenly act very differently.

Unfortunately, this code has a bug that would have been more obvious if we had kept the original literals. Without the constants, the code is:


if (status >= 200 || status < 300) {
...

From this, it's a little more obvious that the condition will always be true. The OR condition should have been an AND. So obviously, this practice has generated code that is easier to get wrong and more fragile as well.

Before I go any further, I'd like to say that I do not mean to abuse these authors in general. They just happened to write the piece of code that shows a practice I've come to believe is wrong. I have seen variants of this problem for most of the 20-odd years I've been programming.

I have seen many cases where someone borrowed a constant that happened to have the right value without regard for whether or not the new use of the constant and the old use had any relationship. This leads to code that is almost as bad as the original with the magic numbers. No one will ever be able to figure out why the array containing the task structures is sized by the constant NAME_LEN.

I might suggest two practices that could solve many of the Magic Constant mistakes I've seen.

  1. Constants should have only one purpose.
  2. A range of constants should have additional constants that declare the range.

In the first case, we always give each new use of a number its own constant. In the case above, NUMBER_TASKS should be separate from NAME_LEN. If there is a reason why their sizes are actually related, define one in terms of the other. The hard part is recognizing when the numbers are really distinct and when they are the same.
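A minimal Java sketch of that rule. `NAME_LEN` and `NUMBER_TASKS` are the names used above; their values, `NAME_COLUMN_WIDTH`, and the `TaskTable` class are hypothetical illustrations.

```java
// A minimal sketch of "one purpose per constant".
public class TaskTable {

    // Maximum length of a task name.
    static final int NAME_LEN = 32;

    // Maximum number of tasks. It happens to equal NAME_LEN today, but the two
    // are unrelated, so each gets its own constant and can change independently.
    static final int NUMBER_TASKS = 32;

    // A genuinely related size is defined in terms of the constant it depends on:
    // the name plus one space of padding on each side in a report column.
    static final int NAME_COLUMN_WIDTH = NAME_LEN + 2;

    // Sized by the task count, never by NAME_LEN.
    private final String[] taskNames = new String[NUMBER_TASKS];

    int capacity() { return taskNames.length; }

    public static void main(String[] args) {
        System.out.println(new TaskTable().capacity() + " tasks, names up to " + NAME_LEN + " chars");
    }
}
```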

The second idea is a variant on the first. The constants for the individual items in the range should not double as the markers for the first and last items in the range. This idea has only just now completely gelled for me. I've done part of this inconsistently for years, but I think I need to be more consistent. I've often defined a constant for the number of items in a range. For example, if we have a set of constants for column numbers, I might code them using C++ enums like this:


enum eColumns
{
    colID, colName, colAddress, colEmail,
    NUM_COLUMNS
};

By adding the final constant, I always have a simple way to iterate over the columns. However, this approach doesn't work so well if the first value isn't 0. Now I think a better approach would define a MIN_COLUMN and a MAX_COLUMN. This would allow me to loop from min to max. I could also define the number of items based on these two constants.
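Carried over to Java, the idea might look like the following sketch. The column names mirror the C++ enum above; starting them at 1 (as JDBC column indexes do) is an assumption chosen to show why a trailing count alone is not enough.

```java
// A minimal sketch of explicit range constants for a range that does not start at 0.
public class Columns {

    static final int COL_ID      = 1;
    static final int COL_NAME    = 2;
    static final int COL_ADDRESS = 3;
    static final int COL_EMAIL   = 4;

    // Separate constants that describe the range itself.
    static final int MIN_COLUMN  = COL_ID;
    static final int MAX_COLUMN  = COL_EMAIL;
    static final int NUM_COLUMNS = MAX_COLUMN - MIN_COLUMN + 1;

    public static void main(String[] args) {
        // Loop from min to max without knowing where the range starts.
        for (int col = MIN_COLUMN; col <= MAX_COLUMN; col++) {
            System.out.println("column " + col);
        }
        System.out.println(NUM_COLUMNS + " columns");
    }
}
```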

This would have been especially useful in the original problem. Let's assume I had two more constants:


public static final int HTTP_MIN_SUCCESS = 200;
public static final int HTTP_MAX_SUCCESS = 299;

This allows us to recode the test as


if (status >= HttpURLConnection.HTTP_MIN_SUCCESS &&
    status <= HttpURLConnection.HTTP_MAX_SUCCESS) {
...

The original code in the book was repeated several times for different examples. A much better solution would be to define a new method, isSuccess(), which performs this test and is coded in one place. The calling code would then have been


if (isSuccess( status )) {
...

which is much more readable and maintainable.

Now obviously the function which hides the implementation details of the success test is a better idea and should be available along with the constants. The extra constants are still a good idea though. At some point, a programmer may need to use this range in a way that the original programmers didn't anticipate.
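Putting both suggestions together, a minimal Java sketch might look like this. `HTTP_MIN_SUCCESS`, `HTTP_MAX_SUCCESS`, and `isSuccess()` are the hypothetical additions discussed above; they are not part of `java.net.HttpURLConnection`, so this version puts them in a small helper class of its own (the name `HttpStatus` is also an assumption). Only `HTTP_OK`, `HTTP_NO_CONTENT`, `HTTP_MULT_CHOICE`, and `HTTP_NOT_FOUND` come from the real class.

```java
import java.net.HttpURLConnection;

// A minimal sketch: range constants plus one method that owns the success test.
public final class HttpStatus {

    public static final int HTTP_MIN_SUCCESS = 200;
    public static final int HTTP_MAX_SUCCESS = 299;

    private HttpStatus() {} // constants and static helpers only

    // The one place that knows what "success" means. Note the AND.
    public static boolean isSuccess(int status) {
        return status >= HTTP_MIN_SUCCESS && status <= HTTP_MAX_SUCCESS;
    }

    public static void main(String[] args) {
        int[] samples = { HttpURLConnection.HTTP_OK, HttpURLConnection.HTTP_NO_CONTENT,
                          HttpURLConnection.HTTP_MULT_CHOICE, HttpURLConnection.HTTP_NOT_FOUND };
        for (int status : samples) {
            System.out.println(status + " -> " + isSuccess(status));
        }
    }
}
```

With this in place, the book's test becomes `if (HttpStatus.isSuccess(status))`, and a caller who needs the raw range still has the two bound constants.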

Posted by GWade at 03:45 PM.

January 17, 2004

Regular Expression Maintainability

perl.com: Maintaining Regular Expressions [Jan. 16, 2004]

Aaron Mackey does a wonderful job of suggesting more maintainable regular expression idioms. His use of (?{}) to assign the results of capturing parentheses inline was particularly interesting to me. I may have seen it before, but this is the first time it made sense.

The article goes into some serious magic later including deferred execution of the (?{}) blocks and Regexp::Fields.

Posted by GWade at 06:09 PM.

Review of Slack

Slack
Tom DeMarco
Broadway Books, 2002

In Slack, Tom DeMarco takes a somewhat heretical position about how a business can succeed. He argues somewhat convincingly that companies need less efficiency and more slack in order to adapt to changes in their environment. DeMarco goes on to define slack as that period of time when you are 0% busy. While not as profound as Peopleware, this book is quite thought-provoking.

One real surprise for me was that, although I agreed with many of his points, I did not feel that he really proved his conclusion. He states that a lack of slack makes it harder to adapt to change, and then moves on. He states that the current fascination with efficiency is the cause of the lack of slack, and then he moves on. In some ways, the book does more to make you think about the subject than to argue it; it simply states a conclusion that DeMarco expects you to agree with.

Despite these complaints, Slack is a good book for anyone doing management or project management. It may not change your mind about the way companies should work. But, it does provide a different viewpoint.

Posted by GWade at 12:39 PM.

Review of MySQL CookBook

MySQL CookBook
Paul DuBois
O'Reilly, 2003

This book covers a large amount of material on using MySQL. If you are new to MySQL (or SQL), this book could be a tremendous help. There are only two bad things I can say about this book. One is that it is huge. If you don't read books like this cover-to-cover, that may not be a problem. The second is that not all of its recipes seem to follow the cookbook style. Although it is much better than some cookbook-style books, it is not the best example of the style. That being said, most of the book does a good job of picking problems and showing you how to solve them in proper cookbook style.

Posted by GWade at 08:43 AM.

January 16, 2004

Language Book Intros

In the past year, I've had to move my Java programming skills from "recognize the language at twenty paces" to "professional Java programmer." In the process, I've been reading a number of books on the language. This has been my approach to learning every language I've ever worked with.

Almost all of the Java books seem to have a chapter or section in common that I haven't seen anywhere else. Does anyone know for certain if it is mandatory that every Java book has a "bash the other languages" chapter?

Maybe I've just had bad luck in picking books, but it does seem that almost every one that I have read has a chapter like this. They harp on obviously inherently insecure C, dangerous, convoluted C++, lowly scripting languages like Perl, and many other real or imagined flaws of other languages.

Now I do understand that programmers can become quite passionate about their favorite language. Ask almost any programmer about which language is best, or most powerful, and you can expect a lively discussion. But, I really don't recall this kind of diatribe in any other language books that I've read.

When I was first learning the C++ programming language (nine or ten years ago), some books devoted space to how C++ allowed for better abstractions and potentially more maintainable code than C. But, this information wasn't in every book and it was not an attack on C. It was framed more as enhanced features for solving different kinds of problems.

When I was first learning the Perl programming language (over ten years ago), most of the books talked about the ability to get work done and programmer efficiency. I do remember discussions of using Perl instead of combinations of AWK, SED, and shell scripting. But, I don't recall any attacks on other languages.

When I was learning the C programming language (over fifteen years ago), there was almost no mention of other languages in the books I read. There was a lot of talk of solving problems and a strong impression that you could solve any kind of problem with C.

Even when I was learning the Forth programming language, there was a lot of talk in the books about the Forth way of solving problems, but other languages were not attacked.

The same holds true for every other computer language I have learned, including Fortran, LISP, Basic, and x86 assembler. No books on any of these languages spent much time on the flaws of other languages; they focused on getting a job (or all jobs) done using this language.

One of my biggest gripes about this approach is the waste of space I end up paying for when I buy the book. If I'm buying a book on a particular programming language, I've already made the decision that I will be using the language (at least for the current project). At this point, I wish to learn syntax, idioms, tools, and approaches to solving problems with the language. I am not looking to be convinced that this language is the embodiment of the One, True Way to program.

I'm not looking for the One, True Way to program. I have many languages in my toolkit. I try to use the best one for each job.

Posted by GWade at 11:29 PM.

Unit tests that should fail

I was doing a little research on the Java JUnit test framework and ran across the article The Third State of your Binary JUnit Tests.

The author points out that in many test sets there are ignored tests as well as the passing and failing tests. As the author says, you may want to ignore tests that show bugs that you can't fix at this time. He makes a pretty good case for this concept.

The Perl Test::More framework takes a more flexible approach. In this framework you can also have skipped tests and todo tests in addition to tests that actually need to pass. These two different types of tests have very different meanings.

Skipped tests are tests that should not be run for some reason. Many times tests will be skipped that don't apply to a particular platform, or rely on an optional module for functionality. This allows the tests to be run if the conditions are right, but skipped if they would just generate spurious test failures.

Todo tests have a very different meaning. These tests describe the way functionality should work, even if it doesn't at this time. The test is still executed. But, if the test fails, it is not treated as a failure. More interestingly, if a todo test passes, it is reported as an unexpected success, because the test was not expected to pass. This allows bugs and unfinished features to be tracked in the test suite with a reminder to update the tests when they are completed.

Unlike the idea in the referenced article, these two separate mechanisms don't ignore tests that cannot or should not pass. Instead, we can document two different types of non-passing tests and still monitor them for changes.
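For readers coming from JUnit, here is a hypothetical plain-Java sketch of the skip/todo distinction. It is not how Test::More or JUnit is implemented; `MiniHarness` and its `Kind` values are invented for illustration. A skipped test is never run; a todo test is run, its failure is expected, and a pass is flagged so someone notices the test should be promoted to a normal one.

```java
import java.util.function.BooleanSupplier;

// A minimal sketch of four test outcomes: ok, fail, skip, and todo.
public class MiniHarness {

    enum Kind { NORMAL, SKIP, TODO }

    static String run(String name, Kind kind, BooleanSupplier check) {
        if (kind == Kind.SKIP) {
            return "skip " + name;            // never executed
        }
        boolean passed = check.getAsBoolean(); // normal and todo tests both run
        if (kind == Kind.TODO) {
            // Failure is expected; an unexpected pass is worth a look.
            return passed ? "UNEXPECTED PASS " + name : "todo " + name;
        }
        return (passed ? "ok " : "FAIL ") + name;
    }

    public static void main(String[] args) {
        System.out.println(run("addition works", Kind.NORMAL, () -> 1 + 1 == 2));
        System.out.println(run("needs optional module", Kind.SKIP, () -> false));
        System.out.println(run("unfinished feature", Kind.TODO, () -> false));
        System.out.println(run("bug fixed but test not updated", Kind.TODO, () -> true));
    }
}
```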

Posted by GWade at 12:58 PM.

January 15, 2004

Thread death

There is a subtle aspect of threads that may be worth exploring having to do with the concept of joining. Many threading systems support the possibility of waiting until a thread completes in order to retrieve its exit state. This is often done through a function called join(). In order to support this feature, a terminating thread may remain in memory until it is joined. This allows the join() function to work the same way (except without suspending) on a thread that is running or on one that has already finished.

Unfortunately, these dead threads consume system resources. At a minimum, they are taking up a spot in the thread maintenance data structures in the OS. They may also retain memory or other resources. In order to clean up these resources, the thread must be join()ed.

This can be inconvenient. In some cases, the threads are launched without any need to know how they exited. Some threading systems allow setting the state of a thread such that it is not joinable. This means that when it finishes executing, the thread will be discarded automatically.
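In POSIX threads, for example, pthread_join() hands back the thread's return value and pthread_detach() marks a thread as not joinable. Python's threading module has no detach, and its join() returns None, so the sketch below uses a hypothetical ResultThread subclass to keep the return value around. It shows join() behaving the same way on a thread that has (almost certainly) already finished and on one that is still running.

```python
import threading
import time


class ResultThread(threading.Thread):
    """Hypothetical helper: Python's join() returns None, so this subclass
    keeps the function's return value for whoever joins the thread."""

    def __init__(self, func, *args):
        super().__init__()
        self._func = func
        self._call_args = args
        self.result = None

    def run(self):
        self.result = self._func(*self._call_args)


def square_after(delay, n):
    time.sleep(delay)       # simulate work of varying length
    return n * n


def demo():
    quick = ResultThread(square_after, 0.0, 3)
    slow = ResultThread(square_after, 0.3, 4)
    quick.start()
    slow.start()
    time.sleep(0.1)         # quick has almost certainly finished by now
    quick.join()            # finished thread: join() returns at once
    slow.join()             # running thread: join() waits for it
    return quick.result, slow.result


if __name__ == "__main__":
    print(demo())   # (9, 16)
```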

A related issue has to do with exiting the program while threads are running. In a multithreaded program, how does the system decide when to exit the process that contains these threads? There are basically three approaches:

  • The process exits when all threads exit.
  • The process exits when the main thread exits.
  • The process exits when all important threads exit.

The first can be inconvenient, even if it is easy to understand. The second is easy to mess up. For example, you can launch all of your worker threads in the main thread and then exit because the main thread has nothing left to do. Suddenly, the program is gone and no work got done. This can be solved by having the main thread wait for some signal from other threads that it is time to shut down.
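A minimal sketch of that fix, assuming Python's threading module: the workers are daemon threads (so they alone would not keep the process alive), and the main thread blocks on a threading.Event until the last worker signals that the work is done. The squaring "work" and the function name are placeholders.

```python
import threading


def run_workers(count):
    """Start `count` workers, then keep the main thread waiting until the last
    worker signals that it is time to shut down."""
    if count == 0:
        return []
    all_done = threading.Event()
    lock = threading.Lock()
    remaining = count
    results = []

    def worker(n):
        nonlocal remaining
        value = n * n                  # stand-in for real work
        with lock:
            results.append(value)
            remaining -= 1
            if remaining == 0:
                all_done.set()         # signal the main thread: all work is done

    for n in range(count):
        # daemon=True: these threads alone will not keep the process alive.
        threading.Thread(target=worker, args=(n,), daemon=True).start()

    # If the program simply ended here instead of waiting, the main thread
    # would have "nothing left to do" and the workers' results would be lost.
    all_done.wait()
    return sorted(results)


if __name__ == "__main__":
    print(run_workers(4))   # [0, 1, 4, 9]
```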

The Java threading library takes a different approach. Normally, a program runs until all of its threads are dead. However, a thread can be marked as a daemon. In the Java terminology, the process exits when all non-daemon threads are dead. This allows one to choose either of the other two approaches, or something else entirely.
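Python's threading module borrowed this design and the name: its documentation says the entire Python program exits when only daemon threads are left. The sketch below runs a tiny child interpreter whose only extra thread loops forever; because that thread is a daemon, the child still exits as soon as its main thread finishes.

```python
import subprocess
import sys
import textwrap

# A tiny child program: its only other thread loops forever, but is a daemon.
CHILD = textwrap.dedent("""
    import threading, time

    def forever():
        while True:
            time.sleep(1)

    threading.Thread(target=forever, daemon=True).start()
    print("main thread done")
""")


def run_child():
    # If the thread were not a daemon, the child would never exit and this
    # call would raise subprocess.TimeoutExpired instead.
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True, timeout=10)
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    print(run_child())   # (0, 'main thread done')
```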

The only problem I have with this approach is the name. A daemon thread or process has certain connotations on a Unix-like system. I don't think the fact that a thread will not keep the program alive is the defining characteristic of a daemon.

Posted by GWade at 11:03 AM.

Worker thread patterns

Any system that uses the worker pattern may also want a dispatcher thread that wakes up workers and sends them on their way. In this approach, the dispatcher thread handles setting up the data needed to finish the task. The dispatcher may either choose a worker from a pool of suspended threads or create a new thread to perform the work. Many server programs use a similar approach. In that case, the dispatcher thread may wait on a socket and pass each request to a worker when it is received.
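A minimal sketch of the dispatcher variant with a pool of suspended workers, assuming Python's queue and threading modules. A plain list of strings stands in for the socket, each worker has its own inbox, and the round-robin choice of worker is an arbitrary assumption; the point is only that the dispatcher, not the OS, decides who does what.

```python
import queue
import threading


def dispatch(requests, num_workers=3):
    """Dispatcher sketch: one thread receives requests and decides which
    worker handles each one. `requests` stands in for a listening socket."""
    inboxes = [queue.Queue() for _ in range(num_workers)]  # one per worker
    handled = queue.Queue()                                # (worker id, result)

    def worker(worker_id, inbox):
        while True:
            job = inbox.get()          # suspended until the dispatcher sends work
            if job is None:            # sentinel: shut down
                return
            handled.put((worker_id, job.upper()))   # stand-in for real work

    def dispatcher():
        for i, req in enumerate(requests):
            inboxes[i % num_workers].put(req)   # dispatcher picks the worker
        for inbox in inboxes:
            inbox.put(None)

    pool = [threading.Thread(target=worker, args=(i, box))
            for i, box in enumerate(inboxes)]
    for t in pool:
        t.start()
    d = threading.Thread(target=dispatcher)
    d.start()
    d.join()
    for t in pool:
        t.join()
    return sorted(handled.get() for _ in range(handled.qsize()))


if __name__ == "__main__":
    print(dispatch(["a", "b", "c", "d"]))
    # [(0, 'A'), (0, 'D'), (1, 'B'), (2, 'C')]
```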

Another approach would be to make all of the threads identical (no dispatcher thread). The worker threads could wait on a synchronized work queue object or socket. As each request is received, a thread is awakened by the OS and given control of the request. Although it sounds a little strange not to have control over which thread does what work, this can actually be very efficient.

This latter approach is called the team model in Tanenbaum's Modern Operating Systems.
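For contrast, a sketch of the team model under the same assumptions: identical threads all block on one shared, synchronized queue (Python's queue.Queue), and whichever thread wakes first takes the next request. The main thread only feeds requests in, playing the part of the network; it never chooses a worker.

```python
import queue
import threading


def team(requests, team_size=3):
    """Team-model sketch: identical threads all block on one shared queue, and
    whichever thread wakes first takes the next request."""
    work = queue.Queue()        # shared, synchronized work queue
    results = queue.Queue()

    def member():
        while True:
            req = work.get()    # no dispatcher decides who gets this
            if req is None:     # sentinel: shut down
                return
            results.put(req.upper())   # stand-in for real work

    members = [threading.Thread(target=member) for _ in range(team_size)]
    for t in members:
        t.start()
    for req in requests:        # plays the role of requests arriving
        work.put(req)
    for _ in members:
        work.put(None)          # one sentinel per member
    for t in members:
        t.join()
    return sorted(results.get() for _ in range(results.qsize()))


if __name__ == "__main__":
    print(team(["x", "y", "z", "w"]))   # ['W', 'X', 'Y', 'Z']
```

Because every request is queued before the sentinels, all of the work is taken before any member sees its stop signal.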

Posted by GWade at 10:58 AM.

Threading Patterns

It might be useful to think of multithreading design patterns as a way to organize threading code.

Some possible patterns include:

  • Fire and Forget (background thread)
  • Periodic (Timer)
  • IO thread
  • Producer/Consumer
  • Worker thread (work queue)
  • Copier thread
  • Pipeline (must be careful)
  • Swarm?

A background thread could either be joined at a later time, or simply run its course and die.

We need a good term for the latter. Java calls them daemon threads. I don't think the term conjures the right connotations.

Posted by GWade at 10:55 AM.

Perl as a "dinosaur"?

Someone asked a question about updating SVG dynamically from a server on the SVG Developers list this morning. One of the (many) responses pointed to an article on server-side SVG.

As usual, I went to check it out to see if it might hold some tidbits I could use, or maybe be a resource I could recommend. While scanning the article, I found this little quote:

Perl is the dinosaur among web scripting languages, its market share (when it comes to server side web scripting) getting smaller...

They go on to point out that in Perl you have to manually set the header.

I find it amusing that people who want to show the Perl programming language in a bad light ignore modules like CGI.pm that help to give Perl most of the support you need for web scripting. CGI.pm has been out there for a long, long time. It's been part of the standard distribution since version 5.004.

To give the authors credit, they do mention the SVG.pm Perl module for generating SVG using Perl objects.

Posted by GWade at 09:38 AM.

January 14, 2004

The Smite Class

In attempting to do Test Driven Development, we noticed that one of the problems with testing object validation code was the necessity to have broken objects to test with. This is particularly important in cases where the internals of an object may come from somewhere uncontrolled. For instance, objects that may be read from disk could be restored from a damaged file, resulting in objects that should not occur in normal practice.

In many cases, you would just generate an isValid() type method that could be used to detect the invalid condition and let the user of the object deal with the situation. The question remains, how do you validate the object validation code?

Obviously, you do not want to expose your private data to access from the outside world. You may not even want to expose it to your descendants. You certainly do not want to expose methods publicly that could be used to generate an invalid object. That defeats one of the purposes of having a class.

A Smite class is a derived class that accesses a protected interface and has the ability to damage an object or generate inconsistent state in the object. This class would not be part of the public hierarchy, but it would be available for testing.

You might ask why we called it the Smite class.

One definition of smite is To inflict a heavy blow on... It may also mean to kill.

The particular image I have of the smite is from an old Far Side cartoon that was labelled God at his computer. In the picture, God in the form of a bearded, white-haired old man is looking at a computer screen. On the display, some poor slob is walking down the sidewalk and is about to pass underneath a piano hanging from a rope. "God's" finger is poised over a key labelled smite.

A Smite class is a derived class that can inflict heavy damage on the internal data of an object. When an instance is used as its base type, it should appear damaged or inconsistent. This allows for testing of validation and/or recovery code.
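A minimal sketch of the idea in Python, with a hypothetical Account class whose invariant is that the balance equals the sum of its deposits. Python has no enforced protected access, so the leading underscore on _force_balance is only a convention here; in C++ or Java that hook would be declared protected. The SmiteAccount subclass lives with the tests, and code that only knows the base type sees a broken object.

```python
class Account:
    """Hypothetical class. Invariant: balance equals the sum of all deposits."""

    def __init__(self):
        self._deposits = []
        self._balance = 0

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._deposits.append(amount)
        self._balance += amount

    def balance(self):
        return self._balance

    def is_valid(self):
        # Validation code: this is what we want to test.
        return self._balance == sum(self._deposits)

    def _force_balance(self, value):
        # "Protected" hook (a Python naming convention, not enforced) that
        # skips the checks in deposit(); meant for derived classes only.
        self._balance = value


class SmiteAccount(Account):
    """Test-only subclass, not part of the public hierarchy. It uses the
    protected hook to damage the object in ways the public API cannot."""

    def smite(self):
        self._force_balance(self.balance() + 1)   # break the invariant
        return self


def audit(account: Account) -> bool:
    # Code that only knows about the base type.
    return account.is_valid()
```

In a test, a SmiteAccount is created, smitten, and handed to audit() as a plain Account; the validation code should report it as invalid.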

Posted by GWade at 08:16 PM.

What is a programming idiom?

In a natural language, an idiom is a phrase that conveys more information than the individual words combined. In order to understand an idiom, you must have knowledge of the culture of the person using the idiom, as well as a good grasp of the language.

Programming idioms are similar. They involve uses of particular aspects of the language in ways that convey extra information to the reader of the code. Sometimes idioms are used to expand the abilities of a language. Other times, idioms are used to restrict what you can do with a construct. In all cases, however, the idiom relies on an understanding of the culture from which it is formed.

Posted by GWade at 08:08 PM.

Paradigms limit possible solutions

Different paradigms are basically different ways of thinking about or looking at problem solving. Programming paradigms usually also include ways of organizing code or design.

The main purpose of any paradigm is to reduce the number of possible solutions to a problem from infinity to a small enough number that you have a chance of picking one. Most proponents of a given paradigm argue strongly that their paradigm of choice removes more possible invalid solutions without seriously impacting valid solutions.

In actual fact, every paradigm eliminates both valid and invalid solutions to any given problem. This is not necessarily bad. However, it does mean that by choosing a particular paradigm, you are closing yourself off from potentially useful solutions or ways of approaching a problem.

Posted by GWade at 08:05 PM.

True Names

In some legends and in fantasy, there is the concept of a True Name. Once you know someone's true name, you have power over them. To guard against this, most people would go by a "use name" that everyone else used to refer to them, keeping their true name safe.

In programming, True Names exist, even though people don't think of them as such. Programs, systems, routines, and concepts are all referred to in many ways. But of the many ways to refer to something, only one is the item's true name. An item's true name is the one that exposes its essence in the simplest way possible.

One way to spot that you are not using a concept's true name is when you have to give an explanation along with the name every time you describe it.

One of the major contributions of the book "Design Patterns" was to provide true names for several design techniques. This caused a large number of programmers and designers to wake up to the importance of names for communication.

This hit me during a particularly sleep-deprived portion of my career.

Posted by GWade at 08:01 PM.

Basic troubleshooting rules

Here are a few basic rules of troubleshooting and debugging.

  1. Divide and conquer (always)
  2. 50/50 tests are best
  3. Verify "can't happen" cases with a test
  4. Steady progress is better than random guessing
  5. If you guess and fail, go back to #1. Don't try to keep guessing.

I've been trying to formulate a good set of troubleshooting and debugging rules for years, ever since I was training entry-level programmers and realized that I couldn't always explain how I found a problem or spotted a bug.
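Rules 1, 2, and 4 combine naturally into bisection, the idea that git bisect automates over a project's history. A minimal sketch, assuming the candidates (revisions, input sizes, configuration steps) are ordered so that everything before some point is good and everything after it is bad, and that the last candidate is known to be bad. Each probe is a 50/50 test, so a hundred candidates need at most seven probes.

```python
def first_bad(candidates, is_bad):
    """Return (index of the first bad candidate, number of probes).

    Assumes the candidates are ordered good...good, bad...bad (for example,
    revisions in history) and that the last one is known to be bad."""
    if not candidates:
        raise ValueError("need at least one candidate")
    lo, hi = 0, len(candidates) - 1   # invariant: candidates[hi] is bad
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2           # a 50/50 test: halves the search space
        probes += 1
        if is_bad(candidates[mid]):
            hi = mid                   # first bad is at mid or earlier
        else:
            lo = mid + 1               # first bad is after mid
    return lo, probes


if __name__ == "__main__":
    revisions = list(range(100))
    print(first_bad(revisions, lambda rev: rev >= 37))
```

Compare that with guessing: each guess that misses tells you almost nothing, while each probe here rules out half of what is left.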

Posted by GWade at 07:58 PM.

Welcome to my blog

I've been experimenting for a few months with blogging as a way to store URLs, interesting information, and ideas on my local machine in a way that I might be able to find them again.

I've finally decided that this approach would help me to do some things on my website that I haven't quite gotten around to doing. (Mostly out of false laziness.)

Stay tuned for programming book reviews, interesting (to me, at least) programming concepts, and who knows what else.

Posted by GWade at 07:48 PM.