<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Programmer Musings</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/" />
    <link rel="self" type="application/atom+xml" href="http://anomaly.org/wade/blog/atom.xml" />
   <id>tag:anomaly.org,2008:/wade/blog//1</id>
    <link rel="service.post" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1" title="Programmer Musings" />
    <updated>2008-07-02T13:17:23Z</updated>
    <subtitle>General thoughts on the craft/art/science of programming and related topics.</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.2</generator>
 
<entry>
    <title>Perl != Regular Expressions</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2008/07/perl_regular_expressions.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=184" title="Perl != Regular Expressions" />
    <id>tag:anomaly.org,2008:/wade/blog//1.184</id>
    
    <published>2008-07-02T13:15:47Z</published>
    <updated>2008-07-02T13:17:23Z</updated>
    
    <summary>Yesterday, I wrote an essay on a comment made in Jeff Atwood&apos;s Coding Horror blog about regular expressions. While Atwood spends quite a bit of time on the two problems joke and talks passionately about regexes as a tool, he...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="Languages" />
            <category term="Perl" />
            <category term="Programming Philosophy" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>Yesterday, I wrote an <a href="http://anomaly.org/wade/blog/2008/07/regular_expressions_as_a_probl.html">essay</a> on a comment made in Jeff Atwood's Coding Horror blog about regular expressions. While Atwood spends quite a bit of time on the <em>two problems</em> joke and talks passionately about <abbr title="regular expression">regex</abbr>es as a tool, he does take a swipe at Perl that seemed somewhat uncalled for.</p>

<p>I have seen Perl bashed by lots of people over the years.  Some have never written (or read) any Perl, but still feel qualified to bash the language. I've also been told that the only reason that I could possibly like Perl is if I had not written or maintained anything serious in Perl. When I point to large applications running in 24/7 data centers that I worked on for years, they normally change the subject. Pointing to the number of financial institutions, research groups, and large corporations that depend on Perl to function is also illuminating.</p>

<h2 class="subhead">Regular Expressions</h2>

<p>I often see regular expression bashing and Perl bashing tied together. There seems to be this weird meme running around that <abbr title="regular expression">regex</abbr>es are the only tool in the Perl toolbox. Both Atwood and Zawinski (see <a href="http://www.codinghorror.com/blog/archives/001016.html">Atwood's post</a> that started this line of thought) seem to take the viewpoint that Perl is nothing but regexes or that Perl somehow forces you to do everything through regexes.</p>

<p>Anyone who has worked with Perl for very long has seen that the language has very strong support for regex processing. This makes sense because the original goal of the language was text processing and regexes are made for text processing. For many, Perl was their first introduction to industrial strength regexes. Maybe it's not too surprising that they decided to overuse that powerful little tool.</p>

<p>Perl is also a general purpose language that supports <acronym title="Object Oriented Programming">OOP</acronym> as well as a procedural style. It also supports list processing and functional programming. There are modules on <a href="http://www.cpan.org/">CPAN</a> for controlling hardware, accessing databases, biology, and astronomy. There are also natural language processing modules, <acronym title="eXtensible Markup Language">XML</acronym> parsers, and web frameworks.</p>

<p>Perl is a powerful, flexible language that some of us use to get actual work done.</p>

<h2 class="subhead">Passionate About Perl</h2>

<p>As much as some people hate Perl and find it necessary to build up their language of choice by bashing Perl, some of us find the language to be a natural tool for many jobs. I have worked professionally in half a dozen general purpose languages over my career. But for solving a problem quickly, I normally turn to Perl. And I don't just mean for quick and dirty scripts. If I need to solve something in a short period of time and be sure it will work for years, I also often use Perl.</p>

<p>Bjarne Stroustrup once said:</p>

<blockquote>There are only two kinds of programming languages: those people always bitch about and those nobody uses.</blockquote>

<p>I would say that both C++ and Perl definitely fall into the first category.</p>

<p>People often pick on other languages. We've all heard the complaints (that are at best half true). Almost everyone picks on Basic for being a bit of a <em>kiddie language</em>. Cobol is the old-style business language <em>that rots the brain</em>. C++ is <em>too baroque</em>. Python is <em>obsessed with indentation</em>. Lisp has <em>too many parentheses</em>. Statically typed languages are <em>too obsessive</em> and dynamic languages are either <em>too slow</em> or <em>too loose</em>.</p>

<p>Although the fans of each language might insult other languages in a friendly sort of way, mention Perl and the vitriol begins to fly. I know quite a few people who really like Perl and I know at least as many who truly hate it. As Kathy Sierra pointed out a few years ago, <a href="http://headrush.typepad.com/creating_passionate_users/2004/12/if_some_people_.html">If some people don't HATE your product, it's mediocre.</a></p>

<p>Based on the level of hate that Perl seems to inspire, it's safe to say that the language is definitely not mediocre.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Regular Expressions as a Problem</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2008/07/regular_expressions_as_a_probl.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=183" title="Regular Expressions as a Problem" />
    <id>tag:anomaly.org,2008:/wade/blog//1.183</id>
    
    <published>2008-07-02T03:20:32Z</published>
    <updated>2008-07-02T03:28:34Z</updated>
    
    <summary>Jeff Atwood had an interesting post on regular expressions a few days ago. Most of the article talks passionately about the usefulness of regular expressions, strategies for writing readable regular expressions, and tools to help write and debug regular expressions....</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="CodeCraft" />
            <category term="Programming Philosophy" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p><a href="http://www.codinghorror.com/">Jeff Atwood</a> had an interesting <a href="http://www.codinghorror.com/blog/archives/001016.html">post</a> on regular expressions a few days ago. Most of the article talks passionately about the usefulness of regular expressions, strategies for writing readable regular expressions, and tools to help write and debug regular expressions.</p>

<p>Along the way, Jeff points out a comment that usually pops up when the topic of regular expressions appears:</p>

<blockquote>Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. </blockquote>

<p>The post points to some interesting further research on the quote and then gets down to the meat of his regular expression advocacy. But not before taking a swipe at Perl with:</p>

<blockquote>Should you try to solve every problem you encounter with a regular expression? Well, no. Then you'd be writing Perl, and I'm not sure you need those kind of headaches.</blockquote>

<p>I know I shouldn't be, but I'm continually surprised by how many people really seem to dislike Perl.</p>

<h2 class="subhead">The Two Problems</h2>

<p>First off, I have to say that I do find the <em>two problems</em> quote to be amusing and somewhat clever (at least, the first time I heard it). According to Jamie Zawinski, who originated this version of the quote, he got it from an earlier version that used <em>sed</em> instead of regular expressions. Honestly, you could probably insert the name of any technology that you wanted to bash in that spot and get an equivalent quip.</p>

<p>Part of why the quote is funny is a real kernel of truth. Any time someone learns a new technology, they attempt to apply it to every problem that comes along. Doing so almost always results in an additional problem. (Using the wrong tool for the job.)</p>

<p>I would, however, like to counter it with a question from a good friend and mentor, Rick Hoselton:</p>

<blockquote>What do you call a programmer with only one problem?</blockquote>

<blockquote>Unemployed.</blockquote>
]]>
        
    </content>
</entry>
<entry>
    <title>False Lazy Initialization</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2008/05/false_lazy_initialization.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=181" title="False Lazy Initialization" />
    <id>tag:anomaly.org,2008:/wade/blog//1.181</id>
    
    <published>2008-06-01T03:35:57Z</published>
    <updated>2008-06-01T03:37:41Z</updated>
    
    <summary>There is a technique I have seen used many times in my career called Lazy Initialization. The purpose of this technique is to delay an expensive initialization (or object construction, or calculation) until you actually need it. Just like any...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="CodeCraft" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>There is a technique I have seen used many times in my career called <a href="http://en.wikipedia.org/wiki/Lazy_initialization">Lazy Initialization</a>. The purpose of this technique is to delay an expensive initialization (or object construction, or calculation) until you actually need it. Just like any technique, Lazy Initialization has both advantages and disadvantages.</p>

<p>The advantages are pretty obvious:</p>

<ul>
   <li>If you never use the value, you don't pay for it.</li>
   <li>Initialization costs can be more spread out by not initializing everything at once.</li>
   <li>In some cases, startup appears to be faster.</li>
   <li>Ability to delay until needed information is available.</li>
</ul>

<p>The disadvantages are not always quite so obvious. In fact, they are often ignored.</p>

<ul>
   <li>Code needed to check to see if it is time to initialize the value each time it is used.</li>
   <li>Time spent in the above code each time the value is used.</li>
   <li>Overhead for maintaining the inputs for the initialization that may not be needed after the initialization.</li>
   <li>More complicated error recovery for dealing with failed initialization far from the point where the inputs were supplied.</li>
</ul>

<p>Like any technique, you need to weigh the advantages and disadvantages in context to decide if the technique is worth it. If the cost of initialization is low or if you are going to need the value almost immediately, Lazy Initialization is not a good idea. If the initialization process is expensive or time-consuming and you are unlikely to need the value at all, the technique is quite useful.</p>

<p>Done correctly, you should have a single function that does the check and initialize. Then, everywhere you need the value you call this function.</p>
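
<p>As a rough sketch of that single check-and-initialize function (in Python, with hypothetical names such as <code>get_config</code> and <code>load_config</code> that are not from any real library), it might look something like this:</p>

```python
config_cache = {}   # empty until the value is first requested
load_calls = []     # records each time the expensive step actually runs


def load_config():
    """Stand-in for an expensive initialization (hypothetical)."""
    load_calls.append("load")
    return {"retries": 3, "timeout_seconds": 30}


def get_config():
    """The single function that checks for, and performs, the initialization."""
    if "value" not in config_cache:
        config_cache["value"] = load_config()
    return config_cache["value"]


def retries():
    # Callers never read config_cache directly; they always call get_config().
    return get_config()["retries"]


def timeout_seconds():
    return get_config()["timeout_seconds"]
```

<p>Either of the two callers can run first, and the expensive step still happens only once.</p>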

<h2 class="subhead">False Laziness</h2>

<p>Unfortunately, just like the programming virtue of <em>Laziness</em> has a dark twin, <a href="http://c2.com/cgi/wiki?FalseLaziness">False Laziness</a>, Lazy Initialization also has a dark copy. I suggest we should call this <em>False Lazy Initialization</em>. Unlike proper Lazy Initialization, <em>false lazy initialization</em> does not check for the need to initialize the object everywhere the object is needed. Like <em>false laziness</em>, this method appears to save time and effort by only checking and initializing in one place. After all, "the object is initialized after this call, so we don't really need to check elsewhere". The argument sounds seductively correct, but is, in fact, very wrong.</p>

<p>In reality, the order of calling these methods will end up different than predicted at some point in the use of the code. Once this happens, some piece of code will depend on the object being initialized when it hasn't been. A bug is discovered. Someone fixes it by calling the function doing the initialization at this spot. Later, the same bug pops up in another place. The same fix is applied. Somewhere else, code trips over the uninitialized object again; this time we fix the code by testing whether the object is initialized and failing if it is not.</p>
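
<p>A small Python sketch may make the failure easier to see. The <code>Report</code> class and its methods are made up for illustration, and the data is a stand-in for an expensive load. Only one method checks and initializes; the other quietly assumes it has already run:</p>

```python
class Report:
    """Hypothetical class showing the anti-pattern, not a recommendation."""

    def __init__(self, path):
        self.path = path
        self.lines = None  # meant to be filled in lazily

    def first_line(self):
        # The only method that checks and initializes.
        if self.lines is None:
            # Stand-in for an expensive load of self.path.
            self.lines = ["alpha", "beta", "gamma"]
        return self.lines[0]

    def line_count(self):
        # Assumes first_line() has already run; nothing enforces that.
        return len(self.lines)
```

<p>Calling <code>line_count()</code> on a fresh <code>Report</code> raises a <code>TypeError</code>, while calling it after <code>first_line()</code> works. The code is correct only for one particular calling order.</p>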

<p>Over time, these two approaches to <em>fixing</em> the problem proliferate until the original benefit is lost in the noise. To me, false laziness is the epitome of that old saying:</p>

<blockquote>There's never enough time to do it right, but there's always time to do it over.</blockquote>

<h2 class="subhead">True Lazy Initialization</h2>

<p>Just like <em>true laziness</em> involves extra work up front to save even more work later, proper lazy initialization requires a bit of extra work. Think of lazy initialization as an optimization. When optimizing code, we should not change the real functionality or expectations of how the code should function. We only want to improve performance. If we were not doing lazy initialization, the object in question would have been initialized at the beginning. Everywhere we access this object it would already have been initialized. The order that methods involving the object are called should not matter.</p>

<p>In order to keep these conditions the same, every method that accesses the object must go through a single method that checks the object and initializes it if it hasn't been already. This causes a small cost for each access to the object, but saves us a huge amount of maintenance time later.</p>
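
<p>One possible shape for this, sketched in Python with an invented <code>Report</code> class (the names and the stand-in data are assumptions for illustration only), routes every access through one private accessor:</p>

```python
class Report:
    """Hypothetical class; the names are illustrative only."""

    def __init__(self, path):
        self.path = path
        self._lines = None   # lazily initialized member
        self.load_count = 0  # only here to show the load happens once

    def _get_lines(self):
        """The single method allowed to touch self._lines."""
        if self._lines is None:
            self.load_count += 1
            # Stand-in for an expensive load of self.path.
            self._lines = ["alpha", "beta", "gamma"]
        return self._lines

    def first_line(self):
        return self._get_lines()[0]

    def line_count(self):
        return len(self._get_lines())
```

<p>Here <code>first_line()</code> and <code>line_count()</code> can be called in either order, and the load still runs at most once per object.</p>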

<p>In an OO system, we might have an object with an internal member that is lazily initialized. The false version would have every (or almost every) method individually check to see if the member is initialized properly and construct the member when needed. An even worse version would only initialize the member in one method and require that all users of the code remember to use it in just the right order or risk failure.</p>

<p>Both of these false versions tend to be bad for maintenance as well. Any time the code needs a change, you have to go back and review this decision to make sure the initialization has been done correctly. In the end, false lazy initialization ends up costing more than non-lazy initialization would have.</p>

<p>Proper lazy initialization provides one method that is always used to access the lazy member. The code never bypasses this method to access the value directly. Although we always pay a small penalty for testing the value, the maintenance costs are lower and the robustness is higher.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Review of Beautiful Code</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2008/04/review_of_beautiful_code.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=180" title="Review of &lt;cite&gt;Beautiful Code&lt;/cite&gt;" />
    <id>tag:anomaly.org,2008:/wade/blog//1.180</id>
    
    <published>2008-04-30T03:32:04Z</published>
    <updated>2008-04-30T03:33:17Z</updated>
    
    <summary>Beautiful Code Edited by Andy Oram &amp; Greg Wilson O&apos;Reilly, 2007 Although I was really looking forward to reading Beautiful Code when I first heard it was coming out, I found myself a bit disappointed by the reality of the...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="Books" />
            <category term="CodeCraft" />
            <category term="Programming Philosophy" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p><cite>Beautiful Code</cite><br />
Edited by Andy Oram & Greg Wilson<br />
O'Reilly, 2007</p>

<p>Although I was really looking forward to reading <cite>Beautiful Code</cite> when I first heard it was coming out, I found myself a bit disappointed by the reality of the book.</p>

<p>Each chapter is written by a different <em>master coder</em>, giving their views on what makes code beautiful. Some of the essays did a very good job of explaining things that I agree are critical to beautiful code: readability, succinctness, clarity, etc. Other essays discussed qualities that I would not have called beautiful. Some of these included qualities from some of the ugliest code I have ever worked on. The fact that I agreed strongly with some chapters and disagreed just as strongly with others made the book hard to read at first.</p>

<p>Eventually, I realized that this was just another case of <em>beauty is in the eye of the beholder</em>. The qualities that each author praised simplified their lives in the context of the problems that they were solving. In that context, the code was beautiful. In another context, those same qualities would not be as good.</p>

<p>Although I disagreed with some of the authors, I think the overall message was a good one. Some authors praised relatively universal qualities. Others praised particular architectures as making a huge difference in the code. But, all of the authors provided some insight into what different professional programmers see when they look at code. The differences in viewpoints are probably as important as the individual essays.</p>

<p>Overall, I would recommend this book for strong intermediate to senior software developers. I think a junior programmer might not have enough experience to be able to recognize the difference between generally good qualities and qualities that are good in a particular context. Even for the intermediate to strong programmers, I would warn that some of the chapters are definitely harder going than others.</p>

<p>I really don't think this book is for everyone. If you are more interested in learning libraries in your language of choice for solving today's problems, this book won't be of any use. If you have worked on enough projects, in enough languages, to really understand that the <em>one, true way</em> to develop software is a myth, then the various viewpoints in this book will help you think about what makes code beautiful to you.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Abstraction as Compression</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2008/04/abstraction_as_compression.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=179" title="Abstraction as Compression" />
    <id>tag:anomaly.org,2008:/wade/blog//1.179</id>
    
    <published>2008-04-26T18:39:18Z</published>
    <updated>2008-04-26T18:48:39Z</updated>
    
    <summary>Long ago, I was trying to convince a friend of mine that Object Oriented programming was not all just snake oil when he asked me a fundamental question. What&apos;s the difference between an object and a thingie? In some ways,...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="CodeCraft" />
            <category term="Objects" />
            <category term="Programming Philosophy" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>Long ago, I was trying to convince a friend of mine that Object Oriented programming was not all just snake oil when he asked me a fundamental question.</p>

<blockquote>What's the difference between an <em>object</em> and a <em>thingie</em>?</blockquote>

<p>In some ways, this question has guided my understanding of objects ever since. Fundamentally, what makes one collection of member (instance) data and member functions (methods) an object and another nothing more than a collection of data and code? What is the fundamental nature of an <em>object</em>?</p>

<p>In one sense, the answer can be summed up with my favorite quote from <cite>Ruminations on C++</cite>:</p>

<blockquote>use classes to represent concepts</blockquote>

<p>In a broader sense, objects are all about <em>abstraction</em>. Most of programming, and OO programming in particular, is an exercise in abstraction. We want to separate what you need to know to perform some action from the details you don't need to know. Abstraction is the name we give for selectively hiding or ignoring the details we don't care about so we can focus on what really matters. Abstraction is what allows us to work with files instead of magnetic domains arranged in tracks on spinning platters on a hard drive.</p>

<p>Any time you give a simple name to a complex collection of behaviors, you have created an abstraction. But, not all abstractions are created equal. A collection of random pieces of data and methods in a <code>FooLib</code> class is not a particularly good abstraction. Yes, it collects together information under a single name. Unfortunately, the simplest translation of that name is the source code. In order to understand any piece of the functionality, you need to go look at how it's implemented.</p>

<h2 class="subhead">Good Abstractions</h2>

<p>A simple, good abstraction is a <em>stack</em> class. There is an independent concept in software of a stack. You don't need to understand the actual implementation details and internal data. All you need is to know about the <code>push</code> and <code>pop</code> methods. A few other methods might be added for looking at the top of the stack without removing an item and for determining the number of items in the stack. However, calling the class <code>Stack</code> brings along a bunch of expected behavior without need of explanation.</p>
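
<p>As a minimal sketch of such a class (in Python; the <code>peek</code> name and the list used for storage are just one reasonable choice, not the only one):</p>

```python
class Stack:
    """A minimal stack; the backing list is an implementation detail."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        """Return the top item without removing it."""
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def __len__(self):
        return len(self._items)
```

<p>A caller can use <code>push</code>, <code>pop</code>, <code>peek</code>, and <code>len()</code> without ever learning that a list sits underneath.</p>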

<p>One of the greatest benefits of the whole <em>design patterns movement</em> was good names and definitions that can be used as high-level abstractions. You don't need to know about the implementation to know that an <em>Iterator</em> allows traversal of a container, or that a <em>Factory</em> creates other objects. In fact, by giving a complicated concept a simple name, we have performed a kind of compression.</p>

<h2 class="subhead">Information Compression</h2>

<p>When I call an object an <em>adapter</em>, you immediately know that its purpose is to convert the interface of a class into a different interface. You also know something about expected costs of this delegation and that the adapter itself doesn't need to provide any major functionality of its own. You also know that it is likely that the adapted class either cannot be changed, or that changing it would affect too many other systems. It is also likely that we are using this older class in a new interface.</p>

<p>But, I don't need to say all of that, I just say the class is an adapter. That is a fair amount of compression, reducing a whole paragraph into one word.</p>
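
<p>To make that paragraph concrete, here is a small Python sketch. Both classes are invented for illustration: an older class that we assume cannot be changed, and an adapter that does nothing but translate its interface.</p>

```python
class LegacyTemperatureSensor:
    """Hypothetical older class that we cannot, or would rather not, change."""

    def read_fahrenheit(self):
        return 212.0


class CelsiusSensorAdapter:
    """Presents the interface new code expects and delegates the real work."""

    def __init__(self, legacy_sensor):
        self._legacy = legacy_sensor

    def read_celsius(self):
        # The only logic here is the interface translation.
        return (self._legacy.read_fahrenheit() - 32.0) * 5.0 / 9.0
```

<p>New code calls <code>read_celsius()</code> and never sees the older interface; the adapter adds one extra call and no functionality of its own.</p>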

<p>A good abstraction provides compression of a lot of information into a single concept. Part of the compression involves the amount of work or added information needed to decompress the information. As a friend of mine once pointed out: ISBN is a really strong compression algorithm. Any book can be compressed into a 10-character string; but decompression is a bummer.</p>

<p>Decompressing a good abstraction to gain understanding requires some amount of additional information. If this information is general (like design patterns), you can reuse the explanation many times, reducing the <em>cost</em> of the decompression for each use of that pattern. If the only explanation for what the class does is the source of the class itself, there is not much abstraction. This is more like the ISBN example. To understand what ISBN: 0-596-51004-7 expands to, you need to get and read the book (<em>Beautiful Code</em>).</p>

<h2 class="subhead">Good Abstractions</h2>

<p>One way to recognize a good abstraction is to examine the level of compression (including the amount of information needed to decompress). If the only way to understand the abstraction is to read the source (and re-read the source, ...), odds are the abstraction is not very good.</p>

<p>If understanding a particular class requires a bunch of extra information that happens to be part of the business domain, we may still have a good abstraction. In that case, the extra information may be able to be amortized across several other classes.</p>

<p>Abstraction as information compression may be a useful concept for determining if any of your classes are actually <em>thingies</em>.</p>]]>
        
    </content>
</entry>
<entry>
    <title>The Modular Monolith</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2008/03/the_modular_monolith_1.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=178" title="The Modular Monolith" />
    <id>tag:anomaly.org,2008:/wade/blog//1.178</id>
    
    <published>2008-03-16T16:58:16Z</published>
    <updated>2008-03-16T17:42:03Z</updated>
    
    <summary>Continuing the line of thought from last time (Sharp Tools vs. Frameworks), another issue I see in quite a few frameworks and some systems is a code anti-pattern I&apos;ll call The Modular Monolith. We all know that modularity is a...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="Objects" />
            <category term="Programming Philosophy" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>Continuing the line of thought from last time (<a href="http://anomaly.org/wade/blog/2008/03/sharp_tools_vs_frameworks.html">Sharp Tools vs. Frameworks</a>), another issue I see in quite a few frameworks and some systems is a code anti-pattern I'll call <em>The Modular Monolith</em>.</p>

<p>We all know that modularity is a good thing to have in a system. Modular code, in general, reduces coupling between components, allows easier reuse, and simplifies understanding. When done correctly, each module can be analyzed, tested, and understood independently of most of the rest of the system. In object oriented programming, the smallest module we work with is the class. Usually a group of classes work together as a subsystem (or package). In other paradigms, the smallest module might be a library.</p>

<p>In any case, we are all pretty much familiar with the benefits and concepts of modularity. At the present time, it would probably be hard to find anyone that does not accept that modularity is a good design principle.</p>

<h2 class="subhead">The Monolith</h2>

<p>In some systems or frameworks, you may run into a problem using a single class. When you try to include the class, you find dependencies on other classes. In some cases, this is perfectly reasonable. If the class you are including requires some low-level utility classes to do its work, that's understandable. But sometimes, you find that the class depends on other classes at the same level of complexity. If some of those classes depend on other classes that depend on other classes, you can eventually reach the point of needing the entire framework (or system) to use any part of it.</p>

<p>In this case, we no longer have modular code, we have a <em>monolith</em>. The oxymoron <em>modular monolith</em> refers to the fact that the code is modular in the sense that there are modules. The problem is that the modules are so tightly coupled together that they might as well be a single monolithic stone. No piece can be used without bringing the whole structure along for the ride.</p>

<h2 class="subhead">Causes</h2>

<p>One mechanism that can cause this problem is the over-use of Singletons. The Singleton, by its nature, can cause hidden coupling between the classes that use the Singleton and the classes the Singleton uses. This is one reason why the <em>test-infected</em> are usually against the use of the Singleton pattern. A system with multiple Singletons can result in connections that are almost impossible to unravel.</p>
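
<p>The hidden coupling can be sketched in a few lines of Python. All three classes below are hypothetical, and the tax rate is made-up data; the point is only that nothing in <code>PriceCalculator</code>'s interface reveals what it depends on.</p>

```python
class Database:
    """Hypothetical heavyweight subsystem."""

    def lookup(self, key):
        return {"tax_rate": 0.08}[key]


class Settings:
    """A Singleton: one shared instance reached through a class method."""

    _instance = None

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        # The Singleton quietly pulls in another subsystem.
        self._db = Database()

    def get(self, key):
        return self._db.lookup(key)


class PriceCalculator:
    # Nothing in this signature mentions Settings or Database...
    def total(self, amount):
        # ...yet using this class drags both of them along.
        return amount * (1 + Settings.instance().get("tax_rate"))
```

<p>Anyone who wants to reuse or test <code>PriceCalculator</code> on its own has to bring <code>Settings</code> and <code>Database</code> with it.</p>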

<p>Another cause of this increased coupling is low-level classes that depend on high-level classes. This violation of layers is almost guaranteed to generate blobs of classes that must always be used as a unit. Many systems cause this problem through trying to connect error-reporting to low-level code. As the error reporting becomes more advanced, it brings in subsystems unrelated to the purpose of the utility class.</p>
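
<p>A hypothetical Python sketch of this layering violation follows. The names (<code>MailGateway</code>, <code>ErrorReporter</code>, <code>parse_port</code>) and the address are invented; the shape is what matters: a small parsing helper ends up tied to a mail subsystem through its error reporting.</p>

```python
class MailGateway:
    """Hypothetical high-level subsystem."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


class ErrorReporter:
    """Hypothetical 'advanced' error reporting that now needs mail."""

    def __init__(self):
        self.mail = MailGateway()

    def report(self, message):
        self.mail.send("ops@example.com", message)


REPORTER = ErrorReporter()


def parse_port(text):
    """Low-level utility: turn text into a port number."""
    try:
        return int(text)
    except ValueError:
        # This one line ties a string-parsing helper to the mail subsystem.
        REPORTER.report("bad port: " + text)
        return None
```

<p>Reusing <code>parse_port</code> elsewhere now means bringing along the reporter and everything the reporter needs.</p>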

<h2 class="subhead">The One True System</h2>

<p>Frameworks aren't the only place you can see this anti-pattern. I've also run into this problem with systems written by people who have only worked in one system. The idea that code could be used outside the system never occurs to them. You can often recognize this situation with people who load up the multi-megabyte widget processing system as part of a piece of code to count the lines in a text file. Since they have always worked on this system, they treat it as the whole programming universe. All code must exist in the system.</p>

<p>The bad news is that these systems seem to be built on the modular monolith and the modular monolith further reinforces the attitude that everything must be done as part of the one system. This positive feedback makes stopping the behavior in either case almost impossible.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Sharp Tools vs. Frameworks</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2008/03/sharp_tools_vs_frameworks.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=177" title="Sharp Tools vs. Frameworks" />
    <id>tag:anomaly.org,2008:/wade/blog//1.177</id>
    
    <published>2008-03-09T17:27:18Z</published>
    <updated>2008-03-09T18:30:44Z</updated>
    
    <summary>Maybe it&apos;s a reflection of when I started programming, but I&apos;ve always had problems with frameworks. I prefer having a sharp set of tools to having a do everything framework. For me, the framework breaks down if I need to...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="CodeCraft" />
            <category term="Programming Philosophy" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>Maybe it's a reflection of when I started programming, but I've always had problems with frameworks. I prefer having a sharp set of tools to having a do-everything framework. For me, the framework breaks down if I need to do something that does not match <em>exactly</em> what the framework designers wanted to do.</p>

<p>A good friend of mine, Rick Hoselton, once described this to me as the <em>baloney slicer problem</em>. He pointed out that some tools were absolutely spectacular at one job (say, slicing baloney), but he preferred a really sharp knife. With a sharp knife, you can do a lot of things once you know how to use it safely. You can even slice baloney. The knife is more dangerous, but it's more useful in general.</p>

<p>While I agree that a sharp knife is more generally useful than the baloney slicer, there are times (like when I need to slice 100 pounds of baloney) when the baloney slicer is the right tool for the job. My biggest problem with frameworks and some libraries is the combination of the framework and the <a href="http://en.wikipedia.org/wiki/Golden_hammer">Golden Hammer</a> syndrome. This tends to cause the designers to adapt the framework outside the area it was designed for. Usually, it starts with a little change and moves on to some really monstrous results.</p>

<p>To stretch Rick's analogy a little further, I need to slice some ham. I am pointed to a feature of the framework that is kind of a combination cheese grater/wood chipper/jackhammer. (This is obviously the right piece of the framework because it cuts something, duh!) Now what comes out of this device is not really a slice, but that's okay because we have some mostly edible processed food stuff that can be used to glue the pieces back into a slice-like mass. <em>Obviously</em>, this is much better than the knife, because it is part of the framework and it can reduce 100 hams into something resembling slices in a few minutes.</p>

<p>Obviously, I'm exaggerating to get the point across. However, I am surprised how often I run into this problem when using frameworks or extensive libraries. Sometimes, we're told that the <em>framework is how we do things around here</em>. Or, maybe we just don't want to <em>re-invent the wheel</em>. I suspect part of the issue is that the people who built the framework are just trying to show that their work is useful.</p>

<p>Unfortunately, they have lost track of the simple fact that a tool optimized for solving one problem will not be as effective for solving others. This is not a condemnation of the tool; it's just a fact of life. As you optimize a tool, you automatically limit its scope. After all, optimal tools have a narrow focus; that's what makes them efficient. When you try to use an optimized tool for something it wasn't designed for, you lose all its advantages.</p>

<p>While frameworks have their place, they are not the solution to every problem. Don't forget that small, sharp tools are very important as well.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Flexibility and Rigidity in Software</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2008/03/flexibility_and_rigidity_in_so_1.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=176" title="Flexibility and Rigidity in Software" />
    <id>tag:anomaly.org,2008:/wade/blog//1.176</id>
    
    <published>2008-03-07T01:14:29Z</published>
    <updated>2008-03-07T04:30:16Z</updated>
    
    <summary>One of the dimensions in which you can vary a design is in terms of the amount of flexibility it supports. Trade-offs along this dimension show up in many design decisions for various kinds of software. In some cases, software...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="Programming Philosophy" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>One of the dimensions in which you can vary a design is in terms of the amount of flexibility it supports. Trade-offs along this dimension show up in many design decisions for various kinds of software. In some cases, software is made less flexible to improve predictability. In other cases, the adaptability of the software is paramount, so the design tends more toward flexibility. I believe that most programmers are comfortable with making these kinds of design trade-offs.</p>

<h2 class="subhead">Programming Languages</h2>

<p>There is one area of programming where this spectrum of flexibility vs. rigidity becomes more of a religious issue. When it comes to programming languages, programmers seem to divide them into extremes of order and chaos. Fans of statically typed languages look at dynamic languages as chaotic, undisciplined messes. Dynamic programming language advocates look at statically typed languages as inflexible and overly verbose. (This actually predates the dynamic/compiled debate. I remember when we C programmers called Pascal a <em>bondage and discipline language</em> because of its rigidity.)</p>

<p>It is not quite this cut-and-dried. Java is considered more rigid or orderly than C++, even though both are considered statically typed languages. Among dynamic languages, Ruby is considered more flexible than Python. Different people will view different features as being <em>too flexible</em> or <em>overly rigid</em> where others will find those same features to be perfectly reasonable.</p>

<h2 class="subhead">Flexibility Means Adaptability</h2>

<p>Even if you agree with me so far, I expect to tick off a lot of people with my next argument. I believe that very flexible languages are better than more structured languages when it comes to adapting to new programming paradigms and techniques.</p>

<p>Recently, Bruce Eckel wondered about <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=221903">Java: Evolutionary Dead End</a>? Paraphrased, his argument is basically that new features being added to Java appear awkward and ugly because the changes have not sat well with the design of Java.</p>

<p>Ten years ago, many of the proponents of Java pushed the clean, orthogonal design of the language. This mostly clean design has not handled change very well, possibly because there is not enough flexibility to handle modifications. Any changes to the language look <em>bolted on</em>. These features often don't look or work quite right because the original design had no room to accommodate this kind of change.</p>

<p>The C language, on the other hand, was very simple with sometimes multiple ways to do the same thing. The syntax was a bit more flexible with <em>escape hatches</em> in the system to allow programmers to subvert the system to get low-level work done. This slightly messy design was relatively easily modified to support first object oriented programming and later generic programming. Although many people complain about C++, adding functional programming features with a little template magic was relatively easy compared to what the Java community is currently going through.</p>

<h2 class="subhead">Even More Flexibility</h2>

<p>Some other languages have handled massive changes with even more adaptability. (If I haven't offended you yet, get ready.&lt;grin/&gt;)</p>

<p>Perl added object oriented features with some relatively small syntax changes when moving from Perl 4 to Perl 5. Although many people would suggest that Perl syntax is the ultimate in chaos, this flexibility allowed powerful OO functionality without complete replacement of the language. Moreover, Perl's adaptability has allowed Perl programmers to explore different approaches to class design and try out strange features like the ability to modify a class that was created by someone else. (Yes, Perl already had that feature. Ruby did not invent it.) We have had people experimenting with different OO syntaxes and features over the last ten years and some of the results are going into Perl 6 as the only standard way to build classes.</p>

<p>At the same time, Larry added a few more features that made closures quite simple in Perl. Real lexical variables and code references were the main changes. Almost ten years later, Java is having trouble with this concept because the <em>everything is an object</em> approach and the <em>closure/anonymous function</em> approach have a relatively large impedance mismatch. This is not to say that Java is wrong, just that the design doesn't accommodate this kind of change very well.</p>
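
<p>For readers who have not met closures, here is a minimal sketch of the idea. It is written in Python rather than Perl, purely for illustration, and the names <code>make_counter</code> and <code>increment</code> are made up for this example. The inner function keeps using a lexical variable of the outer function after the outer function has returned.</p>

```python
def make_counter(start):
    """Return a function that remembers `count` between calls."""
    count = start  # lexical variable captured by the inner function

    def increment():
        nonlocal count  # refer to the enclosing function's variable
        count += 1
        return count

    return increment  # a reference to code, bundled with its captured state


next_id = make_counter(100)
other = make_counter(0)
print(next_id(), next_id(), other())  # 101 102 1
```

<p>Each call to <code>make_counter</code> produces an independent counter, with no class declaration involved.</p>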

<p>Ruby and Python also used flexibility in their language design to explore new features. However, neither of these languages has quite the unbridled flexibility of Perl. Although many people consider that a bad thing, they miss a very important point. This was an explicit design <em>feature</em> of Perl.</p>

<h2 class="subhead">Turn it to 11</h2>

<p>But, there are languages that are more flexible even than Perl. I spent a fair portion of my career working in the Forth language. Two of the most interesting things about Forth are its lack of syntax and its ability to run arbitrary code at compile time. This allows a good Forth programmer to tailor the syntax of the language to the problem at hand. It also allows things that would be practically impossible in another language.</p>

<p>For example, Rick Hoselton and I added a relatively powerful and efficient exception handling mechanism into the Forth we were using at the time in a couple of days. At another point in time, some of us implemented a prototype-based object model in less than a week. We also added RAII-type resource management at another time. Because the language was so malleable, ordinary programmers could make these design changes without completely redesigning the language.</p>

<p>The downside of this flexibility is also obvious. A bad (or overly clever) programmer can generate code that no one will ever be able to figure out. However, good programmers can extend the language in useful ways, exploring language features and changes in paradigms without needing large standards organizations or completely new languages.</p>

<h2 class="subhead">Back to Mainstream</h2>

<p>If I haven't lost you already, I should reassure you that I am not advocating that every language should be as free and adaptable as Forth or Perl.</p>

<p>But, I am suggesting that we look at our languages a bit differently. As we've seen with Java, tool support and standardized libraries can be much easier with more rigid languages. These languages also support the ability to do very effective work with groups having a range of levels of experience. (The leader of one project I worked on decided we would work in Java, not because of any technical superiority of the Java language, but because you <em>can hire Java programmers straight out of school</em>.) Many types of applications are very easy to build in Java because of the strong frameworks supporting those kinds of applications.</p>

<p>Many Python proponents argue that their approach of <em>only one way to do it</em> makes it easier for new people to learn the language. There are fewer nooks and crannies to get lost in. But Python is more flexible than Java. A few years ago, Guido made some backward-incompatible changes to enhance the object model based on lessons learned as people pushed the edges of that part of the design. Although it took a major change to the language to really improve the object model, it was people pushing the edges of what could be done that helped design the change.</p>

<p>Perl advocates point to the flexibility of the language and lack of rigid rules as some of the features that help us <em>get work done</em>. The <acronym title="There's more than one way to do it">TMTOWTDI</acronym> slogan is more than a platitude. It is a recognition that the adaptability of the language gives us the ability to do both amazing and horrible things in our code.</p>

<p>What people should try to realize is that these different levels of flexibility are not good or bad; they are different. Usually, languages steal from one another when they are first created, but the more flexible languages keep stealing ideas and adapting to new situations. Maybe the more orderly languages could benefit from exploring new techniques in a more flexible language before committing to another <em>one true solution</em> to the problem.</p>]]>
        
    </content>
</entry>
<entry>
    <title>A Different Look at Years of Experience</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2008/02/a_different_look_at_years_of_e.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=175" title="A Different Look at Years of Experience" />
    <id>tag:anomaly.org,2008:/wade/blog//1.175</id>
    
    <published>2008-02-09T17:16:56Z</published>
    <updated>2008-02-09T18:05:45Z</updated>
    
    <summary>I&apos;ve been spending a fair amount of my time the past few years helping hire programmers. I have also spent time training both entry-level and senior programmers. So, Jeff Atwood&apos;s recent blog entry The Years of Experience Myth struck me...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="Programming Philosophy" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>I've been spending a fair amount of my time the past few years helping hire programmers. I have also spent time training both entry-level and senior programmers. So, Jeff Atwood's recent blog entry <a href="http://www.codinghorror.com/blog/archives/001054.html">The Years of Experience Myth</a> struck me as particularly interesting.</p>

<p>While I agree that the number of years of experience a developer has is mostly uncorrelated with their programming ability, I'm not sure I would go as far as Jeff did in suggesting that anything more than a year of experience is not worth considering. I think years of experience do count for something; we just can't use that measure in hiring decisions. As a very wise mentor of mine used to be fond of saying:</p>

<blockquote>There's a big difference between five years of experience and one year of experience five times.</blockquote>

<p>There is no way to tell from a resume whether or not a person is in the first category or the second. Some people manage to get a year's experience under their belt and stop learning. They can continue to work on the same kinds of projects, applying the same kinds of techniques, without ever learning anything new. On the other hand, some people never stop learning. These are the ones that other programmers go to when they have a sticky problem that is holding up a project, or to help on that impossible bug.</p>

<h2 class="subhead">Understanding vs. Mastery</h2>

<p>One point where I specifically diverge from Jeff's post is in the suggestion that once a programmer has 6 months to a year of experience in a technology, they either get it or they don't. A specific example that does not match this is object oriented programming. Although it has become the dominant programming paradigm, it does take some work to really understand it. But, that's a topic for another day.</p>

<p>In every language or technology I've used in almost two decades of programming, there was a difference between learning to use the basic tools and actual mastery. Mastery takes time. Unfortunately, we cannot tell how well someone has mastered a topic by the number of years they have spent doing it.</p>

<h2 class="subhead">Hiring</h2>

<p>This leads back to Jeff's post. I agree that there is a problem with hiring practices that depend on the number of years that you have spent working on a particular technology. In the worst case, I have seen job postings that are looking for exactly the same experience as the person they are replacing. (You've seen these. They have 4 or 5 <em>n years experience using technology Y</em> entries. Some of the technologies are only used by two companies on the planet. etc.)</p>

<p>I'm not willing to write off years of experience. But, I do tend to use it in a different way. If I see someone whose resume states 5 years of experience in C++ or Java and, after a phone screen, it appears that her understanding is closer to 1 year, I am quite skeptical about whether or not she will be able to learn. If someone claims 2 years of experience and has about that level of understanding from the phone screen, he is probably a good candidate. A few times, I have run across someone with a year of experience on their resume, but with a real understanding of the topic. We want this person; she is a motivated learner.</p>

<h2 class="subhead">Job Postings</h2>

<p>I did, however, like Jeff's point that a job posting that focused on a laundry list of years of experience in particular technologies was an indicator of a position I might not want to work in. That should be taken with a small grain of salt, since the people doing the work may not be the same as the ones writing the job posting.</p>

<p>All in all, it was a thought-provoking post (at least to me).</p>]]>
        
    </content>
</entry>
<entry>
    <title>Some Thoughts About Generating Random Numbers</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2007/12/some_thoughts_about_generating.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=174" title="Some Thoughts About Generating Random Numbers" />
    <id>tag:anomaly.org,2007:/wade/blog//1.174</id>
    
    <published>2007-12-17T01:41:09Z</published>
    <updated>2008-08-05T04:44:49Z</updated>
    
    <summary>In my last essay, I mentioned the Shuffling essay from Jeff Atwood&apos;s blog. This time, my essay is a little less philosophical and more technical. Several of the comments on the Shuffling essay talked about the importance of knowing the...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="CodeCraft" />
            <category term="CompSci General" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>In my last essay, I mentioned the <a href="http://www.codinghorror.com/blog/archives/001008.html">Shuffling</a> essay from Jeff Atwood's blog. This time, my essay is a little less philosophical and more technical. Several of the comments on the Shuffling essay talked about the importance of knowing the random number generator you are using.</p>

<p>Generating random number sequences is a particularly interesting subject for me. Several of the comments about generating random numbers made in that essay covered both good and bad information about generating random sequences. Because of this, I decided to explain some of what I know about the subject in the hopes of giving a little more to think about.</p>

<h2 class="subhead">Random Numbers</h2>

<p>Most people don't think about the fact that the term <em>random number</em> is relatively useless. Is the number 17 random? How about 7183? How about 1? Depending on how a number was generated and in the context in which it was generated, we could say either yes or no.</p>

<p>A better question would be to ask what someone is looking for when they ask for a random number. Sometimes when someone says they need a random number, what they really want is a number that's not related to anything in particular in their system. They probably also want it to be different each time they ask for one. If that is all they need, almost any method of picking numbers is reasonable.</p>

<p>More often, people are looking for a random sequence of numbers.</p>

<h2 class="subhead">Random Number Sequences</h2>

<p>What do we mean by a <em>random number sequence</em>?</p>

<p>The main attribute that makes a number sequence <em>random</em> is that it is unpredictable. Most of the other attributes of random number sequences boil down to measurements of the difficulty of guessing the next number in the sequence. Some of these measures include periodicity, correlation between values, and various kinds of spectral analysis. Although random sequences are unpredictable, the fact that a sequence is unpredictable (by any given person) does not mean that it is random.</p>

<p>In many cases, a sequence does not need to actually be random. It just needs to be sufficiently random for the purpose. Different problems have different requirements for random sequences. If I need a single number, even a low-quality sequence will do. For a quick game that needs 100 random numbers, we would need a higher-quality sequence. For a Monte Carlo simulation needing 1000s of random numbers, we need a high-quality random sequence. For cryptographic applications, only the highest-quality random sequences are acceptable.</p>

<p>How do we measure the quality of a random sequence?</p>

<h2 class="subhead">A Measure of Randomness</h2>

<p>Information theory measures the information in a process in terms of entropy. A particular sequence has low entropy if not much information is encoded in it. A sequence with high entropy contains a lot of information. In the case of a random number sequence, the harder it is to guess the next number, the higher the entropy.</p>

<p>This gives us a way to talk about the quality of a random number sequence. A high-entropy sequence is higher quality than a low-entropy sequence.</p>
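
<p>As a rough illustration of putting a number on this, the sketch below estimates Shannon entropy in bits per symbol from how often each value appears. The function name <code>shannon_entropy_bits</code> is made up for this example, and a frequency count is only one crude estimator, not a full test of randomness.</p>

```python
import math
from collections import Counter


def shannon_entropy_bits(symbols):
    """Estimate entropy in bits per symbol from observed frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


constant = [7] * 16        # always the same value: trivially guessable
alternating = [0, 1] * 8   # two equally frequent values
print(shannon_entropy_bits(constant))     # 0.0
print(shannon_entropy_bits(alternating))  # 1.0
```

<p>Note that the alternating sequence scores a full bit per symbol even though anyone can guess its next value. A frequency count ignores ordering, which is one small example of why measuring entropy well is hard.</p>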

<p>The problem is that entropy is hard to measure. It is also difficult to harvest entropy from high-quality sources. Most good sources of entropy are difficult to access from a computer without sacrificing some of the entropy. The time between two nuclei in a radioactive isotope decaying is very unpredictable, but the rate at which we can measure the time is not.</p>

<p>Trying to measure this time as a source of random numbers either wastes entropy by not being precise enough or is very expensive. There is also a maximum rate at which we can extract individual numbers in the sequence. The same holds true for Johnson Noise, Brownian motion, or any other random physical process.</p>

<h2 class="subhead">Uncorrelated Numbers</h2>

<p>Except in very specialized circumstances, people normally don't need random sequences; they need uncorrelated sequences. Back in 1947, the RAND Corporation was doing a lot of work using Monte Carlo simulations. When doing simulations, you need sequences of numbers with no correlation between the numbers. However, truly random numbers are not as useful in a simulation, because there is no way to repeat a particular simulation run. They solved this problem by creating a list of one million random digits that they could reference for their simulations.</p>

<p>The RAND Corporation sold this table in published form for people to use in experiments. Once the table was generated, these numbers were no longer random. Anyone recognizing a part of the sequence could predict the next number by looking it up in the table. You could do the same thing with the digits of <em>&pi;</em>, <em>e</em>, or other irrational numbers. (After convincing yourself that the correlation between digits is negligible for your purposes.)</p>

<p>Despite the fact that the RAND table was generated as a high-quality random sequence, it is no longer a random sequence because it has been generated before (and published).</p>

<h2 class="subhead">Pseudo-Random Number Generators</h2>

<p>The difficulty with using real randomness in a computer led to the creation of pseudo-random number generators (PRNGs). If we can algorithmically generate an unpredictable sequence of numbers, we get most of the benefits of a random sequence without the downsides. The catch is that any algorithmic method of generating a sequence is predictable given enough information.</p>

<p>Algorithmic generators started relatively simple and have become more sophisticated over the years. Interestingly, all PRNGs contain a certain amount of state. This state is a very good representation of the entropy of the generator. Early PRNGs had a single integer as internal state. Current state-of-the-art generators contain much more.</p>
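
<p>To make the "single integer of state" idea concrete, here is a minimal sketch of a linear congruential generator, one classic early design. The class name is made up for this example, and the constants are the ones popularized by <cite>Numerical Recipes</cite>. It is a toy for illustration, not a generator to rely on.</p>

```python
class LinearCongruentialGenerator:
    """Toy PRNG whose entire internal state is one 32-bit integer."""

    MULTIPLIER = 1664525      # constants popularized by Numerical Recipes
    INCREMENT = 1013904223
    MODULUS = 2 ** 32

    def __init__(self, seed):
        self.state = seed % self.MODULUS

    def next_int(self):
        """Advance the state and return it as the next number in the sequence."""
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state


first = LinearCongruentialGenerator(seed=42)
second = LinearCongruentialGenerator(seed=42)
run_a = [first.next_int() for _ in range(5)]
run_b = [second.next_int() for _ in range(5)]
print(run_a == run_b)  # True: the same state always yields the same sequence

# Anyone who observes one output knows the whole state, so the
# next value can be predicted exactly.
observer = LinearCongruentialGenerator(seed=run_a[-1])
print(observer.next_int() == first.next_int())  # True
```

<p>The repeatability is exactly what a simulation wants, and the predictability is exactly what a cryptographic application cannot tolerate.</p>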

<h2 class="subhead">Real Random Generators</h2>

<p>The best class of random generators today harvests small amounts of entropy from many sources. These generators extract random bits from a pool of entropy to make random numbers. Unfortunately, as we extract bits, we use up the entropy. So these generators continue to harvest entropy almost continuously. Unfortunately, we still have problems with the speed at which we can acquire new entropy and the storage for the entropy we harvest. It turns out that attempting to extract entropy from really random sources loses some of that entropy in the measurement process.</p>

<p>Most systems don't store all of the entropy they harvest. Instead they use a <em>stirring function</em> to continually merge the new entropy into a pool. At any given point in time, the pool contains a bit less entropy than we have put into it. These stirring functions are carefully designed to mix the new data into the pool in such a way that we don't generate regular patterns.</p>
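
<p>The sketch below shows the general shape of a stirred pool. It assumes SHA-256 as the stirring function and timer jitter as the entropy source. Both are choices made for this illustration, as are the names <code>EntropyPool</code>, <code>stir</code>, and <code>extract</code>. Real systems use more carefully analyzed designs, and this toy should not be used for anything security related.</p>

```python
import hashlib
import time


class EntropyPool:
    """Toy fixed-size pool; new samples are hashed into the existing contents."""

    def __init__(self):
        self.pool = bytes(32)  # starts as 32 zero bytes, i.e. no entropy at all

    def stir(self, sample):
        """Mix `sample` (bytes) into the pool without growing it."""
        self.pool = hashlib.sha256(self.pool + sample).digest()

    def extract(self, count):
        """Return `count` output bytes (at most 32), then stir so output is not reused."""
        output = hashlib.sha256(b"extract" + self.pool).digest()[:count]
        self.stir(b"extracted")
        return output


pool = EntropyPool()
# Hypothetical entropy source: low-order jitter in a high-resolution timer.
for _ in range(64):
    pool.stir(time.perf_counter_ns().to_bytes(8, "little"))
print(pool.extract(16).hex())
```

<p>However many samples are stirred in, the pool stays 32 bytes long, so it can never hold more than 256 bits of entropy. This toy also does no accounting of how much entropy it has actually collected, which is the part a real design has to get right.</p>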

<p>This approach is much more sophisticated than the simple unconnected sound card or lava-lamp type generators of a few years ago. In addition to being a good source of random sequences, there is solid research behind the amount of entropy that can be collected and used this way.</p>

<p>Unfortunately, this approach still has a downside. There is a limit to how fast we can extract random numbers from the pool without depleting its entropy. Different systems may either give lower quality numbers in the sequence for a while or pause until enough entropy has been collected.</p>

<h2 class="subhead">Concluding Thoughts</h2>

<p>The universe contains many sources of randomness. This randomness is difficult to harvest for use in a computer program. Randomness is a more complicated concept than most people realize. As Donald Knuth pointed out in <cite>Seminumerical Algorithms</cite> (The Art of Computer Programming, Vol. 2), you should never randomly pick a method for generating random numbers.</p>]]>
        
    </content>
</entry>
<entry>
    <title>The Show Me Response</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2007/12/the_show_me_response.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=173" title="The &lt;em&gt;Show Me&lt;/em&gt; Response" />
    <id>tag:anomaly.org,2007:/wade/blog//1.173</id>
    
    <published>2007-12-10T01:01:51Z</published>
    <updated>2008-08-05T04:44:23Z</updated>
    
    <summary>This week, there was an interesting post on Jeff Atwood&apos;s Coding Horror blog. This essay was on Shuffling. Shuffling is an application of random numbers, which is a particular interest of mine. I will write more on that subject in...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="CodeCraft" />
            <category term="CompSci General" />
            <category term="Programming Philosophy" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>This week, there was an interesting post on Jeff Atwood's <a href="">Coding Horror</a> blog. This essay was on <a href="http://www.codinghorror.com/blog/archives/001008.html">Shuffling</a>. Shuffling is an application of random numbers, which is a particular interest of mine. I will write more on that subject in an upcoming essay.</p>

<p>One of the comments caught my eye because it echoed a sentiment I have fought several times in my career. The commenter referred to a particular simple solution to the problem and said:</p>

<blockquote>Random removal from a mutable array is the same approach I used last time, and the same approach I'll use next time, unless someone shows me why it's wrong.</blockquote>

<p>This answer really bothered me. It wasn't particularly about the solution the person chose. I also would prefer not to abuse a particular person. After all, depending on the domain, it might be a reasonable answer. My problem is with the assumption that it is someone else's job to prove that the simple (naive) solution is not the best one.</p>

<p>I often run into this problem with code written by <em>domain experts</em>. The idea that the naive solution they created in 30 seconds of thought is probably better than the one created by a computer science expert after a great deal of research is baffling to me. The fact that the documented solution is easy to implement and supplied by many libraries makes this comment even more amazing. Just because the computer scientist doesn't know about your domain is no reason to discount their knowledge of the computer science domain.</p>
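
<p>For comparison, here is a minimal sketch of both approaches. It assumes that the documented solution in question is the Fisher-Yates (Knuth) shuffle, which this essay does not name, and the function names are made up for this example. Both versions give every ordering an equal chance when the underlying generator is good. The difference is the cost: removing from the middle of an array shifts the tail each time, while the swap does constant work per element.</p>

```python
import random


def shuffle_by_random_removal(items, rng):
    """Repeatedly remove a random element; each list.pop(i) shifts the tail."""
    remaining = list(items)
    shuffled = []
    while remaining:
        index = rng.randrange(len(remaining))
        shuffled.append(remaining.pop(index))
    return shuffled


def fisher_yates_shuffle(items, rng):
    """Same selection idea, but a swap replaces the removal: O(n) overall."""
    result = list(items)
    for last in range(len(result) - 1, 0, -1):
        pick = rng.randrange(last + 1)  # any position from 0 to last, inclusive
        result[pick], result[last] = result[last], result[pick]
    return result


rng = random.Random(2007)  # seeded only so the demo is repeatable
deck = list(range(10))
print(shuffle_by_random_removal(deck, rng))
print(fisher_yates_shuffle(deck, rng))
```

<p>In Python, the standard library's <code>random.shuffle</code> already provides an in-place shuffle, so hand-rolling either version is rarely necessary.</p>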

<p>As a professional programmer, I feel that improving my knowledge of algorithms is part of the craft of writing code. When looking at a new problem, I often try to check my books and online references for better solutions, especially if the problem seems particularly general. After all, many of the general problems in computer science have been studied for 30 or 40 years. Many of the people doing that research are extremely bright. They were also probably pretty knowledgeable about the area of their own research. It is relatively safe to assume that they knew more about the subject than I do.</p>

<p>Granted, our field often involves trade-offs. Sometimes the well-studied general algorithm is not the best solution to the particular problem you are working on. But, without understanding the general algorithm you can't know that. I have often heard comments along the lines of <em>our problem is much more complicated than this <em>toy</em> research problem, so we'll use our solution</em>. (As an interesting footnote, almost every place I've ever worked in about two decades of software development was convinced that they were solving harder problems than everyone else.)</p>

<p>If the person making the comment has carefully considered the algorithm in question and can point to specific issues with the general algorithm, that is one thing. We all make design trade-offs all the time. However, if you don't bother to look at the general algorithm and assume that your approach is good enough, you are missing an opportunity to learn and you are potentially generating worse software in the bargain.</p>

<p>I have repeatedly worked on code where someone has implemented a horrible naive algorithm without bothering to look at the implications. For example, a recent performance problem was caused by an <code>O(n<sup>2</sup>)</code> algorithm that was easily converted into an <code>O(n)</code> algorithm with a small amount of thought. Unfortunately, the original programmer could not be bothered to look for a better approach. After all, in the small test sets he had checked the naive solution was fine.</p>
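
<p>The actual code from that project is not shown here, so the following is a hypothetical stand-in for the same kind of fix: checking a list for duplicates. The first version compares every pair of items. The second makes one pass and remembers what it has seen. Both function names are made up for this example.</p>

```python
def has_duplicates_quadratic(values):
    """Compare every pair: fine for ten items, painful for a million."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False


def has_duplicates_linear(values):
    """Remember what has been seen; one pass with a set (values must be hashable)."""
    seen = set()
    for value in values:
        if value in seen:
            return True
        seen.add(value)
    return False


sample = [3, 1, 4, 1, 5, 9, 2, 6]
print(has_duplicates_quadratic(sample), has_duplicates_linear(sample))  # True True
```

<p>On a small test set the two are indistinguishable, which is how the quadratic version tends to survive until real data arrives.</p>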

<p>I don't think it is unreasonable to expect a professional programmer to improve his or her craft by studying classical solutions to known problems. I also feel that, as <em>professionals</em>, we should welcome the opportunity to improve our understanding and our tools when a problem we are studying turns out to have a standard, well-researched solution. Understanding the solution improves our ability to analyze algorithms in the future. Seeing how the experts build and analyze algorithms will help you improve your ability to do the same.</p>

<p>Deciding that it is someone else's job to prove that the standard solution is better than your quick thought is handing the job of improving your skills to someone else. As a professional programmer, I realize that knowledge is what allows me to do my job. Seeking knowledge and understanding is, therefore, the most important thing I can do to improve my skills as a programmer.</p>

<p>Although I have been called arrogant many times, even I am not so arrogant as to believe that I am smarter and know more than <em>all</em> of the people working in software over the last 40 years.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Debugging Without a Debugger, Part II</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2007/12/debugging_without_a_debugger_p.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=172" title="Debugging Without a Debugger, Part II" />
    <id>tag:anomaly.org,2007:/wade/blog//1.172</id>
    
    <published>2007-12-01T21:19:17Z</published>
    <updated>2008-08-05T04:43:53Z</updated>
    
    <summary>Last week, I wrote about a technique for debugging without using a debugger in Debugging Without a Debugger. I talked a bit about the advantages of instrumenting code and how it can be used to supplement the use of a...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="CodeCraft" />
            <category term="Tools" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>Last week, I wrote about a technique for debugging without using a debugger in <a href="http://anomaly.org/wade/blog/2007/11/debugging_without_a_debugger_1.html">Debugging Without a Debugger</a>. I talked a bit about the advantages of instrumenting code and how it can be used to supplement the use of a debugger.</p>

<p>Those of you who have always used a debugger might consider this vaguely interesting, but not particularly critical. I first learned this technique in an environment where we could not use a debugger even if we had wanted to. Several times in my career I have been in situations where a debugger was not available and this technique was the only way to debug code.</p>

<h2 class="subhead">Background Processes</h2>

<p>The first debugger-intolerant environment I developed in was writing a <acronym title="Terminate and Stay Resident">TSR</acronym> program under DOS that communicated with a foreground program written by another company. The equivalent type of program would be daemon processes under various flavors of Unix or Windows services.</p>

<p>At the time, debuggers did not support the ability to attach to a running process. Since we couldn't start the process under the debugger, there was no way to run the program inside the debugger. Additionally, the program we were working on was communicating with a device that was not under our control. Stopping the program in the debugger would not have been an option even if we could have run under one. Which leads us to the next class of programs that do not work well with debuggers.</p>

<h2 class="subhead">Real-Time Code</h2>

<p>If the program has real-time requirements, running under a debugger may not be an option. Real-time does not always mean that it has to be lightning fast, but almost all real-time systems have requirements on the amount of time they are allowed to let certain operations wait. Stepping through code in a debugger or stopping on a breakpoint will almost certainly violate these requirements.</p>

<p>If the only real-time requirement is user responsiveness, this is not a real problem. But, if the program is communicating with another program or external device, pausing in the debugger might cause the whole application to fail or behave mysteriously. The TSR I talked about above communicated with another computer over a special interface card. If the card wasn't serviced in a timely fashion, messages would be lost and the application would fail.</p>

<p>If the program has really tight, hard real-time requirements, even printing to a log might be too disruptive. In those cases, I've seen systems that log to a buffer in memory that is written to disk when the system has time.</p>
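
<p>As a rough illustration of that idea, here is a minimal Python sketch of an in-memory log. The class name <code>MemoryLog</code> and its methods are my own inventions for this example, not code from any of the systems described above, and a real hard real-time system would more likely use a preallocated buffer in a lower-level language.</p>

```python
from collections import deque


class MemoryLog:
    """Keep recent log messages in memory; write them out only when asked."""

    def __init__(self, capacity=1024):
        # A bounded deque silently drops the oldest entry once it is full,
        # so recording never does I/O and never grows without limit.
        self._entries = deque(maxlen=capacity)

    def record(self, message):
        # Cheap enough for a time-critical path: just an append.
        self._entries.append(message)

    def flush(self, stream):
        # Call this when the system has time to spare.
        count = len(self._entries)
        while self._entries:
            stream.write(self._entries.popleft() + "\n")
        return count
```

<p>The trade-off in this sketch is that the oldest messages are lost if the buffer fills before it is flushed, so the capacity has to be sized for the longest stretch the system goes without idle time.</p>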

<h2 class="subhead">Multi-Threaded Code</h2>

<p>Although there are a few debuggers on the market that deal with multi-threaded code, this kind of system plays havoc with debuggers. First of all, there is the question of what happens when a breakpoint is hit. Do we stop all threads or just the one? If we allow the other threads to continue, what happens if a second thread hits a breakpoint while we are looking at the breakpoint on the first thread?</p>

<p>If that isn't confusing enough, think about the changes to the timing of the interactions between threads. Race conditions may appear and disappear at random because of the interactions we are having with one or more threads. How does access to a shared object work when a thread changes the object we are inspecting in the debugger?</p>

<h2 class="subhead">Server Code</h2>

<p>The next kind of system that I worked on without debugger help was a server in an on-line system. Like many on-line systems, this one had multiple threads to deal with incoming requests. There was a real-time component in the time required to service the request and respond to the client. If that weren't enough, the servers needed to stay running pretty close to 24/7. We could rotate servers into and out of service to load new code and fix problems, but they tended to run for hours, days, or weeks at a time.</p>

<p>A debugger is practically useless in this scenario. How do you watch a breakpoint that is only hit once every few hours? How do you catch problems that only occur on certain kinds of requests when you aren't sure which request triggers the problem?</p>

<h2 class="subhead">Instrumentation</h2>

<p>In each of these scenarios, we found that by carefully instrumenting the code we were able to troubleshoot and solve problems despite the lack of a visual debugging environment. In some cases, we logged lots of information in the hopes of spotting the problem in the reams of collected data. In other cases, we put very specific instrumentation in place to catch the rare times when the problem occurred.</p>

<p>One benefit of this approach is the ability to bring the full power of your programming language to bear on recognizing a problem and logging the appropriate information. If we knew that the problem was related to a certain area of memory becoming corrupted, it was possible to make a function that tested that area of memory. We could then call the test at various points in the code and log where the error was first detected. Most debuggers today support some form of conditional breakpoint. In most cases, though, they support only counting or simple conditional expressions. If your debugger supports a condition based on a function call, the debugger can match this feature.</p>
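
<p>A minimal Python sketch of that kind of check might look like the following. It assumes the suspect region ends in a known guard pattern. The names <code>GUARD</code>, <code>guard_intact</code>, and <code>checkpoint</code> are hypothetical, chosen only for this example.</p>

```python
import logging

logger = logging.getLogger("instrumentation")

# Hypothetical marker assumed to sit at the end of the suspect region.
GUARD = b"\xde\xad\xbe\xef"


def guard_intact(buffer):
    """Return True if the guard bytes at the end of the buffer are unchanged."""
    return bytes(buffer[-len(GUARD):]) == GUARD


def checkpoint(buffer, label):
    """Test the suspect region and log where it is seen to be damaged."""
    ok = guard_intact(buffer)
    if not ok:
        logger.error("guard bytes corrupted, detected at %s", label)
    return ok
```

<p>Sprinkling calls such as <code>checkpoint(buf, "after parse")</code> through the code narrows the damage down to the stretch between the last call that passed and the first one that failed.</p>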

<p>You can also easily write a test that saves an earlier state of the program to compare with the current state to see when things change. For example, you might only want to log in the destructor of the ABC object that was created by function abc(), not the other hundred or so ABC objects in the system. If this information is not already in a variable, most debuggers could not track this change. At best, most debuggers support some method of testing whether a small number of simple variables change.</p>
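
<p>One way to sketch the saved-state comparison in Python is shown below. <code>ChangeWatcher</code> is a made-up helper for illustration, and it assumes the state being watched can be deep-copied and compared with <code>==</code>.</p>

```python
import copy


class ChangeWatcher:
    """Remember an earlier copy of some state and report when it differs."""

    def __init__(self, state):
        self._saved = copy.deepcopy(state)

    def changed(self, state, label, log=print):
        # Compare against the saved copy; log and re-save only on a change.
        if state == self._saved:
            return False
        log(f"state changed before {label}: {self._saved!r} became {state!r}")
        self._saved = copy.deepcopy(state)
        return True
```

<p>Calling <code>changed()</code> at a few points in the code produces a log entry only at the first point after each modification, which is usually all you need to bracket the code responsible.</p>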

<p>With the ability to write arbitrarily complex tests and the ability to log anything that you can access from the code, instrumenting the code is a very powerful technique.</p>

<h2 class="subhead">Wrap up</h2>

<p>Between this article and the last, I hope I've given you some reasons to consider troubleshooting without a debugger. The next time you find yourself bouncing on the <em>step</em> or <em>next</em> command in your debugger, you might consider a more automated way to troubleshoot.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Debugging Without a Debugger</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2007/11/debugging_without_a_debugger_1.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=171" title="Debugging Without a Debugger" />
    <id>tag:anomaly.org,2007:/wade/blog//1.171</id>
    
    <published>2007-11-25T16:57:06Z</published>
    <updated>2008-08-05T04:43:32Z</updated>
    
    <summary>I&apos;ve noticed something about the programmers I have dealt with in the last few years. Many of them seem to equate debugging skill with ability to use a debugger. In fact, in some instances, the concept of being able to...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="CodeCraft" />
            <category term="Tools" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>I've noticed something about the programmers I have dealt with in the last few years. Many of them seem to equate debugging skill with ability to use a debugger. In fact, in some instances, the concept of being able to troubleshoot a problem outside a debugger is so foreign it would never occur to them.</p>

<p>A debugger is very helpful for many troubleshooting tasks. If there is a logical error in a localized area of code, a debugger can help you quickly explore the logic and find the problem. This sharp focus is the most important feature of the debugger.  However, if you don't have any idea where the problem is located, this narrow focus is more of a hindrance than a help.</p>

<p>Most people who only have debugger-based troubleshooting experience end up scattering breakpoints throughout the code hoping that one of the breakpoints will get them close. Unfortunately, if the problem is based on the relationships between different portions of the code, the narrow focus may hide the actual problem. Much of the problem is that some defects are related to the sequence in which different pieces of code are called or relationships between multiple pieces of code.</p>

<p>In this case, the debugger only gives part of the story. You need to track these relationships separately from the debugger session. The debugger's narrow focus and the need to track relationships and sequencing separately make this kind of troubleshooting difficult. This is not a problem with debuggers; it is the result of using the wrong tool for the job.</p>

<h2 class="subhead">Instrumenting code</h2>

<p>A completely different approach is to instrument the code with logging statements. Because of the nature of logging, the sequencing information is explicitly tracked in the log. Proper choice of information to write to the log can help track the relationships as well. Instrumenting the code does not provide as easy a method of focusing in on more localized problems, but it is much better at troubleshooting non-localized problems.</p>
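
<p>To make the sequencing point concrete, here is a small Python sketch. The <code>traced</code> decorator, the <code>trace_log</code> list standing in for a log file, and the two account functions are all hypothetical names invented for this example.</p>

```python
import functools
import itertools

_sequence = itertools.count(1)
trace_log = []  # stands in for a log file in this sketch


def traced(func):
    """Append a numbered entry to trace_log each time func is called."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # The sequence number records call order explicitly in the log.
        trace_log.append(
            f"{next(_sequence):06d} call {func.__name__} args={args!r}"
        )
        return func(*args, **kwargs)

    return wrapper


@traced
def open_account(name):
    return name.upper()


@traced
def close_account(name):
    return None
```

<p>Because every entry carries a sequence number and the arguments, the log shows both the order of the calls and which calls belong together, which is exactly the information a breakpoint tends to hide.</p>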

<p>Either technique can be used on many problems. Some problems are easier to solve with one technique or the other. A few problems are easiest to solve by combining the techniques. You can use instrumentation to find the general shape of the problem. Instrumentation may help to discover which methods are being called in which order or which methods are called more often than others. This approach is really helpful in trying to find when code is not called.</p>

<h2 class="subhead">Summarizing Your Output</h2>

<p>Since the kinds of problems that work best with instrumenting are problems with relationships between calls, just looking at the output is not always enough to find the problems. Sometimes you need to summarize the data in some way. Maybe you need to count calls to particular routines, or verify that every call to method A has a corresponding call to method B. These kinds of relationships are often easier to see after the data has been re-ordered in some way.</p>

<p>You can use tools like sort and uniq to do simple reorganization of the data to look for patterns. Sometimes you will need more powerful tools like AWK or Perl to extract relationships from the output. If you format the output of your logging statements appropriately, you can even use a spreadsheet program like Excel to re-organize the output to provide better understanding.</p>
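
<p>The same kind of summary can be sketched in a few lines of Python. This example assumes each log line contains the word <code>call</code> followed by a routine name. That format, and the function names <code>count_calls</code> and <code>unmatched</code>, are assumptions made for illustration.</p>

```python
from collections import Counter


def count_calls(lines):
    """Count log entries per routine, using the word that follows 'call'."""
    counts = Counter()
    for line in lines:
        words = line.split()
        # Only accept lines where some word actually follows 'call'.
        if "call" in words[:-1]:
            name = words[words.index("call") + 1]
            counts[name] += 1
    return counts


def unmatched(lines, opener, closer):
    """Return how many more opener calls than closer calls were logged."""
    counts = count_calls(lines)
    return counts[opener] - counts[closer]
```

<p>Comparing totals is only a rough check. It will not notice a <code>closer</code> that ran before its <code>opener</code>, but a non-zero result is a quick sign that the pairing deserves a closer look.</p>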

<h2 class="subhead">Different Viewpoints</h2>

<p>Instrumenting code provides a different kind of information than you normally get from a debugger. This technique is very useful for dealing with problems that require seeing relationships between multiple different portions of the code. Another place where instrumenting can be more useful than using a debugger is when investigating long loops. If a loop runs a dozen times, setting a breakpoint and inspecting the code on each pass can be useful. If the loop runs half a million times, the breakpoint is basically useless.</p>
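
<p>For the long-loop case, a sketch along the following lines logs only the passes that look wrong, plus a one-line summary. The function <code>total_with_trace</code> and the choice of a missing value (<code>None</code>) as the suspicious condition are hypothetical, picked just to have something to test for.</p>

```python
def total_with_trace(values, log=print):
    """Sum the values, logging only the passes that hit a missing entry."""
    total = 0
    passes = 0
    skipped = 0
    for index, value in enumerate(values):
        passes += 1
        if value is None:
            # The rare pass that a breakpoint would make us wait for.
            skipped += 1
            log(f"pass {index}: missing value, running total {total}")
            continue
        total += value
    log(f"done: {passes} passes, {skipped} skipped")
    return total
```

<p>Run over half a million values with a single bad entry, this produces two log lines instead of half a million stops at a breakpoint.</p>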

<p>The main use for instrumenting code is getting an overview of the code. Using a debugger gives a highly focused way to inspect the code. However, instrumenting is a better tool for getting a broader view of the code. In some cases, once you have digested this broader view, you may find a particular piece of code that needs more focused attention. Switching back to the debugger can be very effective at this point.</p>

<p>Once you are comfortable with both techniques, you will often find yourself switching back and forth between the two techniques. You might use some instrumenting to test an idea of why a problem is occurring and then switch to the debugger to look more carefully at a method that has attracted your attention. After doing some focused examination with the debugger, you might decide that another area might be more fruitful. Then, you instrument a different piece of the code to explore another idea. In some situations, the two techniques complement each other.</p>

<h2 class="subhead">The Downsides of Instrumenting</h2>

<p>One of the main problems with instrumenting code is the need to change the code to add the statements needed to log information. This requires some recompilation and is not as quick as adding and removing breakpoints. Because of the recompilation cost, some people ignore this technique.</p>

<p>Obviously, avoiding a useful technique because it has a cost is not reasonable. Many of the decisions we make in software development are about trade-offs. You should be able to evaluate your debugging tools in terms of costs and benefits, as well. Obviously, you wouldn't use an expensive technique for a trivial problem. But, if the problem is complicated enough, the cost is less than the benefits.</p>

<p>One tempting way to reduce the cost of recompiling is to add as much logging as possible up front to avoid recompiling again. Printing out too much information with each logging statement or adding lots of instrumentation <em>just in case</em> may reduce the amount of time spent recompiling, but it increases the amount of time you spend summarizing and analyzing the output. It's always important to remember that the output from the instrumentation and the summarizing code are not the goal; they are just tools used to find an actual problem.</p>

<p>Instrumenting the code in a way that tests one or a couple of ideas is much better than generating so much output that you will spend a day wading through the logs looking for something important. Eventually, you reach the point where you are spending more time looking at the logs than you would have running another compile.</p>

<p>One final downside of instrumenting code is the risk of accidentally leaving the logging code in place after you have found the problem. Your version control system is your friend at this point. Always check the changes you are going to commit to verify that you are only adding actual fixes and not debugging code.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Review of Software Configuration Management Patterns</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2007/11/review_of_software_configurati.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=170" title="Review of &lt;cite&gt;Software Configuration Management Patterns&lt;/cite&gt;" />
    <id>tag:anomaly.org,2007:/wade/blog//1.170</id>
    
    <published>2007-11-13T04:09:34Z</published>
    <updated>2008-08-05T04:43:04Z</updated>
    
    <summary>Software Configuration Management Patterns Stephen P. Berczuk and Brad Appleton Addison-Wesley, 2003 The first three chapters define the problem space. We get a solid description of Software Configuration Management (SCM) and an introduction to patterns and pattern languages. This section...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="Books" />
            <category term="CodeCraft" />
            <category term="Tools" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p><cite>Software Configuration Management Patterns</cite><br />
Stephen P. Berczuk and Brad Appleton<br />
Addison-Wesley, 2003</p>

<p>The first three chapters define the problem space. We get a solid description of Software Configuration Management (SCM) and an introduction to patterns and pattern languages. This section of the book sets up the context that you will need to understand the rest.</p>

<p>Much like the <acronym title="Gang of Four">GOF</acronym> book, this book gives names to different practices that you may now be using. It explains each of these practices as patterns. More importantly, this book relates these patterns to one another as a <em>pattern language</em> that gives more of a big picture understanding of SCM. In other words, the book not only presents patterns such as <em>Mainline</em>, <em>Integration Build</em>, and <em>Release Line</em>; it also explains how these and other patterns relate to each other to make a strong SCM policy.</p>

<p>I have been using various version control systems for nearly two decades. In that time, I have stumbled my way toward understanding many of these patterns. If you have worked in software for a long time, you might feel that you already know everything you need. One of the things I found most useful in this book (besides the standardized naming) was justification for some of the practices I had come to accept. The way the book related different practices to make the combination stronger was also quite revealing.</p>

<p>If you have not been doing SCM for long or have just begun using some version control tool, this book can give you insight into what you should be doing. Unfortunately, I suspect that some experience is needed to properly appreciate the patterns in the book. If you already know everything you need to about SCM, the book still provides standardized names and relationships that can help when explaining your practices to others.</p>

<p>Overall, I recommend this book for anyone working in software development. While it is not the most exciting topic to read, it is practical and useful to working developers and their support teams.</p>]]>
        
    </content>
</entry>
<entry>
    <title>The IP Goose, revisited.</title>
    <link rel="alternate" type="text/html" href="http://anomaly.org/wade/blog/2007/11/the_ip_goose_revisited.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://anomaly.org/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=169" title="The IP Goose, revisited." />
    <id>tag:anomaly.org,2007:/wade/blog//1.169</id>
    
    <published>2007-11-10T07:36:59Z</published>
    <updated>2008-08-05T04:41:26Z</updated>
    
    <summary>A couple of years ago, I wrote in The IP Goose about some effects that too strict an IP policy can have on developers. In the intervening two years, I&apos;ve had some other insights that I believe are important. First...</summary>
    <author>
        <name>GWade</name>
        
    </author>
            <category term="CodeCraft" />
            <category term="Computer - Business" />
            <category term="Programming Philosophy" />
    
    <content type="html" xml:lang="en" xml:base="http://anomaly.org/wade/blog/">
        <![CDATA[<p>A couple of years ago, I wrote in <a href="http://anomaly.org/wade/blog/2005/12/the_ip_goose.html">The IP Goose</a> about some effects that too strict an IP policy can have on developers. In the intervening two years, I've had some other insights that I believe are important.</p>

<p>First of all, I stand by my original essay. Companies should be able to protect the software they have hired people to write. They should have some protection against an individual taking the information they have learned to take business away from them. A carefully crafted IP agreement can do that.</p>

<p>In the previous essay, I also described how practice is needed to improve skills and increase knowledge. I still believe that development outside of work is needed to practice, learn new things, and keep unused skills from fading. Unfortunately, I've also come to believe that the effects of a draconian IP policy can be more damaging than I first thought.</p>

<h2 class="subhead">Habit</h2>

<p>One important part of practice is consistency. As long as you continue to practice anything, you tend to improve or, at least, maintain your skills. Obviously, if you stop practicing, the skills and knowledge begin to degrade. More importantly, you also start to lose the habit of practice. The longer you are not practicing, the more effort is needed to get back into the habit of practicing and the easier it is to backslide. This tends to make regaining the skill you are practicing even harder.</p>

<p>I'm sure many of you have seen this effect in different fields: martial arts, sports, music, cooking, etc.</p>

<p>Well, the time off from practice caused by a draconian IP agreement will have the same effect on your programming as an enforced absence from skiing or judo would have on your performance in those areas. Subtly, priorities have shifted. There's always something else that needs to be done. You haven't had time to practice for <code>x</code> months, what's another couple of days? It takes quite a while to overcome that inertia, and that whole time your skills continue to degrade.</p>

<h2 class="subhead">Long Term Effects</h2>

<p>I think that an overly strict IP agreement not only has a detrimental effect on your skills while you work under it, but also causes a long-term degradation in your ability to practice.</p>

<p>I worked for a while under a really strict IP agreement. Following the letter of the agreement meant that any software I touched would belong to the company. Working on Open Source projects was obviously out of the question. If I wanted to work on a project on my own, I had to contact the appropriate person in the company's legal department and obtain permission, in advance. If necessary, I had to get sign-off from my supervisor that the project was in no way related to my work with the company. Needless to say, quickie little projects to try out a new technology or technique were no longer worth the effort.</p>

<p>It has taken quite some time to regain the habit of working on projects on my own time. I was able to get back to the little stuff pretty quickly, but bigger projects require a lot more effort to start and work on than they used to. Fortunately, I still enjoy writing software, so I have a strong incentive to keep working at it.</p>]]>
        
    </content>
</entry>

</feed> 

