This site will look much better in a browser that supports web standards, but is accessible to any browser or Internet device.
I've recently been reminded of something I used to tell entry level programmers repeatedly: Review the basics.
And really I mean all of the basics:
In many programming positions, you can get away with coasting on these things for a long while. But, as you let your grasp of the basics deteriorate, you lose flexibility. You begin to use similar solutions to different problems you encounter. If you don't reverse the trend, you can become that guy that applies the same library/data structure/programming paradigm to every problem, no matter how badly it fits.
Some would ask why you should be familiar with lots of basic tools, when you can do your job with just a few. After all, if your employer is happy with your solution, who cares if it is the best? This is ultimately quite short-sighted. My argument has two parts.
From a practical standpoint, being familiar with more data structures and algorithms means that you are a more valuable employee. Even if your employer does not know what these tools are, who is more likely to get the raise: the guy whose caching algorithm fell over as the number of things to cache increased by a factor of 10 (because it was based on a linear array search) or the gal whose algorithm handled the change because it was based on a tree or a hash? Do you think the person who reduced the memory footprint on a critical server by changing data structures is going to be the first to lose his job?
Statistically, most of us will change jobs at some point. Do you think you will impress a potential employer with your conviction that no problem should need any structure other than a linked list? How about if you know everything there ever was to know about hash tables, but you can't sort an array? How about if your solution to every problem involves a SQL query?
Many of us became programmers from a love of solving problems and learning. At that point, everything was new and exciting to learn. Eventually, programming can become just a job. You fix bugs, write some new code, collect a pay check and go home. But, that does not have to be the end of it. We are all craftsmen to some extent.
To keep up with any craft, you need to maintain your tools. In programming, those tools are:
Part of maintaining your programmer brain is challenging it and honing your skills. When programming was new to you, you probably spent time implementing really simple programs to see how they worked. This exercise added basic tools to your mental toolbox. If you don't use those tools, they get rusty and can fail when you need them.
In times past, when a craftsman worked with his hands, the practice of maintaining and upgrading his tools was critical to being able to do the craft. You can bet that a woodworker kept all of his chisels sharp, clean, and well protected; including the ones that he would only use once in a great while. When he needed that tool, out it comes to solve the problem it was meant for.
We need to do the same with our mental tools.
Over the next few posts, I'm going to review a subset of the basics. This is partly an exercise to force me to clean my mental tools and partly to serve as an springboard for you to have an idea about concepts you might want to brush up on. I don't expect to cover everything you should know, but these posts should serve as a good start.
I was listening to an older episode of Software Engineering Radio where they interviewed Martin Odersky on Scala (Episode 62). In the interview, Odersky made a comment about closures being the literals of functional programming. This statement struck me as surprising. The more I thought about it, the more interesting and subtle the concept became.
In functional programming, functions are first-class values. (OOP advocates often say first-class objects, but this concept is more fundamental than that.) If a function is a value, like an integer is a value, a closure (or anonymous function) is the equivalent of a literal (like 2). This idea has been wandering around in my head for the last few days. The more I think about it, the more interesting it appears.
To get the most that you can out of an analogy, you need to apply what you know from one side of the analogy to the other to see if it gives any insight. Let's start with what we know about literals:
We all know the problems with magic literals. Leaving them scattered around the code is a guaranteed way to make unmaintainable code. If the same magic literal is used in multiple spots, the making changes is harder than it needs to be. We normally solve this problem by replacing the literal with a defined constant. If that constant is well-named, this improves the readability of the code. (If the constant is not well-named, it can actually make things worse.)
There are a few cases where we may decide to use a literal despite this fact.
The initial index for a loop is almost always 0 (for C, C++, Perl, Java, etc.). There's really no need to name this literal, because it's function is obvious and it will not change (for a given language). Just as importantly, it's hard to come up with a name that means more to us that the value of the constant itself (please don't name it ZERO).
If you are working with date manipulation code, the literals 7 and 12 are obvious enough that we might not choose to make the constants DAYS_PER_WEEK or MONTHS_PER_YEAR. This decision is not quite as obvious, but it would be relatively easy to justify not making a constant. On the other hand, most people would probably prefer MINUTES_PER_DAY to 1440 and SECONDS_PER_DAY to 86400. This shows that if the use of the literal is obvious enough, we may not need to name it. If there's any question about the purpose of the constant, we give it a name.
Another possibility would involve implementing an algorithm from a book that contained a large number of single use magic literals. You would probably keep the literals and reference the original algorithm in a comment rather than add constants that weren't in the original algorithm. In this case, being faithful to the original algorithm trumps concerns over magic literals.
The first thing that is worth considering about anonymous functions from this analogy is the possibility that giving the function a name might be wise. If the function is used in many places, giving it a name is probably good for maintenance. (Just like making a constant out of a numeric literal simplifies maintenance.) Likewise, code containing a large number of anonymous functions is likely to be hard to maintain.
Also like the magic literal issue, there may be cases where the anonymous function is obvious enough that no name is needed. I'm going to propose some examples using Perl for the syntax. This is mostly because Perl has a fairly reasonable syntax for anonymous functions. It also has three operators for manipulating lists that make use of anonymous functions: map, grep, and sort, Let's use map. (The map operator generates a new list by applying an anonymous function to all of the elements of another list.)
@newlist = map { $_*2 } @oldlist;
Once you know that $_ is the argument to the subroutine, it is obvious that this anonymous function doubles its argument. In this context, @newlist is a list where all of the elements are the doubles of the elements of @oldlist. A similarly obvious example (with better named variables) would be:
@elapsed_secs = map { $_*60 } @elapsed_minutes;
This anonymous function converts minutes to seconds by multiplying by 60. Giving the function a name would not really clarify things any. What about
@newlist = map { $_*($_*17 + 33) + 42 } @oldlist;
This example is the equivalent of a magic literal in the code. I can't imagine anyone looking at this and deciding that it is perfectly clear. Much like code containing the magic literal 17, this function should be named. Otherwise, the code is hard to read and maintain.
Another place that anonymous functions are useful is when filtering a list using grep. Like the map cases above, we can see that simple tests make good function literals.
@positive = grep { $_ > 0 } @unfiltered;
This anonymous function obviously selects positive numbers. Just like the next one selects numbers below a cutoff.
@selected = grep { $_ < $cutoff } @unfiltered.
However, just like the map, we can have examples that would benefit from naming the function.
@filtered = grep { $_ > 3 && $_ < 13 && $_%2 == 0 } @unfiltered;
It's going to be practically impossible to understand this without some help, so turning this anonymous function into a named function is probably worthwhile.
Like numeric literals, functional literals can make code harder to understand. However, giving all closures names as a knee-jerk reaction is not a valid answer. Consider naming anonymous functions when it would make the code clearer without breaking it. I'm still not sure whether anonymous functions are as bad for maintenance as numeric literals. More thought and experimentation is needed.
So far, I don't feel like I've made it through very many of the implications of anonymous functions being the functional programming equivalent of literals. I'll probably write more later, as more implications come to me.
In my last essay, I mentioned the Shuffling essay from Jeff Atwood's blog. This time, my essay is a little less philosophical and more technical. Several of the comments on the Shuffling essay talked about the importance of knowing the random number generator you are using.
Generating random numbers sequences is a particularly interesting subject for me. Several of the comments about generating random numbers made in that essay covered both good and bad information about generating random sequences. Because of this, I decided to explain some of what I know about the subject in the hopes of giving a little more to think about.
Most people don't think about the fact that the term random number is relatively useless. Is the number 17 random? How about 7183? How about 1? Depending on how a number was generated and in the context in which it was generated, we could say either yes or no.
A better question would be to ask what someone is looking for when they ask for a random number. Sometimes when someone says they need a random number, what they really want is a number that's not related to anything in particular in their system. They probably also want it to be different each time they ask for one. If that is all they need, almost any method of picking numbers is reasonable.
More often, people are looking for a random sequence of numbers.
What do we mean by a random number sequence?
The main attribute that makes a number sequence random is that it is unpredictable. Most of the other attributes of random number sequences boil down to measurements of the difficulty of guessing the next number in the sequence. Some of these measures include periodicity, correlation between values, and various kinds of spectral analysis. Although random sequences are unpredictable, the fact that a sequence is unpredictable (by any given person) does not mean that it is random.
In many cases, a sequence does not need to actually be random. It just needs to be sufficiently random for the purpose. Different problems have different requirements for random sequences. If I need a single number, even a low-quality sequence will do. For a quick game that needs 100 random numbers, we would need a higher-quality sequence. For a Monte Carlo simulation needing 1000s of random numbers, we need a high-quality random sequence. For cryptographic applications, only the highest-quality random sequences are acceptable.
How do we measure the quality of a random sequence?
Information theory measures the information in a process in terms of entropy. A particular sequence has low entropy if not much information is encoded in it. A sequence with high entropy contains a lot of information. In the case of a random number sequence, the harder it is to guess the next number, the higher the entropy.
This gives us a way to talk about the quality of a random number sequence. A high-entropy sequence is higher quality than a low-entropy sequence.
The problem is that entropy is hard to measure. It is also difficult to harvest entropy from high-quality sources. Most good sources of entropy are difficult to access from a computer without sacrificing some of the entropy. The time between two nuclei in a radioactive isotope decaying is very unpredictable, but the rate at which we can measure the time is not.
Trying to measure this time as a source of random numbers either wastes entropy by not being precise enough or is very expensive. There is also a maximum rate at which we can extract individual numbers in the sequence. The same holds true for Johnson Noise, Brownian motion, or any other random physical process.
Except in very specialized circumstances, people normally don't need random sequences, they need uncorrelated sequences. Back in 1947, the RAND corporation was doing a lot of work using Monte Carlo simulations. When doing simulations, you need sequences of numbers with no correlation between the numbers. However, truly random numbers are not as useful in a simulation, because there is no way to repeat a particular simulation run. They solved this problem by creating a list of one million random digits that they could reference for their simulations.
The RAND corporation sold this table in published form for people to use in experiments. Once the table was generated, these numbers were no longer random. Anyone recognizing the part of the sequence could predict the next number by looking it up in the table. You could do the same thing with the digits of π, e, or other irrational numbers. (After convincing yourself that the correlation between digits is negligible for your purposes.)
Despite the fact that the RAND table was generated as a high-quality random sequence, it is no longer a random sequence because it has been generated before (and published).
The difficulty with using real randomness in a computer lead to the creation of pseudo-random number generators (PRNGs). If we can algorithmically generate an unpredictable sequence of numbers, we get most of the benefits of a random sequence without the downsides. Any algorithmic method of generating a sequence is predictable given enough information.
Algorithmic generators started relatively simple and have become more sophisticated over the years. Interestingly, all PRNGs contain a certain amount of state. This state is a very good representation of the entropy of the generator. Early PRNGs had a single integer as internal state. Current state-of-the-art generators contain much more.
The best class of random generators today harvest small amounts of entropy from many sources. These generators extract random bits from a pool of entropy to make random numbers. Unfortunately, as we extract bits, we use up the entropy. So these generators continue to harvest entropy almost continuously. Unfortunately, we still have problems with the speed at which we can acquire new entropy and the storage for the entropy we harvest. It turns out that attempting to extract entropy from really random sources loses some of that entropy in the measurement process.
Most systems don't store all of the entropy they harvest. Instead they use a stirring function to continually merge the new entropy into a pool. At any given point in time, the pool contains a bit less entropy than we have put into it. These stirring functions are carefully designed to mix the new data into the pool in such a way that we don't generate regular patterns.
This approach is much more sophisticated than the simple unconnected sound card or lava-lamp type generators of a few years ago. In addition to being a good source of random sequences, there is solid research behind the amount of entropy that can be collected and used this way.
Unfortunately, this approach still has a downside. There is a limit to how fast we can extract random numbers from the pool without depleting its entropy. Different systems may either give lower quality numbers in the sequence for a while or pause until enough entropy has been collected.
The universe contains many sources of randomness. This randomness is difficult to harvest for use in a computer program. Randomness is a more complicated concept than most people realize. As Donald Knuth pointed out in Seminumeric Algorithms (The Art of Computer Programming, Vol. 2), you should never randomly pick a method for generating random numbers.
This week, there was an interesting post on Jeff Atwood's Coding Horror blog. This essay was on Shuffling. Shuffling is an application of random numbers, which is a particular interest of mine. I will write more on that subject in an upcoming essay.
One of the comments caught my eye because it echoed a sentiment I have fought several times in my career. The commenter referred to a particular simple solution to the problem and said
Random removal from a mutable array is the same approach I used last time, and the same approach I'll use next time, unless someone shows me why it's wrong.
This answer really bothered me. It wasn't particularly about the solution the person chose. I also would prefer not to abuse a particular person. After all, depending on the domain, it might be a reasonable answer. My problem is with the assumption that it is someone else's job to prove that the simple (naive) solution is not the best one.
I often run into this problem with code written by domain experts. The idea that the naive solution they created in 30 seconds of thought is probably better than the one created by a computer science expert after a great deal of research is baffling to me. The fact that the documented solution is easy to implement and supplied by many libraries makes this comment even more amazing. Just because the computer scientist doesn't know about your domain is no reason to discount their knowledge of the computer science domain.
As a professional programmer, I feel that improving my knowledge of algorithms is part of the craft of writing code. When looking at a new problem, I often try to check my books and online references for better solutions, especially if the problem seems particularly general. After all, many of the general problems in computer science have been studied for 30 or 40 years. Many of the people doing that research are extremely bright. They were also probably pretty knowledgeable about the area of their own research. It is relatively safe to assume that they knew more about the subject than I do.
Granted, our field often involves trade-offs. Sometimes the well-studied general algorithm is not the best solution to the particular problem you are working on. But, without understanding the general algorithm you can't know that. I have often heard comments along the line of our problem is much more complicated than this toy research problem, so we'll use our solution. (As an interesting footnote, almost every place I've ever worked in about two decades of software development was convinced that they are solving harder problems than everyone else.)
If the person making the comment has carefully considered the algorithm in question and can point to specific issues with the general algorithm, that is one thing. We all make design trade-offs all the time. However, if you don't bother to look at the general algorithm and assume that your approach is good enough, you are missing an opportunity to learn and you are potentially generating worse software in the bargain.
I have repeatedly worked on code where someone is implemented a horrible naive algorithm without bothering to look at the implications. For example, a recent performance problem was caused by an O(n2) algorithm that was easily converted into an O(n) algorithm with a small amount of thought. Unfortunately, the original programmer could not be bothered to look for a better approach. After all, in the small test sets he had checked the naive solution was fine.
I don't think it is unreasonable to expect a professional programmer to improve his or her craft by studying classical solutions to known problems. I also feel that, as professionals, we should welcome the opportunity to improve our understanding and our tools when a problem we are studying turns out to have a standard, well-researched solution. Understanding the solution improves our ability to analyze algorithms in the future. Seeing how the experts build and analyze algorithms will help you improve your ability to do the same.
Deciding that it is someone else's job to prove that the standard solution is better than your quick thought is handing the job of improving your skills to someone else. As a professional programmer, I realize that knowledge is what allows me to do my job. Seeking knowledge and understanding is, therefore, the most important thing I can do to improve my skills as a programmer.
Although I have been called arrogant many times, even I am not so arrogant as to believe that I am smarter and know more than all of the people working in software over the last 40 years.
Several times in the last few years, I have written about the subject of memory management, garbage collection, and object lifetime. Some of essays I've written on this subject include:
Recently, I was thinking about this issue again and had a slightly different insight into my favorite argument about garbage collection. I have often said that one of the things I don't like about the standard mark-and-sweep approach to GC is that it usually disregards object lifetime. I've normally seen the loss of defined object lifetime is a problem with GC. I've argued before that this loss removes a very important tool from the programmer's toolbox.
However, I recently began to consider another aspect of object lifetime that I had previously overlooked. Even in a language that supports GC, being aware of object lifetime is still worthwhile.
In languages, like C++, that support defined construction and destruction of objects, the lifetime of an object is directly connected to the point when the constructor and destructor is called by the runtime system. A good C++ programmer should think how an object should be created and destroyed. The programmer uses object lifetime to determine what needs to happen at the beginning and end of an object's life.
Defining the length of time the object lives also requires a little thought. If the object only needs to live a short period of time in one function, a stack object is appropriate. In other cases, a smart pointer can be used to constrain the lifetime of an object in well-defined ways.
A good C++ programmer thinks carefully about object lifetime and as a consequence has little problem with memory leaks. If the C++ programmer does not think about object lifetime issues, memory leaks abound.
Memory leaks are not supposed to be possible in GCed languages. However, I have seen several cases of Java programs that leaked memory. In one sense, these are not really leaks, since the memory is referenced somewhere. But, in another sense, they are leaks because the memory usage of the program is growing unexpectedly. In many of these cases, the problem is that the programmer has forgotten about the object and did not tell the language to forget the object.
Even in a GCed language like Java, memory leaks are caused by not thinking about object lifetime. If the programmer attaches an object to a longer-lived object or container and forgets to remove it when the object is no longer needed, the object has effectively leaked. By not properly discarding references when the program is finished with them, the programmer has generated memory leaks just as surely as if the language did not support GC. It's not an issue for short-lived objects created and discarded within a short stretch of code. But, then leaks aren't possible with stack-based objects either.
Whether your language of choice supports GC or not, thinking about object lifetime will help you avoid memory problems. It may also help you avoid leaking other resources by making certain you are clear about when the resources are needed.
Compiler Design in C
Allen I. Holub
Prentice Hall, 1990
I decided to take a break from the relatively new books I've been reviewing and hit a real classic.
Over a decade ago, I saw Compiler Design in C when I was interested in little languages. A quick look through the book convinced me that it might be worth the price. I am glad I took the chance. This book describes the whole process of compiling from a programmer's point of view. It is light on theory and heavy on demonstration. The book gave an address where you could order the source code. (This was pre-Web.) All of the source was in the book and could be typed in if you had more time than money.
Holub does a wonderful job of explaining and demonstrating how a compiler works. He also implements alternate versions of the classic tools lex and yacc with different tradeoffs and characteristics. This contrast allows you to really begin to understand how these tools work and how much help they supply.
The coolest part for me was the Visible Parser mode. Compilers built with this mode displayed a multi-pane user interface that allowed you to watch a parse as it happened. This mode serves as an interactive debugger for understanding what your parser is doing. This quickly made me move from vaguely knowing how a parser works to really understanding the process.
Many years later, I took a basic compilers course in computer science and the theory connected quite well with what I learned from this book. Although the Dragon Book covers the theory quite well, I wouldn't consider it as fun to read. More importantly, nothing in the class I took was nearly as effective as the Visible Parser in helping me to understand the rules and conflicts that could arise.
Although this book is quite old, I would recommend it very highly for anyone who wants to understand how parsers work, in general. Even if you've read the Dragon Book cover to cover and can build FAs in your sleep, this book will probably still surprise you with some fundamentally useful information.
The book appears to be out of print, but there are still copies lurking around. If you stumble across one, grab it.
George A. Miller's paper The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information discussed some limits of the human brain with respect to information processing. In particular, his research had found that people are unable to keep up with more than 5-9 different chunks of information at a time. This is actually why phone numbers in the United States are seven digits long, or more accurately, why they used to be an exchange and four digits. (The exchange was eventually replaced by three digits.)
I know a lot of you are thinking that information cannot be true. After all, you know you can keep more things in mind at one time. Right? According to Miller, the key is the word chunks. These chunks can be different sizes. A concept that carries a lot of information is still a single chunk. This is why is is harder to remember 10 randomly chosen numbers or words than it is to remember the words to a song.
A large portion of the history of programming has been devoted to making our chunks more meaningful. Higher level languages allow us to work without keeping trying to remember what each register is holding and how many bytes to we need for that jump instruction. Each succeeding level allows us to keep more in our heads by making the chunks bigger.
But that only works as long as the concepts map well to single chunks. If you don't have a name for a chunk of information or a concept, it takes up more of your memory. One of the more useful effects of Design Patterns was not new OO techniques. It was the new names. Suddenly, you could refer to the Singleton Pattern instead of this class that there's only one of in the whole system but is available to everyone, sort of like global data but not quite.
This same concept applies to user interface design. Grouping related items on the screen and putting the most commonly used items where they are immediately accessible are two ways to reduce the amount of your mind tied up by keeping up with what goes where.
The concept of chunks and Miller's magic number applies in many places in programming. Here's a few to ponder: