Anomaly ~ G. Wade Johnson

November 28, 2014

Some Thoughts on Refactoring

I recently heard an interview with Martin Fowler on the Ruby Rogues podcast. He pointed out a few things about the process of refactoring that I hadn't thought about in a while. Based on that interview, and on what I've read and learned about refactoring elsewhere over the past few years, I decided there are a few points worth making:

Don't Change External Behavior

By definition, refactoring improves the design of the code without changing externally observable behavior. If you change the behavior, it's not a refactor. It's a bug fix, or a restructure, or a redesign, or a re-architecture. Too many people use the term refactor to mean "I futzed around in the code for a while, but didn't fix any bugs or implement any features." That is not the same thing as refactoring.

When you decide some code needs to be refactored, remember that you don't want anything outside the code you are changing to be able to see any difference. This is part of the reason why unit tests are so important to refactoring. Without good unit tests, how are you going to know if anything has changed?
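
To make that concrete, here is a minimal sketch in Python (the function and the tests are invented for this post, not taken from the book or the interview). The same small test suite has to pass, unchanged, before and after the refactoring; if it doesn't, you changed behavior, not just structure.

    import unittest

    # Before: the behavior we want to preserve (hypothetical example).
    def total_price(items):
        t = 0
        for i in items:
            if i["qty"] > 0:
                t = t + i["qty"] * i["price"]
        return t

    # After: the same externally observable behavior, clearer internals.
    def total_price_refactored(items):
        return sum(item["qty"] * item["price"]
                   for item in items
                   if item["qty"] > 0)

    class TotalPriceTest(unittest.TestCase):
        # These cases must pass both before and after the refactoring.
        cases = [
            ([], 0),
            ([{"qty": 2, "price": 3.0}], 6.0),
            ([{"qty": 0, "price": 9.99}, {"qty": 1, "price": 1.5}], 1.5),
        ]

        def test_behavior_unchanged(self):
            for items, expected in self.cases:
                self.assertEqual(total_price(items), expected)
                self.assertEqual(total_price_refactored(items), expected)

    if __name__ == "__main__":
        unittest.main()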

Most Refactoring is Small

In the interview above, Martin Fowler explicitly stated that small refactorings are good. If you look at the definitions in the Refactoring book, you'll notice how small each change really is. You might wonder how any change could make a difference. Experience among a large number of programmers has shown that small changes can incrementally improve the code. It's slower than a big bang rewrite, but it's also much safer. Since each change is small, the risk of a major failure is also small.

Another advantage of small refactorings, one that many people don't notice, is that each successful refactor improves your understanding of the code. More understanding leads to better changes. It's reasonable to suppose that a series of changes based on increasing knowledge will result in better code.
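
To show how small a single step can be, here is a hedged sketch of one Extract Function refactoring in Python (the shipping example and its names are invented for illustration):

    # Before: a condition whose meaning you have to puzzle out each time.
    def shipping_cost(order):
        if order["weight_kg"] > 20 or order["destination"] != "US":
            return 25.00
        return 5.00

    # After one small Extract Function step: the condition now has a name.
    def needs_freight(order):
        return order["weight_kg"] > 20 or order["destination"] != "US"

    def shipping_cost_refactored(order):
        if needs_freight(order):
            return 25.00
        return 5.00

    order = {"weight_kg": 3, "destination": "US"}
    assert shipping_cost(order) == shipping_cost_refactored(order) == 5.00

One step like this barely seems worth the trouble on its own; the payoff comes from applying dozens of such steps, each one safe and each one leaving the code a little easier to read.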

When a Refactor Breaks

In the same interview, Martin Fowler also made a point that I don't believe I've ever heard before. If you refactor some code and it breaks the tests, don't debug your refactoring. Throw it away, instead.

If your refactoring change is small, it won't cost you much time to throw it away. More importantly, if you throw away the change and try again, the time you spend on it is well-defined and limited. On the other hand, we all know that debugging can be an unbounded, hard-to-estimate time sink.

If you broke code with a refactor, that probably means at least one of the following:

  • You made too big a change in one step
  • You didn't actually understand the code as well as you thought
  • You didn't properly account for side effects of your change

If you think about it, the first of those is obviously best solved by trying again while aiming for a smaller change. The other two are the ones most likely to lead you into unbounded debugging sessions.

Always make your refactoring changes as small as possible and be prepared to back up if something goes wrong. Good version control helps with this.

Refactoring and Tools

Some people will make the argument that an IDE with refactoring support is required to actually refactor code. This ignores the fact that those tools did not exist until programmers had proven that refactoring was helpful, which pretty much required them to work without tools. Many developers applied the recipes from the Refactoring book by hand for years without any tool support.

Don't get me wrong, tools that automate the mechanics of refactoring do simplify the process. Having the boring, mechanical parts of a simple refactor handled by your editor-of-choice definitely makes you more likely to make those changes. But it's possible to refactor without any tools. The most important part of the process is thinking about how you are improving your design, not about which tool makes the changes. Just as importantly, you should be able to apply refactorings that your tool does not support. The thinking behind the change is the important part.

Conclusion

The programming community has been aware of the concept of refactoring for years now, and the practice has become much more widespread. As with any other practice in programming, it is worth going back periodically to reassess what we know and how we practice it. Hopefully, some of these points will help you do just that.

Posted by GWade at 08:10 AM.

November 14, 2014

The Most Important Result of Writing Code

It's not a big surprise that most programmers feel that the most important result of writing code is the code itself. Other people might argue that while the code is important, other things are more important:

  • Results of the execution of the code
  • Money made by selling the program
  • Business value realized by running the code

With a little effort, you (the programmer) might be able to convince yourself that these items are as important as the code itself. After all, if it weren't for the value that the business gets from your code, they would not pay you to write it.

Putting aside the business considerations for this discussion, I would still say that there is a more important result for the developer than the code itself.

What does the code represent?

When we write code, we take our thoughts and our understanding of a problem and put them in a form that can be executed. This isn't quite the same as assembling a machine or knitting a sweater. The code is a relatively direct representation of something that existed only as thoughts in your head right before you began to type.

The code represents your knowledge and understanding of the subject. The code is only a representation of the most important thing you gained from writing it. The important part is the increase in knowledge or understanding that you gained.

What's the difference?

The difference is subtle but important. Once I understand a solution to a problem well enough to code it, I could probably write that solution in different languages or forms fairly easily. I also have the ability to explain the solution better than I did before it was coded, because the act of coding forces me to get the details right and (usually) think through edge cases and boundary conditions.

I realized this was the case years ago when using one language to prototype a quick-and-dirty solution to a problem before doing the real implementation in a different language for production. It turned out that I could get the prototype working in the original language much faster than I could in the production language. This allowed me to really explore the idea in a limited time frame. But, when the prototype was finished, the knowledge went into building the production version. At that point, I could discard the prototype. It had served its purpose.

Why does it matter?

One problem that I've seen repeatedly with programmers is an inability to throw away code they have written. This sometimes leads to years of patches and band-aids on a piece of code that should have been rewritten, or at least aggressively refactored. Maybe the original algorithm was way too slow. Instead of replacing the algorithm, the programmer adds a level of caching to speed things up. Then come the tweaks and micro-optimizations. Eventually, it becomes one of those pieces of legacy code that no one wants to touch.
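
A rough Python sketch of that band-aid pattern (the customer-lookup example is invented for illustration): the slow search is kept and a cache is bolted on top of it, when replacing the lookup with an index built once would remove the need for a cache at all.

    # The band-aid: keep the slow linear scan and bolt a cache onto it.
    # The cache also goes stale if the customer list ever changes.
    _cache = {}

    def find_customer_slow(customers, name):
        if name in _cache:
            return _cache[name]
        for customer in customers:      # full scan on every cache miss
            if customer["name"] == name:
                _cache[name] = customer
                return customer
        return None

    # The rewrite: replace the algorithm with an index built once,
    # so there is nothing left to cache around.
    def build_index(customers):
        return {customer["name"]: customer for customer in customers}

    def find_customer(index, name):
        return index.get(name)

    customers = [{"name": "Ada"}, {"name": "Grace"}]
    index = build_index(customers)
    assert find_customer(index, "Ada") == find_customer_slow(customers, "Ada")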

Everyone is willing to re-write someone else's code. But, we are usually more reluctant to throw out our own.

Another place where this problem rears its head is in prototyping. We all know that when you write a prototype (or do a Scrum Spike) you are supposed to throw it away when you are finished. In the interest of finishing the prototype quickly, you probably added minimal error checking. You may have ignored boundary conditions. It may fail hard in certain circumstances. These are all valid design decisions for a prototype.

Many people, looking at the value they think is in the prototype code, are reluctant to throw that value away. So, instead of re-writing the real version from scratch, they try to clean up and fix the prototype. The result is almost always a fragile, less complete version with some of the original flaws still embedded.

Passing on the value

If you can convince yourself that the real benefit is the knowledge, not the actual code, it's easier to let go of the code itself to re-apply your knowledge in a new form. Maybe it will allow you to refactor the code more mercilessly, knowing that the knowledge will not be lost. Or, maybe you can put the whole thing aside to re-write the code if necessary.

The last effect of this change in viewpoint is to note that you are not finished writing the code until you are certain that it passes on the knowledge to the next person as well. Maybe that means you will need to improve the code structure to match your understanding. Maybe you need to add some judicious intent comments to make certain that the design decisions and their reasons are not lost. Sometimes this may even mean writing documentation to explain the design.

Whatever is needed, you should realize that leaving only the code and not the knowledge means you are shorting the next developer on the most important asset you have created: the understanding.

Posted by GWade at 10:12 PM.

November 12, 2014

Fundamental Design Principle: YAGNI

Every bit of code we write is the result of trade-offs. Some seem like no-brainers: what language to use, which paradigm to use. Others are more subtle: how important is speed/maintainability/memory, fail fast or never fail, fix a bug quickly or rewrite to prevent that class of bugs.

One of the big trade-offs is between simplicity and complexity. Most good programmers at least pay lip service to the idea that we should keep our solutions as simple as possible. Unfortunately, most of us fall prey to the temptation to make our code a little more complex in the name of flexibility or future-proofing. All of these reasons really boil down to a belief that we can predict how the code will change in the future and code to protect against those changes.

There's only one real problem with this idea. Years of empirical evidence show that, individually and as a group, we are pretty lousy at predicting the future.

How do we do this to ourselves?

As a general rule, programmers are smart people (or, at least, believe ourselves to be). We are also somewhat attracted to complexity. (Honestly, would someone who truly abhors complexity learn to program computers?) We also like to solve problems (sometimes problems no one else cares about). This leads to the tendency to guess the future, convince ourselves we are right, and code to prevent problems that only we are bright enough to see. You only have to be sort of right once (or read about someone who was) to get the idea that you can do this repeatably.

All human brains are well suited to finding patterns. Some research has suggested this is one of the most important features of our brains. In my experience, programmers seem to have even more finely honed pattern-matchers behind our eyes than most people. This sometimes causes us to jump to a general solution for a pattern we think we have recognized, even when that pattern may not actually exist. In many cases, this general solution is much more complex than the original solution to the specific problem we started with. We justify the extra complexity based on the problems it will solve in the future.

Simplicity as a Design Principle

We know that a simple solution is easier to understand and maintain. A simple solution is also easier to evolve as necessary to cover new needs in the future. Most good developers try to make a simple solution to begin with, but it gets more complex the longer we work on it. Although some of that complexity comes from a better understanding of the problem, you have to admit some of it comes from us wanting to be clever.

A lot of this cleverness manifests from a desire to predict the future and code for the issues we can foresee. The problem is that predicting the future is quite hard. Any complexity that we add to deal with potential future needs will still need to be maintained between now and the future time where it is needed. That's a cost. If we never reach the predicted future where this complexity helps, then this extra cost is a waste.

Extreme Programming has a really blunt term for this: YAGNI, for "You Aren't Gonna Need It." The idea is to avoid adding functionality until the point in time that you actually need it to fulfill a customer requirement. Until that time, you do not need the feature and, therefore, should be spending your time on actual customer requirements. Any time you start designing for potential future needs, or generalizing so that you can support potential changes later, you really need to remember YAGNI.
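
As a hypothetical illustration in Python (the report-export requirement is invented): the first version is written for futures no one has asked for, and every extra parameter has to be maintained starting today; the second does exactly what the current requirement asks.

    # Speculative version: options and hooks for needs no one has yet.
    def export_report_flexible(rows, fmt="csv", delimiter=",",
                               post_hooks=None):
        if fmt != "csv":
            raise NotImplementedError("only csv is needed today")
        text = "\n".join(delimiter.join(str(field) for field in row)
                         for row in rows)
        for hook in post_hooks or []:
            text = hook(text)
        return text

    # YAGNI version: exactly what today's requirement needs, no more.
    def export_report(rows):
        return "\n".join(",".join(str(field) for field in row)
                         for row in rows)

    print(export_report([["id", "name"], [1, "Ada"]]))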

I tend to use this term ruthlessly. I have spent a large part of my career maintaining code that is too general or too feature-rich for the current need. Sometimes, I have seen people fix bugs or speed up real functionality by removing code intended for a future that never happened. Because of this, I try to remember YAGNI every time I consider making a change for the future.

Posted by GWade at 07:18 AM.