
Anomaly ~ G. Wade Johnson

June 27, 2009

All the Cool Kids are Going to git, But...

There has been a lot of talk lately about projects moving to git. So far, the write-ups from people converting to git have all been glowing endorsements of the one true way. There's almost a religious fervor surrounding the subject.

Since my experience has not been quite that good, I thought it was worth documenting what I have seen. Based on some of the responses I've seen to negative comments, I expect to be blasted if anyone actually reads this. But if anyone else runs into these kinds of problems, you'll at least know that one person has seen the same thing.

What I Want in Version Control

Based on the last couple of decades of software development, I have fairly simple needs from a version control system.

  1. I want to be able to store my source in a VCS.
  2. I want to be sure it can be recovered at any time.
  3. I want to be able to compare different versions of a particular file or multiple files.
  4. I want to be able to mark certain revisions so that I can get back to them.
  5. I want to be able to reorganize the files without major pain.
  6. I want to be able to work on my code from different systems.

I don't have a particular agenda or approach I care about. I just want to be able to work with my code and have the VCS help me. At present, I don't have a major need for distributed version control, but it might be nice. For me, a VCS is a tool, not a religion.

As time has gone on, I have used several VCS or SCM systems. These include: RCS, CVS, Subversion, and ClearCase. (I also had to help support some people using SourceSafe long ago.) I've done branching and merging in all of those (except RCS), so I'm fairly conversant with the general issues.

git: First Impressions

A couple of years ago, I tried to use git and was badly frustrated: I could not get data committed. I wasn't able to follow my normal workflow; the tool forced me to completely change the way I was working, so I immediately dropped it. I had a similar experience with another distributed VCS tool (I don't remember which one), and so I discounted the whole mess as a bad idea.

Later, a fellow Perl Monger started talking a lot at the local meetings about how git was working well for him, and even gave a talk on the subject. It seemed like git might be worth trying again. With the information from that talk and better online resources that had become available in the intervening time, I was able to use git for a few minor projects I was working on.

It turns out that my original problem had to do with the index (staging area) feature. This had apparently been a problem for many people, and the newer tutorials made a point of explaining this feature better. I eventually got used to the extra step of re-adding files I had changed (or using the -a switch to git commit).

I was becoming somewhat comfortable with the tool.

The First Disaster

Because I was comfortable with my Subversion repositories and I had been told about the git svn tool, I was using Subversion as my remote repository. This allowed me to have it backed up with all of my other repositories and fit my comfort zone better than having the whole repository in the local directory. (I know a lot of people swear by that feature. But I remember disasters in the old RCS days when the repository was also stored with your sandbox. It was too easy to lose both your current work and all of the history with one mistake.)

Things seemed to be going along okay until one day when I decided to push some changes from my laptop to the Subversion repository and pull them to my working directory on my desktop.

At the time, I think my desktop was up to date with the master branch. I had been doing some history rewriting to clean up a few commits on a (local) branch. I merged a branch to master on my laptop and pushed the changes to the Subversion repository. A day or two later, I pulled from the Subversion repository to my desktop machine. (The details are a little hazy, since I expect the version control system to keep track of what I've saved and when.) When I did, there were merge conflicts like crazy. Almost every file was conflicted somewhere. I tried to resolve the conflicts by hand and could not get everything back into a stable state.

This was quite surprising, because I've merged multi-month long branches in CVS (with much pain and suffering) as well as resolved merges in Subversion without much problem. Given the hype about how easy merges are with git, I was not expecting this.

Eventually, I came to the conclusion that the best thing to do was to blow away my working directory and start over with git svn in a new working directory. This was not a good feeling. Although I'm pretty sure I lost nothing from my desktop working directory, this was not the kind of behavior I expected from a VCS.

With some research, I eventually convinced myself that I must have messed up somewhere in the history rewriting and that this was the cause of the problem. Maybe history rewriting and Subversion just weren't compatible.

Twice is Coincidence

A few months later, I was working on another project. I was still using Subversion as the remote repository for working with git. Honestly, despite all of the assurances that everything that goes into git comes back out again, I was still more comfortable with Subversion for safety.

Once again I was working on the code from two different machines. I had just finished some relatively hairy work and pushed to the remote (Subversion) repository. A day or two later, I pulled on the other machine and BAM, I'm in conflict hell again. I tried to resolve the issues without a whole lot of success. The conflicts did not seem to match up with what I could see on either machine. This time I'm sure I hadn't done any history rewriting. (I wasn't using that feature after the previous disaster.)

After fighting with the mess (and pulling two or three more times), I eventually gave up and blew away my working directory and rebuilt it.

I checked for similar stories on-line and talked with my local expert to no avail.

The biggest problem I had with it was that the actions that blew up that day were identical to things I had been doing all along. I couldn't track down exactly what I was doing wrong or even narrow it down to a sequence of steps that caused the problem. As a developer myself, I know that a bug report saying the tool randomly blew up after doing something that worked the last dozen times in a row was not going to be taken too seriously.

At this point, it's worth reminding you that I've been using version control tools for almost 20 years. I've recovered from disasters in almost every one that I've used. I have never been left in this situation before.

Glutton for Punishment?

Despite a few bad experiences so far, most of my usage of git has done what I needed. I was still not completely comfortable with the new workflow. But the ability to add aliases and script new commands is quite addictive. I knew that my experiences had to be odd, otherwise people would be reporting them and dropping git like a hot rock.

I had a new project that I wanted to work on, so I decided to do things a little differently. This time, I worked entirely in git, with no Subversion repository. I made a bare repository alongside my Subversion (and CVS) repositories to give me a single spot to back up, and began working on the new project.

Things went along fine for a month or so, until I needed to get ready for a conference. I had been working mostly on my desktop, but would need my laptop updated before I could go to the conference. I merged a couple of feature branches back to master and made certain everything was working fine. I pushed the master branch to the remote (git) repository. I went immediately to the laptop and did a pull. BAM, my working directory was suddenly a smoking crater with conflict shrapnel everywhere.

Unfortunately, this was a Catalyst project, and I ran into a new kind of conflict hell. The Catalyst system uses a Perl ORM that creates class descriptions for data stored in a database. The main description of each class is protected by an MD5 sum to show it hasn't changed. Merging files that contain this sum is guaranteed to produce conflicts. Worse, I couldn't get the code to match up with either MD5. By this point, I had learned about the git reset command, but I still wasn't able to completely recover.

Still Using git?

In any normal circumstance, I would probably have thrown away any tool that caused me this much grief. I have not yet had the religious conversion that many seem to have where git is concerned.

The only saving grace is that, despite the disasters, I've always been able to recover my code (if not my working directory). On the other hand, I still haven't been able to nail down the problem. My latest attempt to prevent it is to stop using the git that comes with Ubuntu and update to the latest release.

I'm not convinced that git is as wonderful as everyone says, but it does have features I like.

Unlike most of the people writing about git, I'm not a true believer. It's got some advantages over the systems I've used before. But, despite their flaws, I've never had either Subversion or CVS leave me with a smoldering crater where my working directory was.

I'm sure that someone (if anyone actually reads this) will tell me that I'm doing something horribly wrong, or that I just don't understand the beautiful elegance of Linus's vision. Frankly, I don't care. Elegance of design doesn't matter if the implementation blows up. As Richard Feynman once said,

It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong.

I use version control to support my code. If it doesn't, I will switch to a new system. I haven't given up on git yet (it hasn't yet lost any code I've committed), but one more explosion may be the last.

Posted by GWade at 01:13 PM.

May 06, 2009

More SVG and Perl

In my last post, I talked about a quick little project that has grown into experimenting with several new tools/processes.

I finally got a reasonable release out on CPAN as SVG::Sparkline version 0.30. This version supports 6 different sparkline types: Area, Bar, Line, RangeArea, RangeBar, and Whisker. It also has better documentation in the form of a manual and a cookbook.

Proving once again that you never know who's watching, Jeff Schiller (of SVG fame) commented on my last post that I needed a demo. <grumble/> I had a little demo application that I was using to print a primitive gallery of sparklines for my own debugging purposes. I've cleaned up the output and put the Sparkline Gallery on-line.

I also found that the module did not have as high a Kwalitee score as I expected, so I did some cleanup to improve it. (If you haven't run across kwalitee before, you might think it's a misspelling. Actually, it's a part-joke, part-serious quality measure for Perl modules. Check out the article for details.)

I would have had this version finished sooner, if another SVG project had not distracted me. David Dailey mentioned the idea of a Friendly Little Intermittent Clockfest. The last few times this subject came up, I was not tempted. For some reason, this time it bit me. Sometime soon, I'll probably be adding an SVG clock gallery to my site as well.

Every now and then the programming muse shows up and the ideas start flowing.<grin/>

Update: Thanks to a minor failure on my part, the 0.30 release was missing the Manual. I've released 0.31 to fix that.

Posted by GWade at 08:30 PM.

April 25, 2009

SVG Sparklines in Perl

For the last few weeks, I've been working on a Perl module for creating Sparklines* in SVG. The original purpose of the project was a demonstration of working with SVG from Perl. It was intended to be an example in a talk I was going to give for our local Perl Mongers group. As usual, it has grown to be much more than that. (I still hope to get back to that talk.<shrug/>)

In the process, I've been exploring a few other ideas that were not part of the original plan. I planned to develop the module as I usually do and then release it to CPAN. I released an early version (something I normally do not do) and got feedback from an interested developer within 24 hours. This was on a barely functional module.

Based on that email exchange and conversations with some local developers, this is turning into a much more robust and flexible module than originally intended. The interface has improved dramatically. There will be more types than originally planned. There are more ways to configure the look of the generated Sparklines. There is now a cookbook explaining how to generate different effects.

I've also put the module on Github as svg-sparkline. Although I've been playing with git off and on for a few months now, this is the first time I've tried to use it seriously.

Unlike many of my past projects, this one requires more visual design thought than I normally need. In a way, this quick, little project has turned into a way to experiment with several new ideas and skills at once. We'll have to see how this one turns out.

Note:
* A Sparkline is an intense, word-sized graphic intended to convey useful information inline within text. The concept was proposed by Edward Tufte in his book Beautiful Evidence. For more on sparklines, see Tufte's article on the subject.
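To make the idea concrete, here is a minimal sketch of a line sparkline. It is written in Python rather than Perl and is not the SVG::Sparkline API; the function name `line_sparkline` and its `width`/`height` parameters are hypothetical. It simply scales a series of values into a small box and draws them as a single SVG polyline.

```python
def line_sparkline(values, width=100, height=20):
    """Hypothetical helper: return an SVG document drawing `values` as one polyline."""
    if not values:
        raise ValueError("need at least one value")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero for a flat series
    step = width / max(len(values) - 1, 1)  # horizontal distance between points
    points = []
    for i, v in enumerate(values):
        x = i * step
        # SVG's y axis grows downward, so invert: the largest value is at the top.
        y = height - (v - lo) / span * height
        points.append(f"{x:g},{y:g}")
    coords = " ".join(points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{coords}"/>'
        "</svg>"
    )


print(line_sparkline([1, 3, 2]))
```

The real module supports several chart types (area, bar, whisker, and so on); this sketch only covers the simplest line case.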

Posted by GWade at 07:48 PM.

March 14, 2009

Chronistic Coupling, Communications

The comments from Ian and rlb3 have made me think a bit more on what I said last time about Chronistic Coupling. One thing I didn't make perfectly clear is that I'm not advocating avoiding Chronistic Coupling at all costs.

Any real system will require some amount of Chronistic coupling. The key design point is to decide how much. Choosing the wrong level of coupling will certainly impact how your system evolves in the future. Over the next few posts, I'm going to explore some of these levels of Chronistic coupling with some examples.

Communications Protocols

Once upon a time, people doing communication between two processes (or computers) regularly debated how the data should be transferred: ASCII or binary. (This was pre-Unicode.) The advocates of the binary approach argued that it was more efficient for two reasons:

  • Fewer bytes sent over the network
  • No time spent converting to a network format and back

When we transferred data at 1200 or 2400 bps, these arguments were pretty convincing, especially when communicating between processes on the same machine.
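A quick way to see the size argument is to encode the same few readings both ways. This Python sketch uses the standard `struct` module for the binary form; the readings and the space-separated, CRLF-terminated text layout are made-up assumptions for illustration.

```python
import struct

readings = [1, 250, 31000, 65535]

# Binary: each value as an unsigned 16-bit big-endian integer (2 bytes each).
binary = struct.pack(">4H", *readings)

# Text: the same values as ASCII digits separated by spaces, ending in CRLF.
text = (" ".join(str(r) for r in readings) + "\r\n").encode("ascii")

print(len(binary), len(text))  # 8 19
```

More than twice the bytes for the text version mattered a great deal on a slow modem link.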

However, there were problems when communicating between machines that were not the same architecture. When crossing the architecture boundary, you had to do conversions anyway. Some places where the binary format might change include:

  • Byte order
  • Size of primitive data types
  • Format of floating point data storage
  • Padding in larger binary structures (structs, etc.)
  • Encoding of strings (nul-terminated, length-prefixed, etc.)
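Two of the items above, byte order and padding, are easy to demonstrate with Python's `struct` module. This is a small illustrative sketch; the value `0x12345678` and the `"bi"` (char plus int) layout are arbitrary choices, not taken from any particular protocol.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)     # big-endian ("network byte order")
little = struct.pack("<I", value)  # little-endian, as on x86

print(big.hex(), little.hex())  # 12345678 78563412

# A receiver that assumes the wrong byte order silently gets a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))  # 0x78563412

# Padding: "=" packs fields with no alignment, while "@" (native) may insert
# padding between fields, so the "same" struct can differ in size.
print(struct.calcsize("=bi"))  # 5
print(struct.calcsize("@bi"))  # platform-dependent (alignment rules vary)
```

Note that nothing fails when the bytes are misread; the receiver simply gets a wrong number.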

Soon, a sizable amount of effort could be applied to converting binary data from other machines to the native format. The worst part about this was the lack of information in the data stream to help troubleshoot problems. Normally, you found out that your decoding logic was wrong when some portion of the binary data stream gave ridiculous results, or when you got to the end of the stream and found you had too little or too much data.

Meanwhile, text-based protocols sent more data over the wire (which became less of a problem as networks became faster). But where a text-based protocol really shines is in debugging the data stream. If the next number in the stream is 1000000 and you expected a 16-bit short int, it's easy to see there's a problem. In a binary stream, the first two bytes of a long int look just like an actual short int, so there's no way to tell (at the protocol level) that something is wrong.
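The 1000000 example can be reproduced directly. In this Python sketch, the binary receiver reads two bytes of a 32-bit value as a short and gets a harmless-looking number, while a text receiver can see the whole value and check its range. `parse_short_text` is a hypothetical helper, and the range check is something the receiver chooses to add, not something the text format does for free.

```python
import struct

# The sender writes 1000000 as a 32-bit big-endian int: bytes 00 0F 42 40.
sent = struct.pack(">i", 1000000)

# A receiver that wrongly expects a 16-bit short reads only the first two bytes.
wrong = struct.unpack(">h", sent[:2])[0]
print(wrong)  # 15, a plausible short with no hint that anything is off


def parse_short_text(field: bytes) -> int:
    """Parse a decimal text field that is supposed to hold a 16-bit short."""
    n = int(field.decode("ascii"))
    if not -32768 <= n <= 32767:
        raise ValueError(f"{n} does not fit in a 16-bit short")
    return n


try:
    parse_short_text(b"1000000")
except ValueError as err:
    print(err)  # 1000000 does not fit in a 16-bit short
```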

There were still problems. There was the EBCDIC vs. ASCII issue, which has mostly gone away. There is also the line ending problem (LF vs. CRLF vs. CR).
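One common way for a receiver to cope with the line ending problem is to be tolerant in what it accepts. As an illustrative Python sketch (the message text is hypothetical), `str.splitlines()` treats LF, CRLF, and CR all as line breaks, so all three conventions decode to the same lines. It also splits on a few other control characters, which a strict protocol parser might not want.

```python
# The same two-line message framed with three line-ending conventions.
unix = "HELO example\nQUIT\n"      # LF
dos = "HELO example\r\nQUIT\r\n"   # CRLF
old_mac = "HELO example\rQUIT\r"   # CR

for payload in (unix, dos, old_mac):
    # splitlines() treats LF, CRLF, and CR as line boundaries alike.
    print(payload.splitlines())  # ['HELO example', 'QUIT']
```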

The biggest win for the text-based protocols was the success of TCP/IP protocols on the network. A large number of the protocols that run the Internet are essentially text. For example, HTTP, SMTP, FTP, Telnet, and more are basically a series of text strings sent between the client and server.
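To show how little machinery a text protocol needs, here is a Python sketch that builds a minimal HTTP/1.1 GET request as plain text. The function name `build_get_request` and the `example.com` host are placeholders; a real client would also send these bytes over a socket and parse the text response that comes back.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as CRLF-separated text lines."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, target, version
        f"Host: {host}",         # the Host header is required in HTTP/1.1
        "Connection: close",     # ask the server to close when done
        "",                      # empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("example.com")
print(request.decode("ascii"))
```

Because the request is just text, it can be typed by hand into a telnet session for debugging, which is exactly the advantage described above.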

The major solutions to the size issue are relatively straightforward. First, the networks got faster, so the problem is less of an issue. In places where bandwidth is still a problem, we can compress the text stream (gzip) to reduce the number of bytes. Since compression is something everyone can use, it has been greatly optimized over the years, giving more benefit to everyone.
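As a rough illustration of the compression point, this Python sketch gzips a hypothetical, highly repetitive text stream using the standard `gzip` module and checks that it round-trips exactly. Real protocol traffic is usually less repetitive, so the savings shown here are on the optimistic side.

```python
import gzip

# Hypothetical, highly repetitive text stream (14 bytes per line, 1000 lines).
text = "".join(f"reading {i % 10} ok\r\n" for i in range(1000)).encode("ascii")

compressed = gzip.compress(text)
restored = gzip.decompress(compressed)

print(len(text), len(compressed))  # exact compressed size depends on the zlib build
assert restored == text  # gzip is lossless: the receiver gets the exact text back
```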

The Present

As a result of the (possibly compressed) text-based protocols used on the net today, machines with very different architectures can communicate easily. Text protocols have a lower chronistic coupling than binary protocols. An email client written to work on 16-bit Windows 3.1 could send messages to a client on a 32-bit Windows XP system. A web page served from a 64-bit Linux box can be viewed comfortably on Mac OS X, Windows Vista, or a mobile phone. More importantly, these clients don't need to know if the web page was generated from a C++ program, Ruby, Java, Lisp, or even Forth. It just doesn't matter.

Our video and audio formats are still binary because of the large amount of data being transferred. We still have chronistic coupling issues there. If you don't have the right codec for the file, you are basically out of luck. Many of these codecs are tied directly to the architecture where they were written.

In this case, the trade-off for reduced size is still more important than the ease of porting to multiple architectures.

Posted by GWade at 11:29 PM.