Programmer Musings: January 2005 Archives

January 29, 2005

Conversion to Subversion: The Project's Trunk

In Conversion to Subversion, Part I, I described the problems I found when I began converting my CVS repository to Subversion. In this article, I describe the work and surprises that came from the first project migration.

My first idea was to build the repository from the dump file and then fix the result using moves inside the repository. Unfortunately, that would have left the previous history in the wrong places in the hierarchy. Although this would not have prevented me from doing further development, looking at previous versions would be messier than I'd like.

So, obviously, I needed a way to make the repository right when projects were added for the first time. The section in Practical Subversion on importing from other systems suggested that the dump file format was relatively easy to modify. Reviewing the relevant sections of Version Control with Subversion confirmed this. If I could make the required changes to the dump file, then I could create a repository laid out the way I wanted it.

The First Project Migration

The first step in this process was to dump a particular project with its related tags and branches. Examining the projects I wanted to move gave me one that had neither tags nor branches. This would be about as simple a case as I could start with. To hide irrelevant details, let's call this project smallproject.

As I said in the previous article, the path for this project would have the form: trunk/Repository/smallproject. To extract this project from the main dump file named cvs2svn-dump, I used the following command:

svndumpfilter include trunk/Repository/smallproject \
    --drop-empty-revs --renumber-revs \
    < cvs2svn-dump > smallproject.dump

The --drop-empty-revs option removes revisions that have no relation to the project I want. The --renumber-revs option renumbers the remaining revisions so that they are contiguous. I found contiguous revision numbers more convenient when examining the file.
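As a quick sanity check on the filtered file, the revision headers can be scanned to confirm the numbering really is contiguous. The following is a minimal sketch, not part of the original migration: the function name check_contiguous_revs is hypothetical, it assumes a grep that accepts -a (treat binary input as text), and it would be fooled by file content that happens to contain a line starting with "Revision-number: ".

```shell
# Print "ok" if the Revision-number headers in a dump file count up by one,
# otherwise report the first gap. The dump file is named in $1.
check_contiguous_revs() {
    grep -a '^Revision-number: ' "$1" |
    awk '
        NR > 1 && $2 != prev + 1 { print "gap after revision " prev; bad = 1; exit }
        { prev = $2 }
        END { if (!bad) print "ok" }
    '
}
```

It would be called as check_contiguous_revs smallproject.dump.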

Since I needed to do a relatively simple fixup to the new dump to change the path, I used a Perl one-liner to make the change:

perl -pe 's!trunk/Repository/smallproject!smallproject/trunk!g;' \
    smallproject.dump > smallproject2.dump

This just uses Perl's substitute operator (with '!' as a delimiter) to change the old path into the new path everywhere in the file. I put the output in a different file so I could compare them and make certain that there were no unexpected differences. After I verified that the paths in the file looked correct, I was ready to go.
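A more conservative variant of the same idea is to touch only the header lines that carry paths, so that file contents mentioning the old path are left alone (changing content would also put it out of step with the content lengths recorded in the dump). The sketch below is an illustration rather than what was actually run: the function name rewrite_dump_paths is hypothetical, it assumes a sed that supports -E, and it assumes the prefixes contain no '#' or regular expression metacharacters. A content line that itself begins with "Node-path: " would still be rewritten.

```shell
# Rewrite a path prefix only on Node-path and Node-copyfrom-path header
# lines, leaving every other line of the dump untouched.
# Usage: rewrite_dump_paths OLD_PREFIX NEW_PREFIX < in.dump > out.dump
rewrite_dump_paths() {
    # The prefix must be followed by "/" or end of line, so that
    # "smallproject" does not also match "smallproject2".
    LC_ALL=C sed -E "s#^(Node-path|Node-copyfrom-path): $1(/|\$)#\1: $2\2#"
}
```

For the project above, this would be rewrite_dump_paths trunk/Repository/smallproject smallproject/trunk < smallproject.dump > smallproject2.dump, and a diff of the two files should then show differences on path header lines only.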

One of the reasons I had picked this project was that I had decided that I wanted it in a different repository than the source code I had from my earlier experimentation. So I created a new repository using the command:

svnadmin create /home/svn/newrepos

where newrepos stands in for the real name of this repository. We'll stick with this pseudonym for now. Then, I loaded the project with the following command:

svnadmin load /home/svn/newrepos < smallproject2.dump

This promptly failed with a message that smallproject/trunk was not found. Of course it wasn't found; I was trying to create it.

After a bit more experimentation, I realized that the load was failing because the path /smallproject did not exist in the repository yet, so load could not create a subdirectory. So I recreated the repository and prepared to begin again.

With a clean repository, I created the beginning of the project with the following command:

svn mkdir file:///home/svn/newrepos/smallproject \
    file:///home/svn/newrepos/smallproject/tags \
    file:///home/svn/newrepos/smallproject/branches \
    -m "Migrate smallproject project."

I left off the creation of the trunk subdirectory; otherwise, the load would still have failed when it attempted to create that directory. Then, I reran the load successfully. I used the svn tools to check out this project from the new repository and verified that everything appeared to be as I expected.
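Since the same skeleton is needed for every project, the list of directories can be generated rather than typed. This is a minimal sketch under stated assumptions: the function name project_skeleton_urls is hypothetical, and it assumes the directories are created directly in the repository by URL with svn mkdir.

```shell
# Print the URLs of the directories that must exist before the load:
# the project directory plus tags and branches, but not trunk, which
# the dump file creates itself.
project_skeleton_urls() {
    repo_url=$1
    project=$2
    printf '%s\n' \
        "$repo_url/$project" \
        "$repo_url/$project/tags" \
        "$repo_url/$project/branches"
}

# Example use (not run here):
#   svn mkdir $(project_skeleton_urls file:///home/svn/newrepos smallproject) \
#       -m "Migrate smallproject project."
```

The unquoted command substitution relies on the URLs containing no spaces.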

The first actual migration worked. To simplify my work for later steps, I converted several of the command lines listed above into shell scripts to make running them a little less error prone. One other piece of insurance I started was to do a dump of any repository right before adding a new project to it. This gave me an easy way to recreate the previous state if/when something went wrong.
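The "dump before load" insurance could be wrapped up roughly as follows. This is only a sketch of the idea, not the actual scripts: the function names backup_name and load_with_backup and the backup file naming scheme are made up for illustration.

```shell
# Build a backup file name such as newrepos-pre-smallproject.dump from a
# repository path and the name of the project about to be added.
backup_name() {
    printf '%s-pre-%s.dump\n' "$(basename "$1")" "$2"
}

# Dump the repository to that file, then load the new project's dump.
# The load is skipped if the backup dump fails.
load_with_backup() {
    repo=$1 project=$2 dump=$3
    svnadmin dump --quiet "$repo" > "$(backup_name "$repo" "$project")" &&
    svnadmin load "$repo" < "$dump"
}
```

A call would look like load_with_backup /home/svn/newrepos smallproject smallproject2.dump. If the load goes wrong, the repository can be recreated and the backup dump loaded into it to get back to the previous state.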

Next time, I'll explain how I dealt with a project with tags.

Update:

Thanks to Lars Mentrup for catching my cvsadmin/svnadmin goof. The text has been corrected.

Posted by GWade at 10:13 PM.

January 25, 2005

Conversion to Subversion, Part I

For about a year now, I've been playing with Subversion on small projects. In order to protect my main repository in CVS from my experiments, I just created new projects under Subversion and worked with them there. All of my real projects continued under CVS control. This way if my experiments with Subversion were a disaster, I would only lose revisions from the new work.

Now, I've finally reached the point where I want to move some of my old projects over to Subversion. I could just add all of the projects in their current state, but I do not want to lose the history. Since this turned out not to be quite as easy as I expected, I figured it might be useful to document the process I am going through in case anyone wants to learn from my mistakes.<grin/>

My CVS Repository

To understand the examples, you will need a little background on the CVS repository that I am working from. This repository holds about thirty projects that I have worked on over the last few years. Some of the projects are big, some are small. Some are currently undergoing work, some are effectively dead. Some of these projects date back over ten years, some are relatively new.

The repository lives on a Linux box in the directory /home/cvs. The directory where the actual repository is stored is called Repository. I started keeping my repository under /home when I started keeping my /home on a separate filesystem. This makes backups and upgrades easier. Moreover, some of the items in the repository could be considered private, so putting the repository with the home directories reminds me to treat it with the same care as I treat my home directory.

The Goal

My goal is to move my current projects to Subversion repositories. The move must also meet the following additional goals:

All history must be retained.
All tags must be retained.
Branches may be retained.
Directory structure matches recommended practice for Subversion.

Although I consider tags to be important, I have no work currently going on in any branches, and all code from any branches has been merged into the trunk. I would prefer not to lose those branches, but it's not a requirement like the others. Additionally, I am experimenting with multiple Subversion repositories, so I may want to separate some projects into different repositories.

cvs2svn

My first idea was to just use the cvs2svn script that comes with Subversion to convert directly. While examining the program, I found that it has an option to just make a dump file without changing the Subversion repository. This would allow me to do some poking around before actually moving the data to the new repository.

From reading Practical Subversion recently, I was aware that the installation should include a program called svndumpfilter that allows extracting parts of a dump file. This could allow me to move individual projects instead of moving everything at once.

I needed to look at the dump file to determine the paths needed for svndumpfilter to extract my projects. This was when I found my first surprise. The structure of the revision tree in the dump file did not match the structure of the repository I wanted to create. As an example, assume that I have a module in the CVS repository named project1. That project has a tag named RELEASE1. Finally, the project has a branch named major_rewrite. The directory structure from the dump file for this configuration would be:

   /trunk/Repository/project1
   /tags/RELEASE1/Repository/project1
   /branches/major_rewrite/Repository/project1

Unfortunately, this does not match the recommendations from any of the articles or books I have read on Subversion. Based on those recommendations, the structure of the Subversion repository should be more like:

   /project1
       /trunk
       /tags/RELEASE1
       /branches/major_rewrite

with the history stored in the /project1/trunk directory. In the time I've been working with Subversion, I have become accustomed to this structure and wanted to continue to use it.
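The relationship between the two layouts is mechanical, which the following sketch tries to make concrete. It only illustrates the mapping between paths and is not the conversion procedure itself: the function name to_project_layout is hypothetical, it assumes a sed that supports -E, and it relies on the CVS repository directory being named Repository as described above. The leading slash is dropped in the output because paths inside a dump file are written without one.

```shell
# Map paths from the cvs2svn dump layout to the per-project layout:
#   trunk/Repository/project1                  -> project1/trunk
#   tags/RELEASE1/Repository/project1          -> project1/tags/RELEASE1
#   branches/major_rewrite/Repository/project1 -> project1/branches/major_rewrite
# Anything below the project directory is carried along unchanged.
# Reads one path per line on standard input.
to_project_layout() {
    sed -E \
        -e 's#^/?trunk/Repository/([^/]+)#\1/trunk#' \
        -e 's#^/?(tags|branches)/([^/]+)/Repository/([^/]+)#\3/\1/\2#'
}
```

For example, echo /tags/RELEASE1/Repository/project1 | to_project_layout prints project1/tags/RELEASE1.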

The second surprise came when I examined the tags and branches. Both branches and tags are made in a strange way in the dump file. The entire repository is copied for each tag (or branch), then any modules that are not supposed to be part of that tag (or branch) are deleted separately. This means that there will be a series of revisions in the repository with tags/branches applied to projects that were never part of those tags/branches. None of this is visible in the final version of the repository, but it seems a bit inelegant.

In summary, this approach would result in all of the history from the CVS repository being copied to a new Subversion repository, but there are a few problems.

The new repository structure is not ideal.
Extraneous revisions with inaccurate information in tags and branches.
All of the projects in one repository.

None of these is a killer problem. I would just like to set up the new repositories in a cleaner way. Come back next time to see how I fix it.

Posted by GWade at 08:29 PM.

January 21, 2005

Review of Practical Subversion

Practical Subversion
Garrett Rooney
Apress, 2005

I have worked with several version control systems over the years. But my system of choice for the last decade has been CVS. For the last year, I've been looking at Subversion and I like a lot of what I've seen. I've read the book Version Control with Subversion, which does a good job of covering the program, but the information wasn't quite complete. This book answers most of my outstanding questions.

The writing style is quite readable and the examples and explanations are well done and helpful.

However, the book does suffer from a kind of split personality. On the one hand, it wants to be a good handbook: something that you can use to get up and running with Subversion quickly. On the other hand, it wants to be a reference book, where you can go for all the details of setting up and using Subversion under any circumstances. It is possible to do both in one book. If a book covers the basics up front and saves the reference for later, the reader can tell immediately which section he needs.

Unfortunately, Rooney did not take that approach.

The first two chapters are great handbook material. They help familiarize a novice user of Subversion with the tool. These chapters introduce concepts and commands in a very practical fashion. The trouble begins in chapter 3, where the author suddenly changes from the handbook style into a definitive reference of the administration details of Subversion. This reference-style continues for three chapters, before suddenly changing back to practical information with the Best Practices chapter. Then, in the middle of the next chapter, we go back to reference style.

While both the handbook and the reference are necessary, I think it would have been easier on the reader if the two styles had been separated into two separate sections of the book. Changing back and forth makes the book harder to read than necessary.

That being said, I still think this is a very good book that complements the earlier work on Subversion quite well. Reference material on the different server types, programming to the API, and the conversion programs was definitely lacking from the earlier book. Practical Subversion does a great job of filling that gap.

The real highlight of the book for me was the chapter on best practices. Over the years, I had to discover most of these the hard way. It is great to see them written in such a clear and usable fashion. Time and time again, I have seen programmers misuse version control systems because they lacked the wisdom displayed in this one chapter. This chapter should be required reading for every programmer.

Despite the criticisms above, I still found this to be a very good book. I recommend it to anyone who is using Subversion or who plans to use Subversion in the future.

Posted by GWade at 11:57 PM.

January 02, 2005

Kinds of Problems

Of all of the lessons I have learned doing software development, one of the most important was to recognize what kind of problem I'm trying to solve. This sounds pretty trivial, but I'm not talking about the categorization you are probably thinking of. As software professionals, we tend to look at all problems as solvable. We can partition the problems we observe into multiple logical categories. This problem seems to be mostly a database problem. That problem is going to be mostly user interface. This other problem is going to be mostly about performance. Some problems cross multiple categories.

The hardest problems we have to solve as software professionals don't fall into these categories. Over time, I have begun partitioning problems into one of two categories before doing anything else. They are

technical problems
business problems

These categories are not hard and fast. Most of the problems we work on cover both kinds of issues. But, it is important to realize that some problems fall more into one category than into the other. Unfortunately, by our natures we are drawn to technical problems. Sometimes that pull is so strong that we forget about the other kind of problem. We assume that every problem can be solved given the right technical approach. Unfortunately, that isn't the case.

Several times in my career, I have supplied a good technical solution to a problem only to be shot down by someone in a non-technical role. In many cases, I was forced to implement a significantly inferior solution and then watch it fail in most (if not all) of the ways I had predicted. In most cases, I had to apply nasty hacks and band-aids to keep this bad solution running despite all of my warnings about how much this solution would cost in the long run. I have also seen many other programmers suffer a similar fate.

Eventually, a couple of really good managers got me to understand that sometimes the problem is not technical; it's a business problem. No amount of technological know-how can solve a business problem. Sometimes, the technically worse solution is more correct for the purposes of the business. It took years of fighting these kinds of issues before I finally came to understand this idea. Sometimes, the best you can do is to let the powers that be know what the ramifications of a technical decision are and then accept that they may overrule you for reasons that may never make any sense (to you).

Some of the best managers that have put me in this position went the extra step to let me know that they understood the implications of what I was saying. Sometimes this made things easier because I might be allowed to fix things later. Sometimes the issues were purely political and no amount of work or insight could solve the problem.

Eventually, I came to understand that no amount of tech can really solve a business problem. Throwing myself into the work to prove my solution is best, or working long hours to bypass the issues, is counter-productive. Now, when I recognize one of these kinds of business problems, I supply all of the information that I can to the appropriate individual. Then I do what I'm told and try to do the best I can within the constraints placed on me. Most importantly, I try not to be bothered by the illogical solutions I'm required to implement. For all I know, the whole project will be scrapped for political reasons and I'll never have to maintain the disaster I've been ordered to create.

It's not much consolation, but it's better than making myself sick over it.

Posted by GWade at 10:45 PM.