This site will look much better in a browser that supports web standards, but is accessible to any browser or Internet device.

Anomaly ~ G. Wade Johnson Anomaly Home G. Wade Home

January 25, 2005

Conversion to Subversion, Part I

For about a year now, I've been playing with Subversion on small projects. In order to protect my main repository in CVS from my experiments, I just created new projects under Subversion and worked with them there. All of my real projects continued under CVS control. This way if my experiments with Subversion were a disaster, I would only lose revisions from the new work.

Now, I've finally reached the point where I want to move some of my old projects over to Subversion. I could just add all of the projects in their current state, but I do not want to lose the history. Since this turned out not to be quite as easy as I expected, I figured it might be useful to document the process I am going through in case anyone wants to learn from my mistakes.<grin/>

My CVS Repository

To understand the examples, you will need a little background on the CVS repository that I am working from. This repository holds about thirty projects that I have worked on over the last few years. Some of the projects are big, some are small. Some are currently undergoing work, some are effectively dead. Some of these projects date back over ten years, some are relatively new.

The repository lives on a Linux box in the directory /home/cvs. The directory where the actual repository is stored is called Repository. I started keeping my repository under /home when I started keeping my /home on a separate filesystem. This makes backups and upgrades easier. Moreover, some of the items in the repository could be considered private, so putting the repository with the home directories reminds me to treat it with the same care as I treat my home directory.

The Goal

My goal is to move my current projects to Subversion repositories. The move must also meet the following additional goals:

  • All history must be retained.
  • All tags must be retained.
  • Branches may be retained.
  • Directory structure matches recommended practice for Subversion.

Although, I consider tags to be important, I have no work currently going on in any branches and all code from any branches has been merged into the trunk. I would prefer not to lose those branches, but it's not a requirement like the others. Additionally, I am experimenting with multiple Subversion repositories. So I may want to separate some projects into different repositories.

cvs2svn

My first idea was to just use the cvs2svn script that comes with Subversion to convert directly. While examining the program, I found that it has an option to just make a dump file without changing the Subversion repository. This would allow me to do some poking around before actually moving the data to the new repository.

From reading Practical Subversion recently, I was aware that the installation should include a program called svndumpfilter that allows extracting parts of a dump file. This could allow me to move individual projects instead of moving everything at once.

I needed to look at the dump file to determine the paths needed for svndumpfilter to extract my projects. This was when I found my first surprise. The structure of the revision tree in the dump file did not match the structure of repository I wanted to create. As an example, assume that I have a module in the CVS repository named project1. That project has a tag named RELEASE1. Finally, the project has a branch named major_rewrite. The directory structure from the dump file for this configuration would be:

   /trunk/Repository/project1
   /tags/RELEASE1/Repository/project1
   /branches/major_rewrite/Repository/project1

Unfortunately, this does not match the recommendations from any of the articles or books I have read on Subversion. Based on those recommendations, the structure of the Subversion repository should be more like:

   /project1
       /trunk
       /tags/RELEASE1
       /branches/major_rewrite

with the history stored in the /project1/trunk directory. In the time I've been working with Subversion, I have become accustomed to this structure and wanted to continue to use it.

The second surprise came when I examined the tags and branches. Both branches and tags are made in strange way in the dump file. The entire repository is copied for each tag (or branch), then any modules that are not supposed be part of that tag (or branch) are deleted separately. This means that there will be a series of revisions in the repository with tags/branches applied to projects that were never part of those tags/branches. None of this is visible in the final version of the repository, but it seems a bit inelegant.

In summary, this approach would result in all of the history from the CVS repository being copied to a new Subversion repository, but there are a few problems.

  • The new repository structure is not ideal.
  • Extraneous revisions with inaccurate information in tags and branches.
  • All of the projects in one repository.

None of these is a killer problem. I would just like to set up the new repositories in a cleaner way. Come back next time to see how I fix it.

Posted by GWade at January 25, 2005 08:29 PM. Email comments