DVCS

Currently I use Subversion for my private projects. Apart from it being centralized, I’m perfectly happy with Subversion: it supports file renames, automatically converts line endings, keeps track of directories (even empty ones) and attaches metadata to each versioned item. Compared with other centralized version control systems, Subversion is everything I, as someone who rarely needs things like merging, ever dreamed of.

Unfortunately, I often work on my projects while offline or otherwise without a connection to my central repository. So I was hooked when I heard about distributed version control systems on the German podcast Chaosradio Express. Suddenly I knew what I was missing: committing on the road. I should add that I always try to commit finished steps, so I never commit a change to one file if it requires a matching change in a second file. Whatever revision in my repository you look at, it is a version that compiles (unless I made a mistake). On the other hand, I try to commit after every step, so two changes are always two revisions, too. Committing on the road would really help me here.

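A while ago I started a low-priority search for a distributed version control system that meets my requirements:

1. File renames
2. Empty directories
3. Symbolic links
4. Subversion import
5. Source code available
6. Admin clutter kept in a single place
7. Meta data
8. Merge tracking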

The first three are on the list because I use these features. The fourth is a must, since I do not want to lose my history and I want to keep the option to revert. The fifth is a bit philosophical, but I like having the option to compile the source myself instead of waiting for a binary package (and waiting even longer for a PPC build for the Mac or something like that).

Something I liked about Subversion from the beginning was that its management data was hidden in a directory called .svn instead of one with a more visible name like CVS. When I heard the Chaosradio Express episode mentioned above, I loved the idea of having only one place for management data instead of one such directory in every subdirectory.

I like that Subversion holds arbitrary metadata for every item it versions. I do not use it very often, but I like knowing that I could build cool things with it. The only metadata I actually require are “native line endings” and “this file is executable”. The first is a recurring problem with makefiles; the second matters if you work on a FAT drive, which cannot store the executable bit itself, and your scripts should still keep it.
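To make the line-ending requirement concrete: with Subversion’s svn:eol-style property set to native, a text file is kept with consistent line endings in the repository and written out in the platform’s own convention on checkout. The Python sketch below only illustrates that conversion; it is not how Subversion implements it, and the function names normalize_eol and to_native are made up for this example.

```python
import os
import re

# Matches the three common line terminators. CRLF comes first so that a
# Windows line ending is replaced as one unit, not as CR followed by LF.
_EOL = re.compile(r"\r\n|\r|\n")


def normalize_eol(text: str, eol: str = "\n") -> str:
    """Rewrite every line ending in text to eol (LF by default)."""
    return _EOL.sub(eol, text)


def to_native(text: str) -> str:
    """Convert text to the running platform's line ending (os.linesep)."""
    return normalize_eol(text, os.linesep)
```

Storing normalize_eol(text) and handing out to_native(text) is, roughly, what a native line-ending setting does for a makefile that is edited on both Windows and Unix.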

I do not branch often, but sometimes I do. Merging is central to distributed version control, so it has to work well.

There is one more requirement: the DVCS has to be reasonably widespread. I put three systems on my short-list: Git, Mercurial and Bazaar.

There is a long but not outdated comparison here and a short, up-to-date one on Wikipedia, but here is what I found:

Git

Git is probably the most widely used DVCS on my short-list. Let’s see how it performs against my requirements:

File Renames
Git does not know much about file names. It uses them only to know which chunk of data you are talking about. There is a rename command (git mv), but it does not add a rename entry to the version history: it marks the file as deleted and then adds it to the repository under its new name. Git can guess renames afterwards by comparing file contents (for example with git log --follow), but that is a heuristic, not recorded history, so you more or less lose the history of a file each time you rename it. Not good, one negative point for Git here.
Empty Directories
Git does not version directories at all. The path is part of every file, and Git creates directories as needed to place files in them. If you want a directory to be created on checkout, you have to place a file in it. Call it dummy (or, to hide it a bit better, .gitignore). Not knowing about directories is the second negative point for Git.
Symbolic Links
Git does not treat symbolic links specially, which is good: it stores the link target much like the content of an ordinary file (marked with a special file mode) and recreates the link on checkout. In the end it does not break them: one point on the positive side.
Subversion
Git’s at home here: you can import your Subversion repository, export it and even simulate a Git repository on top of a Subversion repository (and the other way round). Git and Subversion are good friends. Another point on the good side.
Source Code
Git is GPL-licensed, no problem here. One more thumbs up.
Admin Clutter
I didn’t think about the .svn directories until I heard that one .git directory is enough. So: clearly an “I like it”.
Meta Data
Different from what I am used to in Subversion, but it covers everything I will ever need. Again a point.
Merge Tracking
Every DVCS does this. Git also provides a nice pseudo-graphical view on the command line (git log --graph). One for you, Git.
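The workaround from Git’s Empty Directories point can be automated. The following Python sketch is a hypothetical helper, not part of Git: it walks a working copy and drops an empty placeholder file into every empty directory so that Git has something to track. The placeholder name .gitignore follows the suggestion above; an empty .gitignore does not ignore anything.

```python
import os
from pathlib import Path


def add_placeholders(root: str, name: str = ".gitignore") -> list[Path]:
    """Create an empty placeholder file in every empty directory below root.

    Git's own .git directory is skipped. Returns the files that were created.
    """
    created: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Decide emptiness before pruning, so a directory that only holds
        # .git (the working copy root) does not count as empty.
        is_empty = not dirnames and not filenames
        # Do not descend into Git's administrative data.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        if is_empty:
            placeholder = Path(dirpath) / name
            placeholder.touch()
            created.append(placeholder)
    return created
```

Only the innermost empty directories get a placeholder; Git recreates their parent directories on checkout anyway, because the placeholder’s path contains them.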

What else is there to say about Git? Git is used to version the Linux kernel, so I would consider it usable without any further testing. On the other hand, its revision numbering scheme does not work the way I would like. I want to include revision numbers in the program output via template expansion. A simple number is usable, a 40-character hexadecimal string is not (think of “V1.3.0a3d0d03293b8c…”). There are ways to get shorter revision IDs in Git, but none is as simple as Subversion’s plain number. One down here for usability.
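One common workaround is to combine a number with an abbreviated hash: git rev-list --count HEAD prints the number of commits reachable from HEAD, and git rev-parse HEAD prints the full hash, which can simply be cut down (git rev-parse --short HEAD does that directly). The Python sketch below feeds both values into a hypothetical format_version helper; the version layout is my own choice. Unlike a Subversion revision, the commit count is only meaningful within one line of history, so the short hash is kept to make the version unambiguous.

```python
import subprocess


def git_output(*args: str) -> str:
    """Run git with the given arguments and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def format_version(base: str, commit_count: int, sha: str, abbrev: int = 7) -> str:
    """Build a short version string, e.g. 'V1.3.0.42 (a3d0d03)'."""
    short = sha[:abbrev]
    # Refuse anything that is not a hexadecimal commit ID.
    if not short or any(c not in "0123456789abcdef" for c in short.lower()):
        raise ValueError(f"not a commit hash: {sha!r}")
    return f"{base}.{commit_count} ({short})"


def current_version(base: str) -> str:
    """Ask git for the commit count and HEAD hash of the current repository."""
    count = int(git_output("rev-list", "--count", "HEAD"))
    sha = git_output("rev-parse", "HEAD")
    return format_version(base, count, sha)
```

Inside a Git working copy, current_version("V1.3.0") yields a string like the one in the docstring, with the count and hash of that repository.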

Mercurial

Another big player in the DVCS market: Mozilla uses it, and so does Python. As I said for Git: usable without any doubt.

File Renames
Very good support. Mercurial stores the old names and tracks both renames and copies. Write down your first point.
Empty Directories
While “Mercurial: The Definitive Guide” says “Empty directories are rarely useful”, I do use them. Mercurial tries to tell me how to do my work, so I tell you: cross off one point.
Symbolic Links
Not supported in old versions, but who cares about dusty old versions? One point here.
Subversion
Tell your Mercurial client to use a Subversion server and clone from there. Import done. You can even commit changes back to the Subversion repository. Everything fine, one point.
Source Code
GPL-licensed, one point.
Admin Clutter
Mercurial knows nothing about directories, so it does not clutter them either: a single .hg directory at the top of the working copy holds everything. One point.
Meta Data
I could not find any evidence that Mercurial supports metadata the way Subversion does. Not a big problem, as line endings and executable bits are supported, but still no point here.
Merge Tracking
I did not see anything fancy here, but it works as you would expect. So: one point.

Mercurial seems to have problems with internationalization: there is no proper support for Unicode characters in file names. While I generally try to avoid such characters in file names, there might be times when I use them. I also think that Unicode support everywhere is a must these days. So: minus one here.

Bazaar

Bazaar appeared quite late on my radar. It is the least used of the three systems I compared, but it has some notable users such as Ubuntu and MySQL.

File Renames
Just like Mercurial, Bazaar knows about file names and handles renames as one would expect: the history is kept, and merging works. First point.
Empty Directories
Directories are first-class citizens at the bazaar: empty or not, you can track them. Good boy.
Symbolic Links
The manual states: “For symbolic links, the value of the symbolic link is tracked, not the content of the thing the symbolic link is pointing to.” Just like Subversion does. One point.
Subversion
There is a plugin (bzr-svn) that allows Bazaar to access Subversion repositories directly. One point.
Source Code
GPL-licensed, one point.
Admin Clutter
Apparently I’m not the only one who thinks that storing all admin data in a single directory is a good idea; it seems everyone does. Next point.
Meta Data
Same as Mercurial, no point.
Merge Tracking
Also the same as Mercurial. One point.

There are two negative things I’ve heard about Bazaar. The first is speed: while Git is known for its speed and Mercurial seems to keep up, Bazaar is said to be somewhat slow. Not slow as in “as slow as Subversion”, but notably slower than other DVCSs. This comparison even shows Bazaar doing better than Mercurial. However, all the comparisons I found were based on Bazaar 1.x, and version 2 seems to be much better in both speed and size. The page “Why switch to Bazaar?” says: “Once upon a time, Bazaar was much slower than Git and Mercurial and less efficient on projects with deep history. Bazaar 2.0 changes things completely.” I have not verified this, however.

The second negative thing is repository size. I was fascinated when I heard that a complete Git repository often ends up smaller than a Subversion working copy. I often read that Bazaar is not good at this. The comparison mentioned in the previous paragraph shows that Bazaar is not as good as Git, but on par with Mercurial.

[Update] I did the first part of this test: checking out the Python source took me 27 min with Bazaar and 10 min with Mercurial. Based on this, Bazaar 1.x would have taken more than 2.5 hours. A great improvement, but still a lot of work to do here. Repository size is not great, but I could live with it: 115 MB for Mercurial, 189 MB for Bazaar.

At first I did not see the benefit of “Bazaar can be adapted to use whatever workflow makes the most sense for your project, regardless of its size.” (see “Why switch to Bazaar?”). After all, I want to switch to a distributed workflow. Then I read about the bind and unbind commands. Something I do not like about DVCSs is that I have to commit locally and then push the commits to the central storage even when I’m connected to the central server. Binding matches my workflow: committing locally when on the road and pushing the changes to the central storage later, but when I’m at home, committing means commit plus push.

So: only a small penalty for speed and space, but one point for workflow.

Summing up

I built my list of requirements based on the features I missed while looking at Git and Mercurial, so it is no big surprise that Bazaar wins with 7/+0.5 points (requirement points / extra points) against Mercurial (5/-1) and Git (4/-1). But is it the winner for me? Will I switch to Bazaar? It’s likely. I do not want to switch version control systems too often (I switched to Subversion only a few years ago), but Bazaar has nearly won me over.
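The totals can be reproduced with a small tally. In this Python sketch each requirement counts +1 (point), -1 (point off) or 0 (no point), and the remarks after each system’s requirement list are kept separately as extras. The individual values are read off the sections above; for Bazaar, Symbolic Links and Subversion count as one point each and the small speed and space penalty as half a point, values inferred from the final totals.

```python
# Requirements in the order used in the per-system sections.
CRITERIA = (
    "File Renames", "Empty Directories", "Symbolic Links", "Subversion",
    "Source Code", "Admin Clutter", "Meta Data", "Merge Tracking",
)

# (requirement scores, extra scores) per system.
SCORES = {
    "Git": ((-1, -1, 1, 1, 1, 1, 1, 1), (-1,)),        # long revision IDs
    "Mercurial": ((1, -1, 1, 1, 1, 1, 0, 1), (-1,)),   # no Unicode file names
    "Bazaar": ((1, 1, 1, 1, 1, 1, 0, 1), (-0.5, 1)),   # speed/space, bind workflow
}


def tally(system: str) -> tuple[float, float]:
    """Return (requirement points, extra points) for one system."""
    criteria, extras = SCORES[system]
    if len(criteria) != len(CRITERIA):
        raise ValueError(f"{system}: expected {len(CRITERIA)} requirement scores")
    return sum(criteria), sum(extras)


# Print the ranking in the post's "points/extra" notation.
for name in sorted(SCORES, key=tally, reverse=True):
    points, extra = tally(name)
    print(f"{name}: {points}/{extra:+g}")
```

Running it prints Bazaar: 7/+0.5, Mercurial: 5/-1 and Git: 4/-1, one per line.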