Git: Better than a lap-dog to a slip of a girl

Basilica cistern in Istanbul

Git is the source code management system created by Linus Torvalds a couple of years ago to manage the Linux kernel source code (yes, git is a silly name; where was Ari Lemmke this time?). Git is now being used, or at least considered, by several open source projects other than the Linux kernel. So I decided to try it myself on some personal projects. Here is a short experience report.

My perception of git a couple of months ago was that, while git provides the heavyweight decentralized development facilities Linus needs to manage the kernel source, it did not appear easy to get started with it for everyday development. The early git documentation contributed to this impression, by focusing on the conceptual model used by git, rather than the things you need to know as you take your first steps. Furthermore, git reuses terminology from other popular SCM systems with somewhat different meanings (the nature of a git repository is quite different to that of a CVS/SVN repository, git checkout does something quite different to cvs checkout, etc.).

But after a month of using git, I've found that it works great for small projects! And it's actually quite easy to get started with it these days.

The documentation problem has mostly been addressed. There is a good tutorial. Once you know your way around the basic commands, you can explore their full functionality by reading the comprehensive man pages. And there is an active wiki containing lots of additional information and links to further documentation. There is room for improvement, of course, but all in all, the state of the documentation seems superior to that of the other SCMs that have sprung up in recent years.

The other thing that has made it much easier to get started is git's availability as an optional package on many Linux distributions. I just did yum install git on Fedora 7, but the equivalent should work on Ubuntu, Debian, etc.

Setting up a git repository for a new project is extremely easy. You just go into the directory where you have your project, run git init to create the empty repository, and then commit the files (git add . ; git commit). That's it. Turning a bunch of files into a git repository could hardly be easier.
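As a minimal sketch of those steps, the script below turns a throwaway directory into a repository. The directory, the file name, and the user name and email are placeholders invented for illustration; the identity is set for this repository only so that the commit succeeds on a machine with no global git configuration.

```shell
#!/bin/sh
set -e

# Hypothetical project: a temporary directory holding a single file.
project=$(mktemp -d)
cd "$project"
echo 'hello' > README.txt

# Create the empty repository in the current directory.
git init -q

# Placeholder identity, stored in this repository's config only.
git config user.name "Example User"
git config user.email "user@example.com"

# Stage everything in the directory, then commit it.
git add .
git commit -q -m "Initial import"

# Show the history: one commit.
git log --oneline
```

In everyday use you would run only git init, git add . and git commit inside your existing project directory.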

The main hump in the git learning curve is understanding that a commit is a two-step process with git. First, you select the changes to be committed (git add). Conceptually, this copies the changed files into the staging area. Then you commit what's in the staging area into the current branch (git commit). This separation between the two steps makes it possible not only to commit a subset of the changed files, but to commit a subset of the changes made to a single file (git add has a mode where you select the desired change hunks from the diff). So the staging area is not simply a list of files from your working directory, but actually contains the file contents to be committed, which may be different from those in your working directory. This may all sound complicated and fiddly, but it provides a lot of flexibility (similar to that achievable by hand-editing diffs if you use diff and patch to manage changes). The simple case, committing a set of files as they stand in your working directory, is handled by the git commit -a command, which combines the add/commit steps for files that git already tracks.
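The following sketch illustrates that the staging area holds file contents rather than file names. It uses a scratch repository; the file name notes.txt and the identity values are made up for the example. The interactive hunk-selection mode is mentioned only in a comment, since it needs a terminal.

```shell
#!/bin/sh
set -e

# Scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Step 1: stage the current contents of the file.
echo 'first line' > notes.txt
git add notes.txt

# Edit the file again AFTER staging; the staging area is not updated.
echo 'second line' >> notes.txt

git diff --cached --stat   # changes staged for the next commit
git diff --stat            # changes in the working directory that are not staged

# Step 2: commit what is in the staging area.
git commit -q -m "Add notes with the first line only"

git show HEAD:notes.txt    # committed contents: the first line only
cat notes.txt              # working copy: both lines

# To pick individual hunks interactively you could run: git add -p notes.txt

# The simple case: stage and commit all changes to tracked files in one go.
git commit -q -a -m "Add the second line"
```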

There is also a graphical tool, git-gui, included with git, which provides an alternative to the command line tools for managing commits. git-gui makes it easy to see what's in your staging area, how it compares with HEAD and your working directory, move changes back and forth (much more easily than with the git add text UI), and perform the commits. Although it's not fancy, I find git-gui extremely convenient, and now I'm using it to manage almost all my commits. Even in simple cases, it's nice to be able to review what you are about to commit as you write the commit message. Git comes with another graphical tool, gitk, for viewing and searching the repository history.

One of the headline features of git is advanced automated merging. So far, I have only made trivial use of branching and merging: Creating branches to hold more adventurous lines of development, and merging them back if they work out well (nothing that CVS can't handle). All of this works in an obvious fashion, and is covered in the tutorial.
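A rough sketch of that trivial branch-and-merge workflow might look like this. The branch name experiment, the file app.txt, and the identity are invented for the example; the name of the starting branch is read from the repository rather than assumed, because the default differs between git installations.

```shell
#!/bin/sh
set -e

# Scratch repository with a placeholder identity and one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo 'stable' > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Remember the branch we started on, whatever it is called.
main_branch=$(git symbolic-ref --short HEAD)

# Create a branch for the adventurous work and commit there.
git checkout -q -b experiment
echo 'adventurous change' >> app.txt
git commit -q -a -m "Try an experimental change"

# It worked out: switch back and merge the branch in.
git checkout -q "$main_branch"
git merge -q experiment

# The branch is fully merged, so it can be deleted safely.
git branch -d experiment
```

If the experiment had not worked out, you would skip the merge and simply leave or delete the branch.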

Another of git's advertised features is performance. Since operations on git repositories are local (except when you are pushing and pulling changes between repositories), it's naturally much faster than a remote centralized SCM. And my projects are very modest in size. But even with that taken into account, I was pleasantly surprised that everything happens with no perceptible delay at all (even with diff and patch on hard-linked source trees, I'm used to a slight delay). So with git it is painless to commit every few minutes; the main source of effort is writing the commit messages. If you are going to work on something slightly experimental for an hour or two, just create a branch for it, and commit into that as you go. The git repositories you use for development are always private (though perhaps linked to a published repository), so you don't have to worry about choosing a particularly descriptive or unique branch name.

Another consequence of git's performance: When you are using an unfamiliar feature that modifies the repository, and you are a little unsure if you fully understand the effect of the commands involved, it's very easy to simply clone the repository and then do a dry run on the clone.
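For instance, a dry run on a clone could be sketched as follows. The directory names original and scratch and the identity are hypothetical, and git reset --hard merely stands in for whichever unfamiliar command you want to try out.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Build a small repository with two commits to experiment on.
git init -q original
cd original
git config user.name "Example User"
git config user.email "user@example.com"
echo 'v1' > file.txt
git add file.txt
git commit -q -m "v1"
echo 'v2' > file.txt
git commit -q -a -m "v2"
cd ..

# Clone it locally and try the risky command on the clone only.
git clone -q original scratch
cd scratch
git reset -q --hard HEAD~1

# The clone has lost its latest commit; the original is untouched.
git log --oneline
git -C ../original log --oneline
```

Once you are happy with what the command did, delete the clone and repeat the command in the real repository.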

So my experience of git has been very positive. If it gains a critical mass of open-source projects using it, and developers familiar with it, it could go a long way. There are two main issues that I'm aware of that could hold up wider acceptance:

(Migrating from another SCM to git, or getting git to coexist with another SCM, seems well covered.)

Comment from Anonymous

If you have not done so yet, you may find it interesting to look at Mercurial.

Like git, it was started after the BitKeeper debacle. Though I do not know either tool in depth, Mercurial apparently has a design somewhat similar to that of git. This is not surprising, since these two (along with Monotone, darcs, Bazaar, etc.) came to serve the same basic model of decentralized development with changesets freely floating between repositories. (To the best of my knowledge, Arch was the first widely used tool of that sort.)

It is too early to summarize my experience with Mercurial, but so far it has been mostly positive. First, its mental model fits my usual workflow naturally: I think of Mercurial as a tool to automate making and applying patches, transferring them between repositories, and keeping track of merges and what has been applied, etc. Second, among the source control tools I know, this one has the most sensible defaults and shortcuts; this smooths the learning curve, and everything just works from the very beginning. It is also quite fast, at least compared with Monotone, which I tried earlier.

To make this post a bit less of an advertisement: so far I have not found a good way to show a branch diff "modulo merges with HEAD".

Eugene

Comment from David

I've looked at the Mercurial docs and wiki, but I haven't played with it yet. I expect that for the kinds of modest projects I was writing about, there are no fundamental differences between git and hg.

One of the reasons I was interested to play with git and then write about it was that it has come a long way in terms of usability and general quality in the last year. So my perception of git, and probably the general perception, was quite far from the reality. I was trying to do a little bit to address this.

(git also has an association with the Linux kernel and Linus Torvalds. This might be unfortunate. Linus and the kernel developer community are well-known for being quite abrasive. But now, the git community is independent, and the mailing list has a reasonably friendly tone.)

In contrast, I expect the reality of hg is much closer to my impression of it from browsing the docs. And that makes it less interesting!

It will be interesting to see how things fall out in the decentralized SCM world in the next couple of years. There is probably only room for one winner (there is a strong network effect for SCMs in the open source world). git has Linux and X.Org, hg has OpenSolaris and Mozilla. The winner might depend more on perceptions and PR than on technical issues.