Submodules and Subrepos Done Right

January 24, 2010


An approach to managing Git or Mercurial sub-repos easily, safely, and simply, while allowing you to embed Git projects in a Mercurial repo and vice-versa.

Background

Most software projects rely on other software projects to function. For example, Nitrogen depends on SimpleBridge, Coverize, and Mochiweb or Yaws. Riak depends on Webmachine and Mochiweb.

In the name of simplicity and ease of use, it’s generally a good idea for the parent repo to contain the source code of any sub-projects it uses.

But then you are faced with a decision:

Should the parent project include the full history of the sub-project as well?

You currently have three options, all with tradeoffs:

  1. Remove revision history in your sub-projects by deleting the .git/.hg directory, and experience pain when you want to pull the latest updates or commit a patch on the sub-project.

  2. Track the entire .git/.hg directory for each sub-project, and accept that the .git/.hg directory of your parent project will now be huge.

  3. Use Git submodules (or Mercurial subrepos), and hope that you never have to include a Git project inside of a Mercurial project, or vice versa. Also, accept the unfortunate fact that your build process now requires a working Internet connection.

The Search for a Better Way

What do I really want out of sub-repo support?

Furthermore:

Introducing subgit and subhg

After much thought and frustration, I think I’ve finally found a solution that meets all of my needs. It lets me work with the parent repo as I normally would, using the git or hg command. Furthermore, it gives me a different command to work with the sub-repos. Finally, it is cross platform, allowing me to mix and match Git and Mercurial projects. The only downside is that a contributor needs to jump through an extra hoop or two in order to get the history of a sub-repo.

In practice, the solution looks like this:

Change to the directory of a sub-project…

> cd ParentProject/SubProject1

Operate on the parent project…

> git status
...status info...

> git commit
...commit code...

Operate on the sub-project. Notice the use of the ‘subgit’ command…

> subgit status
...sub-project status info...

> subgit commit
...commit sub-project code...

Best of all, the subgit and subhg commands are just thin wrappers around git and hg. Each is about 25 lines of shell script.

Installation

To try this out on your computer:

  1. Save the following scripts to a location in your PATH: subgit and subhg. (Remember to run chmod 755 to make them executable.)

  2. Create a global excludes file for both Git and Mercurial, and add .subgit and .subhg to it.


    *~/.gitconfig*
        [core]
        excludesfile = "~/.gitignore"

    *~/.gitignore*
        .subgit
        .subhg

    *~/.hgrc*
        [ui]
        ignore=~/.hgignore

    *~/.hgignore*
        .subgit
        .subhg
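The ignore entries above can also be appended from the command line. This is only a sketch: `add_line_once` and `ignore_subrepo_dirs` are hypothetical helper names, not part of subgit or subhg, and it assumes a POSIX shell with grep available.

```shell
# Append line $1 to file $2 unless an identical line is already there.
# The file is created if it does not exist yet.
add_line_once() {
    grep -qxF "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Add both sub-repo directory names to the ignore file given as $1.
ignore_subrepo_dirs() {
    add_line_once ".subgit" "$1"
    add_line_once ".subhg" "$1"
}
```

Running `ignore_subrepo_dirs ~/.gitignore` and `ignore_subrepo_dirs ~/.hgignore` should fill in both ignore files without duplicating entries. The `[core]` and `[ui]` settings shown above still need to be in place so that Git and Mercurial read those files.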

That’s all!

How Does It Work?

The core concept behind this approach is to store the version history of your sub-project in a non-standard directory, and then use special wrapper scripts when you want Git or Mercurial to operate against that directory.

In other words, your projects won’t have a .git or .hg directory. Instead, they will have a .subgit or .subhg directory, which is not tracked by the parent repo.

|--ParentProject    <-- a Git repository
   |
   |--.git
   |
   |--SubProject1   <-- a Git sub-repo
   |  |--.subgit
   |
   |--SubProject2
   |  |--.subhg     <-- a Mercurial sub-repo
   |
   |--src

This tricks Git or Mercurial into tracking the files inside of your sub-repo, even though the files actually belong to a different repository. (Normally Git won’t track a Git repo nested inside of another Git repo.)

The wrapper scripts, subgit and subhg, do the heavy lifting to make Git or Mercurial use the .subgit and .subhg directories.

subgit simply searches upward for the closest parent directory that contains a .subgit directory. Once found, it calls git, telling git to use the .subgit directory for repository information. subhg works the same way.
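The upward search might look roughly like the sketch below. It is not the actual subgit script: the function names are illustrative, and it assumes the wrapper hands the directory to git through the standard `--git-dir` and `--work-tree` options.

```shell
#!/bin/sh
# Illustrative sketch of a subgit-style wrapper; the real script may differ.

# Walk upward from the directory in $1 (default: current directory) and
# print the first directory that contains a .subgit directory.
find_subgit_root() {
    dir=$(cd "${1:-.}" && pwd) || return 1
    while :; do
        if [ -d "$dir/.subgit" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Give up once the filesystem root has been checked.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

# Run git against the nearest .subgit directory instead of .git.
subgit() {
    root=$(find_subgit_root) || {
        echo "subgit: no .subgit directory found" >&2
        return 1
    }
    git --git-dir="$root/.subgit" --work-tree="$root" "$@"
}
```

With these functions loaded, `subgit status` run from anywhere inside SubProject1 would report on the sub-project, while plain `git status` still reports on the parent.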

Usage

To create a sub-repo that can be managed with subgit or subhg:

  1. Inside of an existing parent repo, clone a repository like normal:

    git clone git://hostname.com/repository.git sub_project

  2. Then, change to the new repository’s directory and run subgit setup. This simply renames .git to .subgit:

    cd sub_project
    subgit setup

  3. Now, test that it worked by viewing the parent’s Git log and the sub-repo’s git log:

    git log
    ...print out log for the parent project...

    subgit log
    ...print out log for the sub-project...

subhg works the same way.
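Since the setup step is described as a plain rename, it can be approximated by hand. The helper below is a hypothetical stand-in, not the real `subgit setup`, which may do more checking.

```shell
# Hypothetical stand-in for "subgit setup": rename .git to .subgit in the
# directory given as $1 (default: the current directory).
setup_subrepo() {
    target=${1:-.}
    if [ ! -d "$target/.git" ]; then
        echo "setup_subrepo: no .git directory in $target" >&2
        return 1
    fi
    # Refuse to overwrite history that has already been set up.
    if [ -e "$target/.subgit" ]; then
        echo "setup_subrepo: $target/.subgit already exists" >&2
        return 1
    fi
    mv "$target/.git" "$target/.subgit"
}
```

After the rename, running `git --git-dir=.subgit --work-tree=. log` inside sub_project should show the sub-project's history even without the wrapper script.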

Some Final Thoughts

First, this approach is intended for all of the projects out there using GitHub, BitBucket, Google Code, etc. as their main distribution channel. Most of these projects have a small group of contributors, and a much larger group of users.

If you distribute your project via a tar’d, gzip’d file, then this blog post is not for you.

Second, in order for other contributors to submit patches to the sub-project code, they will first need to obtain the full history of the sub-project. (Which makes sense, because the whole point of this was to NOT transfer the full history during a clone.)

As far as I know, the best approach to get the history is to just pull it from the sub-project’s remote URL into a tmp directory:

git clone git://hostname.com/repository.git tmp
mv tmp/.git sub_project/.subgit
rm -rf tmp

-or-

hg clone http://hostname.com/repo/path/ tmp
mv tmp/.hg sub_project/.subhg
rm -rf tmp

Then, switch to the sub_project directory and check out the version that matches the files in the parent repo. (This assumes your sub-project is in a directory named sub_project.)
