An approach to managing Git or Mercurial sub-repos easily, safely, and simply, while allowing you to embed Git projects in a Mercurial repo and vice versa.
Most software projects rely on other software projects to function. For example, Nitrogen depends on SimpleBridge, Coverize, and Mochiweb or Yaws. Riak depends on Webmachine and Mochiweb.
In the name of simplicity and ease of use, it’s generally a good idea for the parent repo to contain the source code of any sub-projects it uses.
But then you are faced with a decision:
Should the parent project include the full history of the sub-project as well?
You currently have three options, all with tradeoffs:
Remove revision history in your sub-projects by deleting the .git/.hg directory, and experience pain when you want to pull the latest updates or commit a patch on the sub-project.
Track the entire .git/.hg directory for each sub-project, and accept that the .git/.hg directory of your parent project will now be huge.
Use Git submodules (or Mercurial subrepos), and hope that you never have to include a Git project inside of a Mercurial project, or vice versa. Also, accept the unfortunate fact that your build process now requires a working Internet connection.
What do I really want out of sub-repo support?
I want the sub-repos to seem like part of the parent project to the user, while still seeming like distinct repositories to me.
I want to be able to work with the full history of the sub-repo, but I don’t want this included in the parent repo, or sent out to anyone who downloads the code.
I want an easy process for contributors to get the history of the sub-repos, so that they can commit patches.
Furthermore:
I only want to use tested, core features of Git or Mercurial.
I want the solution to be “cross platform”, so that I can stick Git repos in Mercurial and vice versa.
I want the solution to be simple to use and easy to understand for the most common use case. (In other words, hide complexity from the non power-users.)
After much thought and frustration, I think I’ve finally found a solution that meets all of my needs. It lets me work with the parent repo as I normally would, using the git or hg command. Furthermore, it gives me a different command to work with the sub-repos. Finally, it is cross platform, allowing me to mix and match Git and Mercurial projects. The only downside is that a contributor needs to jump through an extra hoop or two in order to get the history of a sub-repo.
In practice, the solution looks like this:
Change to the directory of a sub-project…
> cd ParentProject/SubProject1
Operate on the parent project…
> git status
...status info...
> git commit
...commit code...
Operate on the sub-project. Notice the use of the ‘subgit’ command…
> subgit status
...sub-project status info...
> subgit commit
...commit sub-project code...
Best of all, the subgit and subhg commands are just thin wrappers around git and hg. Each is about 25 lines of shell script.
To try this out on your computer:
Save the following scripts to a location in your PATH: subgit and subhg. (Remember to run chmod 755 to make them executable.)
Create a global excludes file for both Git and Mercurial, and add .subgit and .subhg to it.
*~/.gitconfig*
[core]
excludesfile = "~/.gitignore"
*~/.gitignore*
.subgit
.subhg
*~/.hgrc*
[ui]
ignore=~/.hgignore
*~/.hgignore*
.subgit
.subhg
That’s all!
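If you would rather script the ignore-file step than edit the files by hand, here is a minimal sketch. The add_ignore_entry function is a hypothetical helper of my own naming, not part of subgit or subhg; it only appends a line when that exact line is missing, so it is safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helper (not from the original scripts): append a pattern to an
# ignore file only if that exact line is not already present.
add_ignore_entry() {
    file=$1
    pattern=$2
    touch "$file"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF -e "$pattern" "$file"; then
        printf '%s\n' "$pattern" >> "$file"
    fi
}

# Example usage against the global ignore files named above
# (left commented out so that sourcing this file changes nothing):
# for f in "$HOME/.gitignore" "$HOME/.hgignore"; do
#     add_ignore_entry "$f" .subgit
#     add_ignore_entry "$f" .subhg
# done
```

The [core] and [ui] settings shown above still need to point Git and Mercurial at those files.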
The core concept behind this approach is to store the version history of your sub-project in a non-standard directory, and then use special wrapper scripts when you want Git or Mercurial to operate against that directory.
In other words, your projects won’t have a .git or .hg directory. Instead, they will have a .subgit or .subhg directory, which is not tracked by the parent repo.
|--ParentProject <-- a Git repository
|
|--.git
|
|--SubProject1 <-- a Git sub-repo
| |--.subgit
|
|--SubProject2 <-- a Mercurial sub-repo
| |--.subhg
|
|--src
This tricks Git or Mercurial into tracking the files inside of your sub-repo, even though the files actually belong to a different repository. (Normally Git won’t track a Git repo nested inside of another Git repo.)
The wrapper scripts, subgit and subhg, do the heavy lifting to make Git or Mercurial use the .subgit and .subhg directories.
subgit simply searches upward for the closest parent directory that contains a .subgit directory. Once found, it calls git, telling git to use the .subgit directory for repository information. subhg works the same way.
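To make the upward search concrete, here is a minimal sketch of how such a wrapper could be written. This is not the actual subgit script. The function names find_subgit_root and subgit_sketch are my own, and I am assuming the wrapper points Git at the directory through Git's --git-dir and --work-tree options.

```shell
#!/bin/sh
# Sketch of a subgit-style wrapper; NOT the actual subgit script.

# Print the closest directory, starting at $1 and walking upward, that
# contains a .subgit directory. Returns non-zero if none is found.
find_subgit_root() {
    dir=$(cd "$1" && pwd -P) || return 1
    while :; do
        if [ -d "$dir/.subgit" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Stop once the filesystem root has been checked.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

# Run git against the sub-repo that contains the current directory,
# passing all arguments through unchanged.
subgit_sketch() {
    root=$(find_subgit_root "$PWD") || {
        echo "subgit: no .subgit directory found" >&2
        return 1
    }
    git --git-dir="$root/.subgit" --work-tree="$root" "$@"
}
```

With these functions loaded, running subgit_sketch status from anywhere inside a sub-project would report status for the sub-repo rather than the parent. A Mercurial version would need the equivalent mechanism for pointing hg at .subhg, which I have not sketched here.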
To create a sub-repo that can be managed with subgit or subhg:
Inside of an existing parent repository, clone the sub-project's repository as you normally would:
git clone git://hostname.com/repository.git sub_project
Then, change to the new repository’s directory and run subgit setup. This simply renames .git to .subgit:
cd sub_project
subgit setup
Now, test that it worked by viewing the parent’s Git log and the sub-repo’s git log:
git log
…print out log for the parent project…
subgit log
…print out log for the sub-project…
subhg works the same way.
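Since the setup step is described as a simple rename, it can be sketched in a few lines. This is an illustration, not the actual setup code from subgit or subhg. The setup_subrepo function is a hypothetical name, and handling both Git and Mercurial in one function is my own simplification.

```shell
#!/bin/sh
# Sketch of what a "setup" step could do; not the actual subgit/subhg code.
# Renames the standard metadata directory in $1 (default: the current
# directory) to its sub-repo name, refusing to overwrite an existing one.
setup_subrepo() {
    dir=${1:-.}
    if [ -d "$dir/.git" ] && [ ! -e "$dir/.subgit" ]; then
        mv "$dir/.git" "$dir/.subgit"
    elif [ -d "$dir/.hg" ] && [ ! -e "$dir/.subhg" ]; then
        mv "$dir/.hg" "$dir/.subhg"
    else
        echo "setup: nothing to rename in $dir" >&2
        return 1
    fi
}
```

After the rename, plain git no longer sees a nested repository in that directory, so the parent repo can track the sub-project's files.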
First, this approach is intended for all of the projects out there using GitHub, BitBucket, Google Code, etc. as their main distribution channel. Most of these projects have a small group of contributors, and a much larger group of users.
If you distribute your project via a tar’d, gzip’d file, then this blog post is not for you.
Second, in order for other contributors to submit patches to the sub-project code, they will first need to obtain the full history of the sub-project. (Which makes sense, because the whole point of this was to NOT transfer the full history during a clone.)
As far as I know, the best approach to get the history is to just pull it from the sub-project’s remote URL into a tmp directory:
git clone git://hostname.com/repository.git tmp
mv tmp/.git sub_project/.subgit
rm -rf tmp
-or-
hg clone http://hostname.com/repo/path/ tmp
mv tmp/.hg sub_project/.subhg
rm -rf tmp
Then, switch to the sub_project directory and check out the right version. (This assumes your sub-project is in a directory named sub_project.)
subgit
.subhg
Content © 2006-2021 Rusty Klophaus