Git and Subversion collaboration
Git is great, we all know that, but there are use cases where its completely distributed development model does not shine (see here and here). And while my old git svn mirror of the TeX Live Subversion repository was working well, git pull and git svn rebase didn’t work well together, re-pulling the same changes again and again. Finally, I took the time to experiment and fix this!
Most of the material in this blog post has already been written up elsewhere, and the best sources I found are here and here. Practically everything is written down there, but when one gets down to business, some things work out a bit differently. So here we go.
The aim of the setup is to allow several developers to work on a git svn mirror of a central Subversion repository. “Work” here means:
- pull from the git mirror to get the latest changes
- normal git workflows: branch, develop new features, push new branches to the git mirror
- commit to the Subversion repository using git svn dcommit
and all that with as much redundancy removed as possible.
One solution to this would be for each developer to create his own git-svn mirror. While this is fine in principle, it is error-prone, costs lots of time, and everyone has to run git svn rebase etc. We want to be able to use normal git workflows as far as possible.
The basic layout of our setup is as follows:
The following entities are shown in the diagram above:
- SvnRepo: the central subversion repository
- FetchingRepo: the git-svn mirror which does regular fetches and pushes to the BareRepo
- BareRepo: the central repository which is used by all developers to pull and collaborate
- DevRepo: normal git clones of the BareRepo on the developers’ computers
The flow of data is also shown in the above diagram:
- git svn fetch: the FetchingRepo is updated regularly (using cron) to fetch new revisions and new branches/tags from the SvnRepo
- git push (1): the FetchingRepo pushes changes regularly (using cron) to the BareRepo
- git pull: developers pull from the BareRepo, can check out remote branches and do normal git workflows
- git push (2): developers push changes and newly created branches to the BareRepo
- git svn dcommit: developers rebase-merge their changes into the main branch and commit from there to the SvnRepo
Besides the requirement to use git svn dcommit for submitting changes to the SvnRepo, and the requirement by git svn to have linear histories, everything else can be done with normal workflows.
For the following, let us assume that SVNREPO points to the URI of the Subversion repository and BAREREPO points to the URI of the BareRepo. Furthermore, we refer to the paths on the system (server, local) with variables like $BareRepo etc.
Step 1 – preparation of authors-file
To get consistent entries for committers, we need to set up an authors file that maps Subversion users to names and emails:
svnuser1 = AAA BBB <aaa@example.com>
svnuser2 = CCC DDD <ccc@example.com>
...
Let us assume that the AUTHORSFILE environment variable points to this file.
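Since every later step depends on this file, it can be worth checking its format before the first clone. The following is a minimal sketch of a hypothetical helper (not part of git or git svn) that lists lines not looking like `svnuser = Full Name <email>`; the pattern is only a rough approximation of what git svn accepts.

```shell
#!/bin/sh
# Hypothetical helper: report lines of an authors file that do not look like
# "svnuser = Full Name <email>". Rough approximation of the git svn format.
check_authors_file() {
    if [ ! -r "$1" ]; then
        echo "cannot read authors file: $1" >&2
        return 2
    fi
    # -v: print lines matching NEITHER pattern (entry format, blank line);
    # -n: prefix them with line numbers. grep exits 0 if it printed anything.
    if grep -nvE -e '^[^=]+=[^<]+<[^<>]*>[[:space:]]*$' -e '^[[:space:]]*$' "$1"; then
        echo "malformed entries above in $1" >&2
        return 1
    fi
    return 0
}
```

Usage: `check_authors_file "$AUTHORSFILE" && echo "authors file looks fine"`.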
Step 2 – creation of fetching repository
This step creates a git-svn mirror; please read the documentation for further details. If the Subversion repository follows the standard layout (trunk, branches, tags), then the following line will work:
git svn clone --prefix="" --authors-file=$AUTHORSFILE -s $SVNREPO
The important part here is the --prefix option. The documentation of git svn says:
Setting a prefix (with a trailing slash) is strongly encouraged in any case, as your SVN-tracking refs will then be located at “refs/remotes/$prefix/”, which is compatible with Git’s own remote-tracking ref layout (refs/remotes/$remote/). Setting a prefix is also useful if you wish to track multiple projects that share a common repository. By default, the prefix is set to origin/.
Note: Before Git v2.0, the default prefix was “” (no prefix). This meant that SVN-tracking refs were put at “refs/remotes/*”, which is incompatible with how Git’s own remote-tracking refs are organized. If you still want the old default, you can get it by passing --prefix "" on the command line.
While one might be tempted to use a prefix of “svn” or “origin”, both of which I have done, this will complicate (or even make impossible?) later steps, in particular the synchronization of git pull with git svn fetch.
The original blog posts I mentioned at the beginning were written before the default was switched to “origin/”, so this was the part that puzzled me, and I didn’t understand why the old descriptions didn’t work anymore.
Step 3 – cleanup of the fetching repository
By default, git svn creates and checks out a master branch. In this case, the Subversion repository’s “master” is the trunk branch, and we want to keep it like this. Thus, let us check out the trunk branch and remove master. After entering the FetchingRepo, do:
cd $FetchingRepo
git checkout trunk
git checkout -b trunk
git branch -d master
The two checkouts are necessary because the first one leaves you with a detached HEAD. In fact, having no checkout at all would be fine, too, but git svn does not work with bare repositories, so we need to check out some branch.
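The detached-HEAD behaviour can be checked in a throwaway sandbox that only fakes what git svn would have left behind (a `refs/remotes/trunk` ref and a `master` branch at the same commit). This is a sketch under those assumptions, not the real FetchingRepo; it assumes git 2.28 or later for `git init -b`.

```shell
#!/bin/sh
# Sandbox sketch (no Subversion needed): mimic the FetchingRepo right after
# `git svn clone --prefix="" -s`, where trunk exists only as refs/remotes/trunk.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL=demo@example.com

FETCHING_DEMO=$(mktemp -d)
cd "$FETCHING_DEMO"
git init -q -b master .
git commit -q --allow-empty -m "stand-in for an imported revision"
git update-ref refs/remotes/trunk HEAD   # what git svn would have created

git checkout -q trunk          # resolves to refs/remotes/trunk: detached HEAD
git symbolic-ref -q HEAD || echo "HEAD is detached"
git checkout -q -b trunk       # now create the local branch trunk
git branch -q -d master        # master is no longer needed
git branch                     # only trunk is left
```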
Step 4 – init the bare BareRepo
This is done in the usual way; I guess you know that:
git init --bare $BareRepo
Step 5 – set up the FetchingRepo to push all branches, and push them
The cron job we will introduce later will fetch all new revisions, including new branches. We want to push all branches to the BareRepo. This is done by adjusting the fetch and push configuration after changing into the FetchingRepo:
cd $FetchingRepo
git remote add origin $BAREREPO
git config remote.origin.fetch '+refs/remotes/*:refs/remotes/origin/*'
git config remote.origin.push 'refs/remotes/*:refs/heads/*'
git push origin
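What has been done is that the fetch refspec is adjusted to the remote-branch layout, and push sends all remote branches of the FetchingRepo, i.e., everything git svn keeps under refs/remotes/, to ordinary branches under refs/heads/ in the BareRepo. This ensures that new Subversion branches (or tags, which are nothing else than branches here) are also pushed to the BareRepo.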
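The effect of the push refspec can be tried out without any Subversion server. The following sandbox sketch fakes the refs that git svn would create with an empty prefix (the branch `feature-x` and tag `v1.0` are made-up names) and pushes them with exactly the configuration from above:

```shell
#!/bin/sh
# Sandbox sketch of the Step 5 refspecs: refs under refs/remotes/ in the
# "fetching" repo become ordinary branches (refs/heads/) in the bare repo.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)
git init -q --bare "$DEMO/bare.git"
git init -q "$DEMO/fetching"
cd "$DEMO/fetching"
git commit -q --allow-empty -m "stand-in for an imported revision"
# Fake what `git svn fetch` creates with an empty prefix: trunk, a branch, a tag
for ref in trunk feature-x tags/v1.0; do
    git update-ref "refs/remotes/$ref" HEAD
done

git remote add origin "$DEMO/bare.git"
git config remote.origin.fetch '+refs/remotes/*:refs/remotes/origin/*'
git config remote.origin.push 'refs/remotes/*:refs/heads/*'
git push -q origin

# lists refs/heads/feature-x, refs/heads/tags/v1.0 and refs/heads/trunk
git -C "$DEMO/bare.git" for-each-ref --format='%(refname)' refs/heads/
```

Note how the `*` in the refspec also matches across slashes, which is why the faked tag ends up as the branch `tags/v1.0`.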
Step 6 – adjust the default checkout branch in the BareRepo
By default, git clones and checks out the master branch, but we don’t have a master branch; “trunk” plays its role. Thus, let us adjust the default in the BareRepo:
cd $BareRepo
git symbolic-ref HEAD refs/heads/trunk
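A quick sandbox sketch (temporary directories only, nothing from the real setup) shows the effect: a bare repository whose only branch is trunk gets its HEAD pointed at refs/heads/trunk, and a fresh clone then checks out trunk instead of looking for a master branch.

```shell
#!/bin/sh
# Sandbox sketch of Step 6: bare repo with a trunk branch but no master.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)
git init -q "$DEMO/src"
git -C "$DEMO/src" commit -q --allow-empty -m "initial"
git init -q --bare "$DEMO/bare.git"
git -C "$DEMO/src" push -q "$DEMO/bare.git" HEAD:refs/heads/trunk

git -C "$DEMO/bare.git" symbolic-ref HEAD refs/heads/trunk   # the Step 6 command
git clone -q "$DEMO/bare.git" "$DEMO/dev"
git -C "$DEMO/dev" symbolic-ref --short HEAD                 # prints: trunk
```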
Step 7 – developers branch
Now we are ready to use the bare repo and clone it onto one of the developers’ machines:
git clone $BAREREPO
But before we can actually use this clone, we need to make sure that git commits sent to the Subversion repository have the same user name and email for the committer. The reason for this is that the commit hash is computed from various pieces of information, including the name and email (see details here). Thus we need to make sure that git svn dcommit in the DeveloperRepo and git svn fetch in the FetchingRepo create the very same hash! Therefore, each developer needs to set up an authors file with at least his own entry:
cd $DeveloperRepo
echo 'mysvnuser = My Name <my@example.com>' > .git/usermap
git config svn.authorsfile '.git/usermap'
Important: the line for mysvnuser must exactly match the one in the original authorsfile from Step 1!
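Since this exact match is easy to get wrong, a small check can help. The sketch below is a hypothetical helper (not part of git svn) that uses grep with `-x` (whole line) and `-F` (fixed strings) to list every line of the developer’s `.git/usermap` that does not appear verbatim in the central authors file:

```shell
#!/bin/sh
# Hypothetical helper: every line of the usermap must appear as a complete,
# identical line in the central authors file.
check_usermap() {
    usermap_file=$1 authors_file=$2
    # -f: patterns come from the authors file; -F: literal strings;
    # -x: whole-line match; -v: print usermap lines that match none of them.
    if grep -vxF -f "$authors_file" "$usermap_file"; then
        echo "the lines above do not match $authors_file exactly" >&2
        return 1
    fi
    return 0
}
```

Usage, inside the DeveloperRepo: `check_usermap .git/usermap "$AUTHORSFILE"`.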
The final step is to allow the developer to commit to the SvnRepo by adding the necessary information to the git configuration:
git svn init -s $SVNREPO
Warning: Here we rely on two things: first, that git clone uses the default origin as the remote name, and second, that git svn init uses the default prefix “origin/”, as discussed above.
If this is too shaky for you, the other option is to define the remote name during the clone and use it as the prefix:
git clone -o mirror $BAREREPO
git svn init --prefix=mirror/ -s $SVNREPO
This way the default remote will be “mirror” and all is fine.
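Whichever variant is used, one can check that both tools really share the same refs. A minimal sketch, assuming the standard layout and the default svn-remote name `svn`, in which case `git svn init -s` records the trunk mapping in `svn-remote.svn.fetch` as `trunk:refs/remotes/<prefix>trunk`; the helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: run inside a DeveloperRepo; the argument is the remote
# name used at clone time (default: origin).
check_svn_prefix() {
    remote=${1:-origin}
    # Keep only the local ref on the right-hand side of "trunk:<ref>"
    svn_trunk_ref=$(git config --get svn-remote.svn.fetch | sed 's/^[^:]*://')
    expected="refs/remotes/$remote/trunk"
    if [ "$svn_trunk_ref" = "$expected" ]; then
        echo "ok: git svn and remote '$remote' both use $expected"
    else
        echo "mismatch: git svn uses '$svn_trunk_ref', expected '$expected'" >&2
        return 1
    fi
}
```

For the second variant, the call would be `check_svn_prefix mirror`.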
Note: Upon your first git svn usage in the DeveloperRepo, as well as after every pull, you will see messages like:
Rebuilding .git/svn/refs/remotes/origin/trunk/.rev_map.c570f23f-e606-0410-a88d-b1316a301751 ...
rNNNN = 1bdc669fab3d21ed7554064dc461d520222424e2
rNNNM = 2d1385fdd8b8f1eab2a95d325b0d596bd1ddb64f
...
This is a good sign, meaning that git svn does not re-fetch the whole set of revisions, but reuses the ones pulled from the BareRepo and only rebuilds the mapping, which should be fast.
Updating the FetchingRepo
Updating the FetchingRepo should be done automatically using cron; the necessary steps are:
cd $FetchingRepo
git svn fetch --all
git push
This will fetch all revisions and push the refs configured by default, i.e., all remote heads, to the BareRepo.
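A minimal crontab(5) sketch for this, assuming a hypothetical location /srv/git/FetchingRepo and a 15-minute interval (neither is taken from the original setup). Most cron implementations accept the variable assignment line, and the command itself is run by /bin/sh, so `$FetchingRepo` is expanded there:

```
# crontab of the user owning the FetchingRepo (path and interval are placeholders)
FetchingRepo=/srv/git/FetchingRepo
*/15 * * * * cd "$FetchingRepo" && git svn fetch --all && git push
```

Chaining with `&&` means nothing is pushed if the fetch fails; if a fetch can take longer than the interval, wrapping the command in flock(1) would avoid overlapping runs.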
Note: If a developer first commits a change to the SvnRepo using git svn dcommit and, before the FetchingRepo has updated the BareRepo (i.e., before the next cron run), also runs git pull, he will see something like:
$ git pull
From preining.info:texlive2
 + 10cc435f163...953f9564671 trunk -> origin/trunk (forced update)
Already up to date.
This is due to the fact that the remote head is still behind the local head, which can easily be seen by looking at the output of git log. Before the FetchingRepo has updated the BareRepo, one would see something like:
$ git log
commit 3809fcc9aa6e0a70857cbe4985576c55317539dc (HEAD -> trunk)
Author: ....
commit eb19b9e6253dbc8bdc4e1774639e18753c4cd08f (origin/trunk, origin/HEAD)
...
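and afterwards all three refs would point to the same top commit. This is nothing to worry about and is normal behavior. In fact, the default fetch refspec of a remote (+refs/heads/*:refs/remotes/origin/*) starts with a +, which allows such forced, non-fast-forward updates of remote-tracking branches.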
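The situation can be reproduced in a sandbox without Subversion. In this sketch, a local commit plus `git update-ref` stand in for git svn dcommit, which in the DeveloperRepo moves both trunk and the shared refs/remotes/origin/trunk; the following pull then moves origin/trunk back to the older state of the bare repository:

```shell
#!/bin/sh
# Sandbox sketch: "forced update" of origin/trunk after a simulated dcommit.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)
git init -q "$DEMO/seed"
git -C "$DEMO/seed" commit -q --allow-empty -m "r1"
git init -q --bare "$DEMO/bare.git"
git -C "$DEMO/seed" push -q "$DEMO/bare.git" HEAD:refs/heads/trunk
git -C "$DEMO/bare.git" symbolic-ref HEAD refs/heads/trunk
git clone -q "$DEMO/bare.git" "$DEMO/dev"
cd "$DEMO/dev"

# Stand-in for `git svn dcommit`: new commit, and origin/trunk moved to it too
git commit -q --allow-empty -m "r2 (dcommitted)"
git update-ref refs/remotes/origin/trunk HEAD

# The bare repo is not updated yet: fetch moves origin/trunk back (forced
# update), and the merge step has nothing to do since trunk is already ahead.
git pull --ff-only
git config --get remote.origin.fetch   # prints: +refs/heads/*:refs/remotes/origin/*
```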
Protecting the trunk branch
I found myself sometimes wrongly pushing to trunk instead of using git svn dcommit. This can be avoided by imposing restrictions on pushing. With gitolite, simply add a rule
- refs/heads/trunk = USERID
to the repo stanza of your mirror. When using Git(Lab|Hub) there are options to protect branches.
A more advanced restriction policy would be to require that branches created by users lie within a certain namespace. For example, a gitolite rule
repo yoursvnmirror
    RW+       = fetching-user
    RW+ dev/  = USERID
    R         = USERID
would only allow the FetchingRepo (identified by fetching-user) to push everywhere, while I (USERID) may push/rewind/delete etc. only branches starting with “dev/”, but can read everything.
Workflow for developers
The recommended workflow compatible with this setup is:
- use git pull to update the local developer repository
- use only branches that are not created/updated via git-svn
- at commit time, (1) rebase your branch onto trunk, (2) merge (fast-forward) your branch into trunk, (3) commit your changes with git svn dcommit
- rinse and repeat
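The rebase and fast-forward steps can be rehearsed in a sandbox to see that they keep trunk linear, which is what git svn dcommit needs. In this sketch the branch name `dev/my-feature` and the helper `commit_file` are hypothetical, the dcommit itself is left as a comment because it needs a real SvnRepo, and git 2.28 or later is assumed for `git init -b`:

```shell
#!/bin/sh
# Sandbox sketch of the developer workflow, minus Subversion.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: write a one-line file and commit it
commit_file() {
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$2"
}

DEMO=$(mktemp -d)
git init -q -b trunk "$DEMO/dev"
cd "$DEMO/dev"
commit_file a.txt "r1"

git checkout -q -b dev/my-feature              # a branch git svn knows nothing about
commit_file feature.txt "feature work"
git checkout -q trunk
commit_file b.txt "r2 (arrived via git pull)"  # trunk moved on meanwhile

git checkout -q dev/my-feature
git rebase -q trunk                            # (1) rebase the branch onto trunk
git checkout -q trunk
git merge -q --ff-only dev/my-feature          # (2) fast-forward trunk
# git svn dcommit                              # (3) needs a real SvnRepo

git log --oneline --graph trunk                # a straight line, no merge commits
```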
The more detailed discussion and the safety measures laid out in the git-svn documentation apply as well and are worth reading!