Converting an SVN repository to Git

This article is based on an old CurseForge KB article written by Ackis after I helped him through a conversion. I have since expanded the guide somewhat.

It assumes that you want to do a one-way conversion, i.e., after the conversion you do not want to use the SVN repository any more.

I have tried to expand on some sections, but the "know what you are doing" factor is still a bit high. Feel free to drop by in #git on irc.freenode.net or write an email.

Prerequisites

You will need a fairly good understanding of Git (or a large amount of blind trust in my instructions). I am not going into the differences between SVN and Git here.

You need a working Git install with git-svn (hence also the SVN libraries).

Note: You may need to use git-svn on Linux as there are reports that it is broken on Windows. YMMV. You can also try both the mingw port and a cygwin install and see if at least one works.

Outline

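The brief outline is: generate an author map, run the import with git-svn, then fix up the resulting history. Each step has its own section below.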

Generating an author map

The author map maps SVN usernames (the only identity information held in SVN) to real names and email addresses for the Git history.

It is a flat text file with lines of the form:

svnuser = R. E. Alname <real@email.example.com>
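A malformed line is easy to miss in a long map. The following is a minimal sketch of a format check, not part of git-svn itself; the function name check_author_map is made up for illustration, and the pattern assumes usernames without spaces or equals signs.

```shell
# check_author_map FILE: print (with line numbers) every line that is not
# of the form "svnuser = Real Name <email>" and return non-zero if any
# such line exists. Hypothetical helper, not a git-svn command.
check_author_map() {
    if grep -vnE '^[^ =]+ = .+ <[^<> ]+@[^<> ]+>$' "$1"; then
        return 1    # grep printed at least one malformed line
    fi
    return 0
}

# Try it on a throwaway file with one good and one bad line.
map=$(mktemp)
printf '%s\n' 'alice = Alice Example <alice@example.com>' 'bob = bob' > "$map"
check_author_map "$map" || echo "fix the lines listed above before importing"
rm -f "$map"
```

With the sample file, the check lists line 2 (bob = bob) and then prints the reminder.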

You can automate some of the task by running

svn log svn://svn.example.com/repository/path/ |
sed -ne 's/^r[^|]*| \([^ ]*\) |.*$/\1 = \1 <\1@dummy.example.com>/p' |
sort -u > author-map

which will scan the SVN history for all users, and make a dummy line for each. You can then look up their real names and emails as required.
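To see what the pipeline produces without touching a real server, here is a small sketch that feeds it a canned log. The log text, the usernames alice and bob, and the function names sample_log and make_author_map are invented for illustration; only the sed expression comes from the command above.

```shell
# A hand-written sample in the format `svn log` prints.
sample_log() {
cat <<'EOF'
------------------------------------------------------------------------
r3 | alice | 2009-03-01 12:00:00 +0100 (Sun, 01 Mar 2009) | 1 line

Fix typo in README
------------------------------------------------------------------------
r2 | bob | 2009-02-27 09:30:00 +0100 (Fri, 27 Feb 2009) | 1 line

Add build script
------------------------------------------------------------------------
r1 | alice | 2009-02-26 18:15:00 +0100 (Thu, 26 Feb 2009) | 1 line

Initial import
------------------------------------------------------------------------
EOF
}

# Same sed/sort pipeline as above, reading the log from stdin.
make_author_map() {
    sed -ne 's/^r[^|]*| \([^ ]*\) |.*$/\1 = \1 <\1@dummy.example.com>/p' |
    sort -u
}

sample_log | make_author_map
# prints:
#   alice = alice <alice@dummy.example.com>
#   bob = bob <bob@dummy.example.com>
```

Once the real map is written, searching it for dummy.example.com shows which entries still need a real name and address.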

Running the import

The basic procedure is just

git svn clone -A author-map --no-metadata -s svn://svn.example.com/repository/path/ project

or if your project is branchless (consists only of a single line of history):

git svn clone -A author-map --no-metadata svn://svn.example.com/repository/path/ project

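The only difference is the -s (--stdlayout) flag, which tells git-svn to expect the standard trunk/branches/tags layout; see below for other layouts.

We use --no-metadata since this is a one-way conversion; omitting it leaves an ugly git-svn-id line at the end of every commit message.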

Custom layouts

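If the -s layout (trunk/branches/tags) does not fit your repository, note that git-svn can handle rather strange layouts in two ways: you can pass the paths explicitly with -T/--trunk, -b/--branches and -t/--tags in place of -s, or you can run git svn init instead of git svn clone and edit the fetch, branches and tags lines of the svn-remote section in .git/config by hand.

If you went the git svn init route, then run git svn fetch.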
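For the hand-edited route, the following sketch prints an svn-remote section of the shape git-svn reads from .git/config. The function name svn_remote_section and the paths main/trunk, main/releases and main/labels are hypothetical; the ref names on the right-hand side assume the unprefixed refs/remotes/ layout that the tag conversion later in this article relies on.

```shell
# svn_remote_section URL TRUNK BRANCHES TAGS
# Print a [svn-remote "svn"] section mapping the given SVN paths to refs.
# Hypothetical helper; it only prints text and does not modify any repository.
svn_remote_section() {
    printf '[svn-remote "svn"]\n'
    printf '\turl = %s\n' "$1"
    printf '\tfetch = %s:refs/remotes/trunk\n' "$2"
    printf '\tbranches = %s/*:refs/remotes/*\n' "$3"
    printf '\ttags = %s/*:refs/remotes/tags/*\n' "$4"
}

# Example layout: trunk, branches and tags all live under main/.
svn_remote_section svn://svn.example.com/repository/path \
    main/trunk main/releases main/labels
```

The existing svn-remote section in .git/config should be edited to match this output rather than having a second copy appended, since fetch lines are multi-valued.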

Fixing up history

The following are just the most common issues; there are many ways to improve history. Remember that you should not rewrite history after publishing it!

Grafting merges

Modern SVN records "merges" (we call them cherry-picks), and modern git-svn can use this data to build git merges. However, old repositories require some manual intervention.

Warning: Do not push history that has grafts; chaos may ensue. Instead, filter-branch the history first.

Suppose you have determined that commit M is a merge commit, and that it has merged history up to commit C. Then you can "fake" this merge with

echo $(git rev-parse M) $(git rev-parse M^) $(git rev-parse C) >> .git/info/grafts
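Each line of the grafts file names a commit followed by the parents it should appear to have, all as full object names. A stray word in that file is easy to produce with a mistyped command, so a quick sanity check can help. This is a sketch only: check_grafts is a made-up name, and the pattern assumes 40-character SHA-1 object names.

```shell
# check_grafts FILE: print (with line numbers) every line that is not a
# space-separated list of 40-character hex object names; return non-zero
# if any such line is found. Hypothetical helper, not a Git command.
check_grafts() {
    if grep -vnE '^[0-9a-f]{40}( [0-9a-f]{40})*$' "$1"; then
        return 1
    fi
    return 0
}

# Demonstration on a throwaway file with made-up object names.
m=1111111111111111111111111111111111111111
p=2222222222222222222222222222222222222222
c=3333333333333333333333333333333333333333
grafts=$(mktemp)
echo "$m $p $c" > "$grafts"                      # well-formed merge graft
echo "$m git rev-parse M^ $c" >> "$grafts"       # words instead of a hash
check_grafts "$grafts" || echo "grafts file has malformed lines"
rm -f "$grafts"
```

In a real repository the argument would be .git/info/grafts.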

The filter-branch invocations below will "set the graft in stone", but if you only want to do this step alone, you can run an otherwise-noop filter-branch with

git filter-branch --tag-name-filter cat -- --all

to achieve this.

Removing useless commits

git-svn frequently leaves commits that do not change anything, from SVN copy commands and such. You can delete them with

git filter-branch --prune-empty --tag-name-filter cat -- --all

Editing out repository prefixes

In some SVN repositories I have worked with, it was common practice to prefix every commit message with the project name. You can extend the filter-branch invocation from the last subsection to also edit this out for prettier history:

git filter-branch --prune-empty --tag-name-filter cat --msg-filter 'perl -pe "s/^project:\s*//"' -- --all
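Because perl -pe applies the substitution to every line, a body line that happens to start with the prefix loses it as well. If only the subject line should change, a sed filter restricted to line 1 is one possible alternative. This is a sketch, with strip_project_prefix as a made-up name and "project:" standing in for the real prefix.

```shell
# strip_project_prefix: read a commit message on stdin and remove a leading
# "project:" (plus following whitespace) from the first line only.
strip_project_prefix() {
    sed -e '1s/^project:[[:space:]]*//'
}

# Try it on a sample message whose body also mentions the prefix.
printf '%s\n' 'project: Fix crash on startup' '' \
    'project: is also mentioned in the body' | strip_project_prefix
# prints:
#   Fix crash on startup
#
#   project: is also mentioned in the body
```

The same sed expression can be given to --msg-filter directly, since filter-branch feeds each commit message to the filter on standard input.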

Changing tagging commits to tags

git-svn currently leaves a branch tags/foo for every tag. Its tip commit is usually the svn copy commit that created the tag, though obviously this does not have to be the case.

The above filter-branch commands have already deleted this copy commit since it makes no changes. It remains to turn the tagging commit into a proper tag. You can use the following chunk of shell code:

# Turn each refs/remotes/tags/* branch into an annotated tag that reuses
# the committer identity, committer date and message of its tip commit.
git for-each-ref --format="%(refname)" refs/remotes/tags/ |
while read -r tag; do
    GIT_COMMITTER_DATE="$(git log -1 --pretty=format:"%cd" "$tag")" \
    GIT_COMMITTER_EMAIL="$(git log -1 --pretty=format:"%ce" "$tag")" \
    GIT_COMMITTER_NAME="$(git log -1 --pretty=format:"%cn" "$tag")" \
    git tag -m "$(git for-each-ref --format="%(contents)" "$tag")" \
        "${tag#refs/remotes/tags/}" "$tag"
done

The ugliness is there to exactly preserve the committer identity, timestamp and message.
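Before creating any tags, it can be reassuring to preview which tag name each ref would get. The sketch below isolates the prefix-stripping step of the loop; tag_name_for and the two ref names are invented for illustration, and no Git repository is needed to run it.

```shell
# tag_name_for REF: strip the refs/remotes/tags/ prefix, as the loop does.
tag_name_for() {
    printf '%s\n' "${1#refs/remotes/tags/}"
}

# Dry run over hypothetical ref names instead of `git for-each-ref` output.
printf '%s\n' refs/remotes/tags/v1.0 refs/remotes/tags/release/2.0 |
while read -r ref; do
    printf '%s -> %s\n' "$ref" "$(tag_name_for "$ref")"
done
# prints:
#   refs/remotes/tags/v1.0 -> v1.0
#   refs/remotes/tags/release/2.0 -> release/2.0
```

In a converted repository, piping the for-each-ref command from the loop into the same while body gives the preview for the real refs.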