Converting an SVN monorepo to separate Git repositories
One of the tasks I've been chipping away at over the last year or more is converting an SVN monorepo with 15-20 years' worth of history into per-project Git repositories.
The reasons for the transition are many, but the move was brought forward when it was announced that Phabricator would no longer be actively maintained.
Due to the structure and historical choices made in the SVN repository, the transition has been interesting to say the least, so I thought it would be worth documenting some of the challenges.
Original repository structure
The original SVN repository housed many different projects (including applications, libraries, and tools). Rather than a separate repository per project, everything lived in one large repository.
The original structure looked roughly as follows:
.
└── projects
    ├── apps
    │   └── example
    │       ├── branches
    │       ├── tags
    │       └── trunk
    │           ├── main
    │           └── main.go
    ├── libraries
    └── tools
So every project had a folder under the projects root directory. Every project then had the standard SVN structure of a trunk, branches, and tags folder.
There were some projects that lived outside of this structure from early on in the days of the repository. The first step to the transition was to move everything into this projects root folder.
Dumping the repository
The first step before starting the conversion was to dump the repository using svnadmin dump on the server hosting the repository:
svnadmin dump -q /path/to/repository > /path/to/dump.dump
This took quite a while, given the size of the repository. In the end it was left to run overnight in a tmux session.
Extracting a single project from the dump
Now that I had a dump of the whole repository, I wanted to extract a dump for a single project at a time. Luckily, this could be done with svndumpfilter.
Say I wanted to extract the project from apps/example:
cat /path/to/dump.dump | svndumpfilter include --drop-empty-revs --renumber-revs projects/apps/example > /path/to/example.dump
For projects that had been moved into the projects folder and had prior history, any previous path(s) to the project were included in the list of include paths - for example:
cat /path/to/dump.dump | svndumpfilter include --drop-empty-revs --renumber-revs projects/apps/example some/old/path/to/example > /path/to/example.dump
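Since the same extraction gets repeated for every project, it can be wrapped in a small function. This is only a sketch: extract_project is a made-up name, and it assumes svndumpfilter is on the PATH. It feeds the dump in through input redirection rather than cat, and treats every argument after the first two as an include path.

```shell
# Hypothetical helper (the name is an assumption, not part of Subversion).
# Usage: extract_project FULL_DUMP OUTPUT_DUMP INCLUDE_PATH...
extract_project() {
  full_dump="$1"
  output_dump="$2"
  shift 2
  # All remaining arguments are paths to keep, current and historical.
  svndumpfilter include --drop-empty-revs --renumber-revs "$@" \
    < "$full_dump" > "$output_dump"
}

# Example call, commented out because it needs a real dump file:
# extract_project /path/to/dump.dump /path/to/example.dump \
#   projects/apps/example some/old/path/to/example
```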
Modifying the dump file
The dump file required editing to alter the history. The aim was that the resulting repository would only have the root branches, tags, and trunk folders.
The syntax of an SVN dump file and altering one is a whole topic in itself and will be explained in a follow up post.
For a simple project that had always existed in the root projects folder, though, the basic change required was to strip the projects/apps/example/ prefix from the node paths:
sed -i 's/Node-path: projects\/apps\/example\//Node-path: /g' /path/to/example.dump
sed -i 's/Node-copyfrom-path: projects\/apps\/example\//Node-copyfrom-path: /g' /path/to/example.dump
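To make the effect of that substitution easier to see, here is a minimal sketch that runs it over a few made-up header lines instead of a real dump. The strip_prefix function and the sample lines are assumptions for illustration. Unlike the commands above, it anchors each pattern to the start of the line and uses | as the sed delimiter, so it assumes the prefix contains no | or regex metacharacters.

```shell
# Hypothetical helper: strip a project prefix from the node path headers
# of a dump stream read from stdin.
strip_prefix() {
  prefix="$1"   # e.g. projects/apps/example/
  sed -e "s|^Node-path: ${prefix}|Node-path: |" \
      -e "s|^Node-copyfrom-path: ${prefix}|Node-copyfrom-path: |"
}

# Made-up header lines for illustration, not a complete dump record.
sample='Node-path: projects/apps/example/trunk/main.go
Node-kind: file
Node-copyfrom-path: projects/apps/example/trunk/main'

printf '%s\n' "$sample" | strip_prefix 'projects/apps/example/'
```

The node paths come out relative to the project root (trunk/main.go and trunk/main here), while other headers pass through untouched.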
Loading the single project dump into its own SVN repository
Now that I had a dump for a single project, the next step was to create an SVN repository containing it:
svnadmin create /path/to/repository/for/example
svnadmin load --ignore-uuid -F /path/to/example.dump /path/to/repository/for/example
Using git-svn to check out the repository
Now that I had an SVN repository containing a single project, it was time to actually start the conversion to Git.
There are several existing guides explaining how to do this, but I found the one from Microsoft Learn to be most useful.
The steps basically boil down to:
- Create an authors mapping file to map SVN style authors to Git style authors.
- Use git svn clone to clone the repository.
The command I eventually ran looked like this:
git svn clone file:///path/to/repository/for/example --authors-file=/path/to/authors.txt --no-metadata --stdlayout --prefix="" /path/to/repository/for/example-git
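The authors file is a plain-text list with one "svnuser = Full Name <email>" line per committer. As a rough sketch of how a starting point for it could be generated, the function below turns the output of svn log -q into a skeleton file. The authors_template name, the sample log, and the example.com addresses are all placeholders, and the real names and emails still need filling in by hand.

```shell
# Hypothetical helper: turn `svn log -q` output (read from stdin) into a
# skeleton authors file. The example.com addresses are placeholders to edit.
authors_template() {
  awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' | sort -u |
    while IFS= read -r user; do
      printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
    done
}

# Made-up log output for illustration. Against a real repository it would be:
#   svn log -q file:///path/to/repository/for/example | authors_template > /path/to/authors.txt
sample_log='------------------------------------------------------------------------
r2 | bob | 2010-01-03 09:00:00 +0000 (Sun, 03 Jan 2010)
------------------------------------------------------------------------
r1 | alice | 2010-01-02 10:00:00 +0000 (Sat, 02 Jan 2010)
------------------------------------------------------------------------'

printf '%s\n' "$sample_log" | authors_template
```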
Convert SVN ignore to .gitignore
The old svn:ignore property needs converting to a .gitignore file:
git svn show-ignore --id=trunk > .gitignore
I did then modify the .gitignore file slightly to remove a leading blank line and a leading commented out line.
Note that this will only apply to the main branch - you'd have to run this command on any other branches you have too!
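The manual tidy-up of the leading blank line and comment could also be scripted. The sketch below is one way to do it, assuming the generated file starts with blank and comment lines before the first real pattern. The trim_ignore_header name and the sample input are made up for illustration.

```shell
# Hypothetical helper: drop blank and comment lines at the very top of the
# input, then pass everything else through unchanged.
trim_ignore_header() {
  awk 'started || !/^(#.*)?$/ { started = 1; print }'
}

# Made-up input; the real output of git svn show-ignore depends on the
# svn:ignore properties in the repository.
sample_ignore='
# /
/bin/
/obj/

# /docs/
/docs/build/'

printf '%s\n' "$sample_ignore" | trim_ignore_header
```

Only the header is trimmed. Blank lines and comments further down are kept, since they separate the patterns for each directory.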
Then simply add and commit the .gitignore file:
git add .gitignore
git commit -m "Added gitignore"
Convert SVN branches and tags to Git branches and tags
I wanted to retain the old SVN branches and tags as proper Git branches and tags. Luckily this is pretty trivial:
for t in $(git for-each-ref --format='%(refname:short)' refs/remotes/tags); do git tag ${t/tags\//} $t && git branch -D -r $t; done
for b in $(git for-each-ref --format='%(refname:short)' refs/remotes); do git branch $b refs/remotes/$b && git branch -D -r $b; done
I also then removed any peg revisions:
for p in $(git for-each-ref --format='%(refname:short)' | grep @); do git branch -D $p; done
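Because those loops delete refs as they go, it can be reassuring to preview the outcome first. The following is a dry-run sketch only: plan_refs is a hypothetical name, and it simplifies things by treating any ref name containing @ as a peg-revision leftover to drop. The ${ref#tags/} expansion is the POSIX way of stripping a leading tags/, which does the same job as ${t/tags\//} for names that start with tags/.

```shell
# Hypothetical dry-run helper: read short remote ref names on stdin and print
# what would happen to each one. It changes nothing in the repository.
plan_refs() {
  while IFS= read -r ref; do
    case "$ref" in
      *@*)    printf '%s %s\n' drop "$ref" ;;
      tags/*) printf '%s %s\n' tag "${ref#tags/}" ;;
      *)      printf '%s %s\n' branch "$ref" ;;
    esac
  done
}

# Made-up ref names. In a real clone the input would come from:
#   git for-each-ref --format='%(refname:short)' refs/remotes | plan_refs
printf '%s\n' trunk tags/v1.0 feature-x feature-x@42 | plan_refs
```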
Optional: Removing paths from history
After completing the conversion, I wanted to remove some paths from the history for some projects (such as the obj folders for .NET projects).
The easiest way to do this is to use git-filter-repo. It allows you to remove multiple paths like so:
git filter-repo --path FOO --path BAR --invert-paths
Conversion complete
At this point, you should have a project converted to Git with its full history. Now to push it to a remote and get to work.
And in my case, repeat the process all over again a couple of times...