Migrate multiple svn repositories into single git repository


One solution would be to generate each of the repositories separately with svn2git or just git svn (it's a nice little tool already built into git), and then wire them together with git filter-branch.

  1. Clone each svn repository individually.
  2. In the repository you want to be the root, add the other repositories as remotes and fetch the branches you want to merge into that repo (you'll get warnings since the branches have no common history; that is expected).
  3. Execute git filter-branch on those new branches, using an index filter to generate a new subdirectory for them.
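  4. Merge the filtered branches into master (or whichever branch you want) on the root repository. On Git 2.9 or later the merge needs --allow-unrelated-histories, because the branches share no common ancestor. Full history is preserved.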
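
Steps 1 and 2 might be sketched as follows. This is only an outline under stated assumptions: the SVN base URL is a placeholder, the directory names frontend-git and backend-git are made up for illustration, and each project is assumed to use the standard trunk/branches/tags layout.

```shell
#!/bin/sh
# Hypothetical base URL; replace it with the real repository location.
SVN_BASE="https://svn.example.invalid/svnrepo"

# Step 1: convert one SVN project into its own git repository.
# --stdlayout tells git svn to expect trunk/, branches/ and tags/.
clone_project() {
    # $1 = project name below $SVN_BASE, $2 = target directory
    git svn clone --stdlayout "$SVN_BASE/$1" "$2"
}

# Step 2: make another converted repository available inside the root one.
add_sibling() {
    # $1 = root repository, $2 = remote name, $3 = path or URL of the other repository
    git -C "$1" remote add "$2" "$3" &&
    git -C "$1" fetch "$2"
}

# Example invocation (commented out because it needs access to the SVN server):
# clone_project frontend frontend-git
# clone_project backend  backend-git
# add_sibling frontend-git backend ../backend-git
```

Wrapping the commands in small functions keeps each step easy to rerun when a flag or path turns out to be wrong.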

The command for step 3 would look something like this:

# Rewrite every commit reachable from HEAD so that all paths live under newsubdir/
git filter-branch --index-filter '
    git ls-files -s |
    perl -pe "s{\t\"?}{$&newsubdir/}" |
    GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
' HEAD

The magic, and every time I have to do this it does feel a little like magic, is the perl statement. git filter-branch rewrites the index at each commit, and the substitution prepends 'newsubdir/' to every blob path (i.e. it changes where the files end up in the working tree). You might have to experiment to get the paths exactly right. A couple of lessons learned from someone who's walked this path before:

  • Back everything up. git filter-branch rewrites history destructively. Once you change it, you cannot easily change it back. Be sure to back up all the repository copies you're using. Nothing's worse than finishing a complex operation and discovering you missed a / in the path.
  • Script everything. Unless you've got some serious skill, you won't get this right the first time. Script each individual step as you complete it, so that rerunning any of them is easy. And if you discover a week later that you got a flag wrong, you can redo the whole thing in moments.
  • Spend $20 on a cluster compute instance in EC2. git filter-branch is enormously CPU intensive. An index-filter on a deep history could take hours to run on your local environment, but a fraction of that time on an AWS cluster compute instance. Sure, they cost a little more than $2 an hour, but you're only going to need one for a few hours. Save yourself pain and use those scripts you wrote on hardware that makes the operation trivial. It costs the price of a nice lunch.

Author: Shyru

Updated on September 18, 2022

Comments

  • Shyru
    Shyru over 1 year

    We want to migrate from svn to git permanently to be able to use git's better features in terms of branching and collaboration.

    Our current svn repository looks like this

    svnrepo/
       frontend/
          trunk
          branches/
             ng/
             ...
          tags/
             1.x
             ...
       backend/
          trunk
          branches/
             ng/
             ...
          tags/
             1.x
             ...
    

    The working layout is that we check out the frontend project and, inside it, create a backend folder into which we check out the backend project.

    We now want to migrate to git, and give up the splitting between frontend and backend (in terms of being separate projects) because it gives us more problems than advantages. We want them both to be in a single git repository.

    I wanted to use svn2git for the conversion. Unfortunately, the latest development all happened in a branch rather than in trunk, but I think this should not be a problem for svn2git. So the new git repository layout should look like this:

    /            => svnrepo/frontend/branches/ng
    /backend     => svnrepo/backend/branches/ng
    

    Where => means "migrated/converted from".

    For the conversion, we do not need to carry all the tags and branches from the svn repository over to git; these are not important to us. What is important, however, is that we keep the full history of all commits to all files in the branches/ng directory, going back to the branching from trunk, plus all commits that happened in trunk before that. And we want all these commits in a single git repository with the layout described above. Is this even possible? And how would we do this?

    I already searched with Google and on Stack Overflow (1, 2) but could not find an exact solution for our problem.

  • Pablo López
    Pablo López over 11 years
    The problem with this is that he doesn't want separate repositories, which is essentially what submodules would give him.
  • Christopher
    Christopher over 11 years
    Careful with that sed call in filter-branch. Different implementations of sed treat that tab character differently. You probably want the perl modification in my answer, which was suggested on the git list.
  • Admin
    Admin over 11 years
    Exactly, we would like to use submodules only if there is no other way.
  • Christopher
    Christopher over 11 years
    Also, there's no reason you couldn't remove file path specifications with git filter-branch, too. There's no need to create a trash commit. Just clone 'frontend', index-filter frontend/ out of the path, and there you go. frontend/ is now the root level directory... if he needs this at all. If 'frontend' is its own repository, it should already clone at root level.
  • Pablo López
    Pablo López over 11 years
    @Christopher frontend/ never existed in the first place and it shouldn't exist.
  • Conex digital
    Conex digital over 11 years
    Thanks for the detailed answer! I will try this out, though I'm not sure my very basic git-fu is sufficient. :-) You mentioned that filter-branch is very CPU-intensive. We have a relatively small repository with just over 5000 commits. Would this still be very slow on a quad-core CPU? (Can git filter-branch even take advantage of multiple CPUs?)
  • Christopher
    Christopher over 11 years
    5000 commits is not a very deep history at all. You're probably fine on any of your available machines. It'll still take a while, but not a crippling amount of time. For what it's worth, when this procedure is done (no matter which solution you pursue), you'll have considerably more git-fu.
  • Shyru
    Shyru over 11 years
    Thanks again for the help. I ended up using svn2git from the KDE project (techbase.kde.org/Projects/MoveToGit/UsingSvn2Git) because it coped better with files that had been moved inside the svn repository and kept their history intact, which I could not get to work properly with the Ruby svn2git. I then used git filter-branch as you proposed and it worked nicely! Thanks a lot again!