Merge two Git repositories without breaking file history

Solution 1

It turns out that the answer is much simpler if you're simply trying to glue two repositories together and make it look like it was that way all along rather than manage an external dependency. You simply need to add remotes to your old repos, merge them to your new master, move the files and folders to a subdirectory, commit the move, and repeat for all additional repos. Submodules, subtree merges, and fancy rebases are intended to solve a slightly different problem and aren't suitable for what I was trying to do.

Here's an example PowerShell script to glue two repositories together:

# Assume the current directory is where we want the new repository to be created
# Create the new repository
git init

# Before we do a merge, we have to have an initial commit, so we'll make a dummy commit
git commit --allow-empty -m "Initial dummy commit"

# Add a remote for and fetch the old repo
# (the '--fetch' (or '-f') option will make git immediately fetch commits to the local repo after adding the remote)
git remote add --fetch old_a <OldA repo URL>

# Merge the files from old_a/master into new/master
git merge old_a/master --allow-unrelated-histories

# Move the old_a repo files and folders into a subdirectory so they don't collide with the other repo coming later
mkdir old_a
dir -exclude old_a | %{git mv $_.Name old_a}

# Commit the move
git commit -m "Move old_a files into subdir"

# Do the same thing for old_b
git remote add -f old_b <OldB repo URL>
git merge old_b/master --allow-unrelated-histories
mkdir old_b
dir -exclude old_a,old_b | %{git mv $_.Name old_b}
git commit -m "Move old_b files into subdir"

Obviously you could instead merge old_b into old_a (which becomes the new combined repo) if you’d rather do that – modify the script to suit.
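For readers on Bash rather than PowerShell, the same sequence can be sketched as a POSIX shell script. This is only a sketch under stated assumptions: the two "old" repositories are throwaway stand-ins created in a temporary directory, make_repo is a hypothetical helper, and --allow-unrelated-histories requires Git 2.9 or later.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work even where Git is not configured.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# make_repo NAME FILE: hypothetical helper that builds a one-commit stand-in
# for an old repository, with its branch forced to be called master.
make_repo() {
    git init -q "$work/$1"
    git -C "$work/$1" symbolic-ref HEAD refs/heads/master
    echo "from $1" > "$work/$1/$2"
    git -C "$work/$1" add "$2"
    git -C "$work/$1" commit -q -m "$1: add $2"
}
make_repo src_a a.txt
make_repo src_b b.txt

# New combined repository with the dummy initial commit.
git init -q "$work/combined"
cd "$work/combined"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial dummy commit"

for name in a b; do
    # Add the old repo as a remote and fetch it in one step.
    git remote add --fetch "old_$name" "$work/src_$name" >/dev/null 2>&1
    git merge -q --allow-unrelated-histories -m "Merge old_$name" "old_$name/master"

    # Move every tracked top-level entry except the old_* directories.
    # Caveat: a real file whose name starts with "old_" would be skipped too.
    mkdir "old_$name"
    git ls-tree --name-only HEAD | grep -v '^old_' | while IFS= read -r entry; do
        git mv "$entry" "old_$name/"
    done
    git commit -q -m "Move old_$name files into subdir"
done

# List the tracked files of the combined repository.
git ls-files
```

Listing entries with git ls-tree instead of ls means hidden but tracked files such as .gitignore are moved as well, and untracked files are left alone.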

If you want to bring over in-progress feature branches as well, use this:

# Bring over a feature branch from one of the old repos
git checkout -b feature-in-progress
git merge -s recursive -Xsubtree=old_a old_a/feature-in-progress

That's the only non-obvious part of the process. It isn't a subtree merge; -Xsubtree=old_a is an option to the normal recursive merge strategy that tells Git we renamed the target, which helps Git line everything up correctly.
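To see the effect of -Xsubtree on a small case, here is a sketch that assumes a stand-in old repository with a master branch and a feature-in-progress branch that adds one file. The repository layout and file names (a.txt, feature.txt) are made up for illustration.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the old repository: master plus an unfinished feature branch.
git init -q "$work/src_a"
cd "$work/src_a"
git symbolic-ref HEAD refs/heads/master
echo "base" > a.txt
git add a.txt
git commit -q -m "src_a: add a.txt"
git checkout -q -b feature-in-progress
echo "wip" > feature.txt
git add feature.txt
git commit -q -m "src_a: start feature"

# Combined repository: merge old_a/master, then move its files into old_a/.
git init -q "$work/combined"
cd "$work/combined"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial dummy commit"
git remote add --fetch old_a "$work/src_a" >/dev/null 2>&1
git merge -q --allow-unrelated-histories -m "Merge old_a" old_a/master
mkdir old_a
git mv a.txt old_a/
git commit -q -m "Move old_a files into subdir"

# Bring over the feature branch. -Xsubtree=old_a shifts the incoming tree
# under old_a/ so it lines up with the moved files.
git checkout -q -b feature-in-progress
git merge -q -s recursive -Xsubtree=old_a \
    -m "Merge old_a feature branch" old_a/feature-in-progress

# The file added on the feature branch should now be tracked under old_a/.
git ls-files
```

Without the -Xsubtree option, the merge would try to apply the branch's changes at the repository root, where the files no longer live.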

I wrote up a slightly more detailed explanation here.

Solution 2

Here's a way that doesn't rewrite any history, so all commit IDs will remain valid. The end result is that the second repo's files end up in a subdirectory.

  1. Add the second repo as a remote:

    cd firstgitrepo/
    git remote add secondrepo username@servername:andsoon
    
  2. Make sure that you've downloaded all of the secondrepo's commits:

    git fetch secondrepo
    
  3. Create a local branch from the second repo's branch:

    git branch branchfromsecondrepo secondrepo/master
    
  4. Move all its files into a subdirectory:

    git checkout branchfromsecondrepo
    mkdir subdir/
    git ls-tree -z --name-only HEAD | xargs -0 -I {} git mv {} subdir/
    git commit -m "Moved files to subdir/"
    
  5. Merge the second branch into the first repo's master branch:

    git checkout master
    git merge --allow-unrelated-histories branchfromsecondrepo
    

Your repository will have more than one root commit, but that shouldn't pose a problem.
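The five steps can be run end to end against throwaway repositories. The sketch below assumes two one-commit stand-in repos created by a hypothetical make_repo helper; it then checks the two claims made above: the second repo's tip commit is still reachable under its original ID, and the result has two root commits.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# make_repo NAME FILE: hypothetical helper creating a one-commit repository.
make_repo() {
    git init -q "$work/$1"
    git -C "$work/$1" symbolic-ref HEAD refs/heads/master
    echo "from $1" > "$work/$1/$2"
    git -C "$work/$1" add "$2"
    git -C "$work/$1" commit -q -m "$1: add $2"
}
make_repo firstgitrepo a.txt
make_repo secondrepo_src b.txt

cd "$work/firstgitrepo"

# Steps 1-3: add the remote, fetch it, create a local branch from it.
git remote add secondrepo "$work/secondrepo_src"
git fetch -q secondrepo
git branch -q branchfromsecondrepo secondrepo/master

# Step 4: move all of its files into a subdirectory.
git checkout -q branchfromsecondrepo
mkdir subdir
git ls-tree -z --name-only HEAD | xargs -0 -I {} git mv {} subdir/
git commit -q -m "Moved files to subdir/"

# Step 5: merge into the first repo's master branch.
git checkout -q master
git merge -q --allow-unrelated-histories \
    -m "Merge secondrepo into subdir/" branchfromsecondrepo

# Count root commits (commits without parents) reachable from HEAD.
roots=$(git rev-list --max-parents=0 HEAD | grep -c '')
echo "root commits: $roots"

# Succeeds only if the second repo's original tip is an ancestor of HEAD,
# i.e. its commit ID was not rewritten.
git merge-base --is-ancestor secondrepo/master HEAD && echo "history intact"
```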

Solution 3

Say you want to merge repository a into b (I'm assuming they're located alongside one another):

cd b
git remote add a ../a
git fetch a
git merge --allow-unrelated-histories a/master
git remote remove a

In case you want to put a into a subdirectory do the following before the commands above:

cd a
git filter-repo --to-subdirectory-filter a
cd ..

For this you need git-filter-repo installed (filter-branch is discouraged).

An example of merging 2 big repositories, putting one of them into a subdirectory: https://gist.github.com/x-yuri/9890ab1079cf4357d6f269d073fd9731

More on it here.

Solution 4

A few years have passed and there are already well-founded, highly upvoted solutions, but I want to share mine because my case was a bit different: I wanted to merge two remote repositories into a new one without losing the history of the previous repositories.

  1. Create a new repository on GitHub.

  2. Clone the newly created repo and add the old repository as a remote.

    git clone https://github.com/alexbr9007/Test.git
    cd Test
    git remote add OldRepo https://github.com/alexbr9007/Django-React.git
    git remote -v
    
  3. Fetch everything from the old repo so that its branches show up as remote-tracking branches.

    git fetch OldRepo
    git branch -a
    

  4. In the master branch, do a merge to combine the old repo with the newly created one.

    git merge remotes/OldRepo/master --allow-unrelated-histories
    

  5. Create a new folder for the content that came from OldRepo and move those files into it.

  6. Lastly, push the combined repository; you can then safely delete OldRepo from GitHub.
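Step 5 is given without commands, so here is one possible sketch of it. It uses local stand-in repositories instead of GitHub, a hypothetical folder name old_repo, and a hypothetical make_repo helper. The files to move are identified by listing the top-level entries of OldRepo/master, so files that belong to the new repository stay where they are.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# make_repo NAME FILE: hypothetical helper creating a one-commit repository.
make_repo() {
    git init -q "$work/$1"
    git -C "$work/$1" symbolic-ref HEAD refs/heads/master
    echo "from $1" > "$work/$1/$2"
    git -C "$work/$1" add "$2"
    git -C "$work/$1" commit -q -m "$1: add $2"
}
make_repo Test README.md    # stands in for the newly created repository
make_repo old_src app.py    # stands in for the old repository

cd "$work/Test"

# Steps 2-4: add the old repo as a remote, fetch it, merge it into master.
git remote add OldRepo "$work/old_src"
git fetch -q OldRepo
git merge -q --allow-unrelated-histories -m "Merge OldRepo" OldRepo/master

# Step 5: move only the entries that came from OldRepo into a new folder.
mkdir old_repo
git ls-tree --name-only OldRepo/master | while IFS= read -r entry; do
    git mv "$entry" old_repo/
done
git commit -q -m "Move OldRepo files into old_repo/"

# Drop the remote once the merge is done. The merged commits stay reachable
# from master. Pushing (step 6) is left out because this sketch has no server.
git remote remove OldRepo

git ls-files
```

If both repositories contain a file with the same path, the merge in step 4 will stop with a conflict that has to be resolved before the move.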

Hope this can be useful for anyone dealing with merging remote repositories.

Solution 5

I turned the solution from @Flimm into a Git alias like this (added to my ~/.gitconfig):

[alias]
 mergeRepo = "!mergeRepo() { \
  [ $# -ne 3 ] && echo \"Three parameters required, <remote URI> <new branch> <new dir>\" && exit 1; \
  git remote add newRepo \"$1\"; \
  git fetch newRepo; \
  git branch \"$2\" newRepo/master; \
  git checkout \"$2\"; \
  mkdir -vp \"${GIT_PREFIX}$3\"; \
  git ls-tree -z --name-only HEAD | xargs -0 -I {} git mv {} \"${GIT_PREFIX}$3\"/; \
  git commit -m \"Moved files to '${GIT_PREFIX}$3'\"; \
  git checkout master; git merge --allow-unrelated-histories --no-edit -s recursive -X no-renames \"$2\"; \
  git branch -D \"$2\"; git remote remove newRepo; \
}; \
mergeRepo"
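Going by the alias's own error message, it is invoked as git mergeRepo <remote URI> <new branch> <new dir>. The alias prefixes the target directory with ${GIT_PREFIX} because Git runs shell aliases from the top level of the repository and passes the directory you invoked them from in that variable. The sketch below only demonstrates that behaviour; the alias name showprefix and the directory sub are made up for illustration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
mkdir "$work/repo/sub"

# Run a one-off shell alias from inside sub/. Git executes "!" aliases from
# the repository's top level and exports GIT_PREFIX with the original
# subdirectory, including a trailing slash.
cd "$work/repo/sub"
prefix=$(git -c 'alias.showprefix=!printf "%s\n" "$GIT_PREFIX"' showprefix)

echo "GIT_PREFIX seen by the alias: $prefix"
```

This is why running the mergeRepo alias from a subdirectory creates the new directory relative to where you were standing rather than at the repository root.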

Author: Eric Lee

Updated on March 31, 2022

Comments

  • Eric Lee
    Eric Lee about 2 years

    I need to merge two Git repositories into a brand new, third repository. I've found many descriptions of how to do this using a subtree merge (for example Jakub Narębski's answer on How do you merge two Git repositories?) and following those instructions mostly works, except that when I commit the subtree merge all of the files from the old repositories are recorded as new added files. I can see the commit history from the old repositories when I do git log, but if I do git log <file> it shows only one commit for that file - the subtree merge. Judging from the comments on the above answer, I'm not alone in seeing this problem but I've found no published solutions for it.

    Is there any way to merge repositories and leave individual file history intact?

    • Lucero
      Lucero over 11 years
      I'm not using Git, but in Mercurial I'd first do a convert if necessary to fix the file paths of the repos to be merged, and then force-pull one repo into the target to get the changesets, and then do a merge of the different branches. This is tested and works ;) Maybe this helps to find a solution for Git as well... compared to the subtree-merge approach I guess the convert step is different where the history is rewritten instead of just mapping a path (if I understand correctly). This then ensures a smooth merge without any special handling of file paths.
    • nacross
      nacross about 10 years
      I also found this question helpful stackoverflow.com/questions/1683531/…
    • Dimitri Dewaele
      Dimitri Dewaele about 7 years
      I created a follow-up question. Might be interesting: Merge two Git repositories and keep the master history: stackoverflow.com/questions/42161910/…
    • xverges
      xverges almost 7 years
      The automated solution that worked for me was stackoverflow.com/a/30781527/239408
  • mholm815
    mholm815 about 11 years
    this solution using git mv doesn't work so well. when you later use a git log on one of the moved files you only get the commit from the move. all previous history is lost. this is because git mv is really git rm; git add but in one step.
  • Eric Lee
    Eric Lee about 11 years
    It's the same as any other move/rename operation in Git: from the command line you can get all of the history by doing git log --follow, or all of the GUI tools do that for you automatically. With a subtree merge you can't get the history for individual files, as far as I know, so this method is better.
  • Jon
    Jon about 10 years
    @EricLee When the old_b repo is merged I get a lot of merge conflicts. Is that expected? I get CONFLICT (rename/delete)
  • Eric Lee
    Eric Lee about 10 years
    @Jon No, if you've moved the first repo into a subdirectory first then the second repo shouldn't cause any conflicts. If it does, you've probably skipped a step somewhere or have discovered some behavior I don't know about.
  • Jon
    Jon about 10 years
    I'm on Git 1.8.5 on Windows and it always causes issues when running the merge on the old_b and I'm not sure why. CONFLICT (rename/delete): DeleteFeedCommand.cs deleted in HEAD and renamed in domain/master. Version domain/master of DeleteFeedCommand.cs left in tree.
  • Jon
    Jon about 10 years
    domain/master is old_b here
  • Jon
    Jon about 10 years
    I think git add -u on the conflict seems to get around it
  • Keith
    Keith about 10 years
    Step 2 doesn't work for me: fatal: Not a valid object name: 'secondrepo/master'.
  • Flimm
    Flimm about 10 years
    @Keith: make sure you've added the second repo as a remote named "secondrepo", and that that repo has a branch named "master" (you can view branches on a remote repo with the command git remote show secondrepo)
  • sksamuel
    sksamuel about 10 years
    I had to do a fetch to bring it down as well. In between 1 and 2 I did git fetch secondrepo
  • Flimm
    Flimm about 10 years
    @monkjack: I've edited my answer to include a git fetch step. Feel free to edit the answer yourself in future.
  • DarkteK
    DarkteK about 9 years
    This isn't working, I think one step is missing or wrong, because I'm trying to merge like this tutorial, and all I got is just Unmerged error in many files, besides all my main files go into the sub directory folder....
  • IsmailS
    IsmailS over 8 years
    I was merging B repo into A repo. I had a feature-in-progress branch in both repo. I wanted to pull in-progress changes of B repo to A repo. In A repo, I did git checkout feature-in-progress > merge master > git merge -s recursive -Xsubtree=B old_a/feature-in-progress. It merged properly but the newly added files in B/feature-in-progress are missing. :(
  • George
    George over 8 years
    When I attempt "dir -exclude old_a | %{git mv $_.Name old_a}", I get sh.exe": dir: command not found and sh.exe": git: command not found. Using this works: ls -I old_a | xargs -I '{}' git mv '{}' old_a/
  • Mageician
    Mageician over 8 years
    I am using Git Bash on Windows and had the same problem as George. His command works, but it's not clear that the switch to both 'ls' and 'xargs' is a capital 'eye' and not a lowercase 'el'.
  • Dominique Vial
    Dominique Vial about 8 years
    This is 1 (the number One) for ls and capital 'eye' for xargs. Thank you for this tip!
  • user3356885
    user3356885 about 8 years
    This method of combining git repositories together should be carefully evaluated if you are planning on using code from new git repository for build/release process. Looks like git move operation is messing up all branches and tags on the code in newly created repository. Although you can see your tags in place but it's not usable at all.
  • rmunn
    rmunn about 8 years
    This answer worked for me (or more specifically, for the colleague who just asked me how to do this). He didn't need to move any files to a subdirectory, since the two repos had different directory structures in them (C# project, merging the FooProject and BarProject repositories together -- everything was already under folders named FooProject or BarProject). Had a couple of easily-solved merge conflicts on files like .gitignore, but that's it. All FooProject and BarProject files are now in one repo, with full history preserved for each.
  • Jeffery Utter
    Jeffery Utter about 8 years
    I am evaluating this option and it seems to be working quite well. One issue that I am running into is that the git logs for some files contain many more commits after merging. A file that had 4 commits in its original repo has 400 after merging into a repo with 4000 combined commits. I'm not sure where it is picking up the other commits. Any ideas?
  • Fredrik Erlandsson
    Fredrik Erlandsson almost 8 years
    For step 5, you may prefer git merge branchfromsecondrepo -s recursive -X no-renames so that Git doesn't try to detect renamed files.
  • user5359531
    user5359531 almost 8 years
    After Step 4 and before Step 5, are you supposed to push branchfromsecondrepo back to its corresponding remote? Because when I went to step 5, switched back to the master branch, and then tried to git merge branchfromsecondrepo/master, the changes I made in Step 4 were no longer applied; it was as if it merged straight from the remote and not from the local branchfromsecondrepo. I was unable to run git merge branchfromsecondrepo; git gave an error about it not being something that could be merged.
  • Flimm
    Flimm almost 8 years
    @user5359531 Nope, there's no need to push anything after step 4, no changes are made to the remote at all. If you like the result, you can push the result after all the steps in my answer have been completed. In Step 5, you should merge with branchfromsecondrepo, and not secondrepo/master or branchfromsecondrepo/master or any remote-tracking branch. I think you might be confused about the differences between local branches, remote-tracking branches and branches on remotes, see this post: stackoverflow.com/a/24785777/247696
  • Mikhail Orlov
    Mikhail Orlov over 7 years
    Step 5 requires --allow-unrelated-histories.
  • Flimm
    Flimm over 7 years
    Sorry @Xiong Chiamiov, I rolled back your edit because I prefer splitting a command like git checkout -b BLA BLA2 into two commands when I'm teaching someone something. Also, your edit made the command no longer match the description of the command before it.
  • Parker Coates
    Parker Coates over 7 years
    Just curious: do you really do this often enough to need an alias?
  • Eric Smith
    Eric Smith over 7 years
    Once I've done all of those steps, can I safely delete the branch branchfromsecondrepo? Or will doing that result in loss of history?
  • Flimm
    Flimm over 7 years
    @EricSmith You can safely delete it. git branch -d branchfromsecondrepo won't delete the branch unless it's safe to do so. (The -D flag forces a deletion, but -d is safe.)
  • Fredrik Erlandsson
    Fredrik Erlandsson over 7 years
    No I don't but never remember how to do it so an alias is just a way for me to remember it.
  • EML
    EML about 7 years
    +1, but note that if you have a .gitignore in both the source repos then you will get a conflict on the second merge, which must be fixed. Note also that someone else edited your answer to add --allow-unrelated-histories to your merge command. This is new, and isn't in your blog, and should probably be reverted...
  • Matthew Wise
    Matthew Wise about 7 years
    Had to do this and your steps above worked great, thanks! Just wondering if it is safe to remove the references to the old remotes at the end and indeed whether the old repositories can be nuked, or whether the history in the merged repo still points at the old repositories?
  • quetzalcoatl
    quetzalcoatl almost 7 years
    Yeah.. but try changing computers and forgetting to move your aliases ;)
  • neowulf33
    neowulf33 almost 7 years
    What's the value of $GIT_PREFIX?
  • Fredrik Erlandsson
    Fredrik Erlandsson almost 7 years
    github.com/git/git/blob/… 'GIT_PREFIX' is set as returned by running 'git rev-parse --show-prefix' from the original current directory. See linkgit:git-rev-parse[1].
  • xverges
    xverges almost 7 years
    stackoverflow.com/a/30781527/239408 provides another automated approach, based on bash
  • Patrick Beard
    Patrick Beard almost 7 years
    I'm using zsh rather than bash, and v2.13.0 of git. No matter what I've tried, I haven't been able to get git filter-branch --index-filter to work. Typically I get an error message that the .new index file doesn't exist. Does that ring any bells?
  • Andrey Izman
    Andrey Izman almost 7 years
    @PatrickBeard I don't know zsh, you can create separated file git-add-repo.sh with function above, at the end of the file put this line git-add-repo "$@". After that you can use it from zsh like cd current/git/package and bash path/to/git-add-repo.sh https://github.com/example/example dir/to/save
  • Patrick Beard
    Patrick Beard almost 7 years
    The problem was discussed here: stackoverflow.com/questions/7798142/… mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE" fails sometimes, so you have to add an if test.
  • Devplex
    Devplex over 6 years
    Note that the --allow-unrelated-histories option to git merge in step 5 is relatively new. Git 2.7 (the default version in Ubuntu 16.04) doesn't know it so I installed Git 2.13 from the git-core PPA. The procedure works great!
  • Flimm
    Flimm over 6 years
    @MartijnHeemels For older version of Git, just omit --allow-unrelated-histories. See the history of this answer post.
  • hussfelt
    hussfelt over 6 years
    On osx/macos the dir -exclude old_a | %{git mv $_.Name old_a} could be ls | grep -v 'old_a' | xargs -I '{}' git mv '{}' old_a
  • axd
    axd over 6 years
    doesn't always work. when merging multiple projects that have the same directory layout, there will be a massive amount of conflicts due to the merge attempt of the next subproject following a git mv.
  • acumartini
    acumartini about 6 years
    In a bash context I prefer: find . -maxdepth 1 -not -name ".git" -and -not -name 'old_a' -exec git mv {} old_a/ \;
  • C S
    C S almost 6 years
    Great answer, helped me out. :) . For the last step, what is the reason to do a merge rather than a rebase? I did git rebase branchfromsecondrepo to have the 1st repo start off the history of the second repo and I would imagine that's more preferable in this type of situation.
  • Jeremy
    Jeremy over 5 years
    dir --ignore old_a not -e
  • Lucademicus
    Lucademicus over 5 years
    Step 4 does not work in Windows (in Powershell), xargs command needs to be substituted. Anyone have an idea how to rewrite Step 4 for Windows? @Flimm?
  • Flimm
    Flimm over 5 years
    @Lucademicus I saw your suggested edit. Initially I approved it, but then I rolled back, as it is not strictly equivalent to the bash equivalent. I'm not sure if it handles Git ignored files correctly.
  • Célia Doolaeghe
    Célia Doolaeghe over 4 years
    This is the only solution that worked for me to preserve git history. Don't forget to remove the remote link to old repo with git remote rm OldRepo.
  • Mason Freed
    Mason Freed about 4 years
    I would not use this method! I tried the script, naively and verbatim (I can only blame myself for that part), and it clobbered my local git repo. The history looked mostly right, but doing a git push back to Github resulted in the dreaded "RPC failed; curl 55 SSL_write() returned SYSCALL, errno = 32" error. I tried to repair it, but it was irreparably broken. I ended up having to reconstruct things in a new local repo.
  • Andrey Izman
    Andrey Izman about 4 years
    @MasonFreed this script creates a new git history with mix of both repos, so it can't be pushed to old repo, it require to create a new one or push with force key, means it rewrite your repo on server
  • Peter
    Peter about 4 years
    As mholm815 said, after merging the repos simply using git log <file> will stop on the moving commit. Eric Lee suggested using git log --follow <file> but it didn't work for me (git 2.25). Instead, it showed no commits at all. To get full history for a particular file I use git log -m --follow <file>.
  • Yuri L
    Yuri L about 4 years
    Step 3 failed for me with syntax error. Semi-colons are missing. Fix git filter-branch --prune-empty --tree-filter ' if [ ! -e my/new/subdir ]; then mkdir -p my/new/subdir; git ls-tree --name-only $GIT_COMMIT | xargs -I files mv files my/new/subdir; fi'
  • Stephen Turner
    Stephen Turner almost 4 years
    Git documentation recommends not rebasing... git-scm.com/book/en/v2/Git-Branching-Rebasing#_rebase_peril
  • code4meow
    code4meow over 3 years
    I can't upvote this enough. A perfectly simple, successful, sensible solution. Thank you! And thank you @Harubiyori for the final touch.
  • Bob
    Bob over 3 years
    Is it possible to do it without merge conflicts?
  • x-yuri
    x-yuri over 3 years
    @Mikhail Yes, it is possible, do you see merge conflicts in the gist? If you run into merge conflicts, that means you have e.g. file a/b/c in both repositories. Either rename files before merge, or merge into a subdirectory, or resolve the conflicts.
  • Bob
    Bob over 3 years
    ok. thank you. resolve the conflicts it is
  • Robert Massaioli
    Robert Massaioli over 2 years
    To make the move easier, I would just mkdir -p old_a && git mv -k * old_a. Much simpler.
  • Tim MB
    Tim MB over 2 years
    This worked great, except I'm missing the submodules now
  • Ishmaeel
    Ishmaeel over 2 years
    this is the perfect solution for preserving file histories without becoming dependent on --follow, thank you!
  • Sta_Doc
    Sta_Doc over 2 years
    This solution works for me. All the previous commit messages from the other repo are kept!!
  • Astariul
    Astariul about 2 years
    Following the comment of @RobertMassaioli, if you need to move hidden files/folders as well, you can do mkdir -p old_a && git mv -k .[!.]* old_a
  • erik-stengel
    erik-stengel about 2 years
    Do I have to execute this in PowerShell or in GitBash?