Detach many subdirectories into a new, separate Git repository

Solution 1

Answering my own question here... after a lot of trial and error.

I managed to do this using a combination of git subtree and git-stitch-repo. These instructions are based on:

First, I pulled out the directories I wanted to keep into their own separate repository:

cd origRepo
git subtree split -P apps/AAA -b aaa
git subtree split -P libs/XXX -b xxx

cd ..
mkdir aaaRepo
cd aaaRepo
git init
git fetch ../origRepo aaa
git checkout -b master FETCH_HEAD

cd ..
mkdir xxxRepo
cd xxxRepo
git init
git fetch ../origRepo xxx
git checkout -b master FETCH_HEAD

I then created a new empty repository, and imported/stitched the last two into it:

cd ..
mkdir newRepo
cd newRepo
git init
git-stitch-repo ../aaaRepo:apps/AAA ../xxxRepo:libs/XXX | git fast-import

This creates two branches, master-A and master-B, each holding the content of one of the stitched repos. To combine them and clean up:

git checkout master-A
git pull . master-B
git checkout master
git branch -d master-A 
git branch -d master-B

Now I'm not quite sure how/when this happens, but after the first checkout and the pull, the code magically merges into the master branch (any insight on what's going on here is appreciated!)
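
A minimal sketch of what the pull step does, assuming plain git in a throwaway directory. Passing . as the remote makes git pull fetch from the current repository, so the command amounts to merging the named local branch into whichever branch is checked out. The branch names left and right are hypothetical stand-ins for the stitched branches, and --no-rebase --no-edit are added only so the demo runs unattended on recent Git versions.

```shell
set -e
# Throwaway repository; every name below is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/left

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Diverge: one commit on "right", one on "left".
git checkout -q -b right
echo r > right.txt
git add right.txt
git commit -q -m "right"

git checkout -q left
echo l > left.txt
git add left.txt
git commit -q -m "left"

# "." names the current repository as the remote, so this fetches the
# local branch "right" and merges it into the checked-out branch "left".
git pull -q --no-rebase --no-edit . right

# The result is an ordinary merge commit whose second parent is "right".
git log --oneline --graph
```

Running git merge right in place of the pull line should give the same history, which matches Phil Hord's explanation in the comments.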

Everything seems to have worked as expected, except that if I look through the newRepo commit history, there are duplicates when the changeset affected both apps/AAA and libs/XXX. If there is a way to remove duplicates, then it would be perfect.

Solution 2

Instead of having to deal with a subshell and using ext glob (as kynan suggested), try this much simpler approach:

git filter-branch --index-filter 'git rm --cached -qr --ignore-unmatch -- . && git reset -q $GIT_COMMIT -- apps/AAA libs/XXX' --prune-empty -- --all

As void.pointer's comment notes, this removes everything except apps/AAA and libs/XXX from the current repository.
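
To see the index filter at work, here is a sketch on a throwaway repository, assuming a Git version that still ships filter-branch. The file names are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the deprecation pause that newer Git versions add.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's deprecation pause
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Commit 1 touches every directory; commit 2 touches only apps/BBB.
mkdir -p apps/AAA apps/BBB libs/XXX libs/YYY
echo a > apps/AAA/a.txt
echo b > apps/BBB/b.txt
echo x > libs/XXX/x.txt
echo y > libs/YYY/y.txt
git add .
git commit -q -m "add everything"
echo b2 >> apps/BBB/b.txt
git commit -q -am "change only apps/BBB"

# For each commit: empty the index, then restore just the two wanted paths
# from the original commit ($GIT_COMMIT is set by filter-branch).
git filter-branch --index-filter \
  'git rm --cached -qr --ignore-unmatch -- . &&
   git reset -q $GIT_COMMIT -- apps/AAA libs/XXX' \
  --prune-empty -- --all >/dev/null

kept_files=$(git ls-tree -r --name-only HEAD)
kept_commits=$(git rev-list --count HEAD)
echo "$kept_files"
echo "commits: $kept_commits"
```

The second commit only changed apps/BBB, so after filtering it is empty and --prune-empty drops it.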

Prune empty merge commits

This leaves behind lots of empty merges. These can be removed by another pass as described by raphinesse in his answer:

git filter-branch --prune-empty --parent-filter \
'sed "s/-p //g" | xargs -r git show-branch --independent | sed "s/\</-p /g"'

⚠️ Warning: The above requires the GNU versions of sed and xargs; otherwise xargs fails and all commits are removed. With Homebrew, run brew install gnu-sed findutils and then use gsed and gxargs:

git filter-branch --prune-empty --parent-filter \
'gsed "s/-p //g" | gxargs -r git show-branch --independent | gsed "s/\</-p /g"'
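
As a rough illustration of what the parent filter's pipeline does to its input, the sketch below feeds it a hand-built parent string instead of running filter-branch. It assumes GNU sed and GNU xargs (as the warning above requires); the two-commit repository is hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 1 > f
git add f
git commit -q -m "one"
old=$(git rev-parse HEAD)
echo 2 >> f
git commit -q -am "two"
new=$(git rev-parse HEAD)

# filter-branch hands the parent filter a string of this shape on stdin.
# Here $old is an ancestor of $new, so listing it as a parent is redundant.
parents="-p $old -p $new"

# 1) strip the "-p " markers, 2) keep only commits that are not reachable
# from any of the others, 3) put "-p " back in front of each remaining hash.
reduced=$(echo "$parents" \
  | sed "s/-p //g" \
  | xargs -r git show-branch --independent \
  | sed "s/\</-p /g")

echo "$reduced"
```

Once a merge is left with a single parent and no changes of its own, --prune-empty can drop it.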

Solution 3

Manual steps with simple git commands

The plan is to split the individual directories into their own repos, then merge them together. The following manual steps rely on plain, easy-to-understand commands rather than specialised scripts, and the same steps can be repeated to merge N additional sub-folders into a single repository.

Divide

Let's assume your original repo is: original_repo

1 - Split apps:

git clone original_repo apps-repo
cd apps-repo
git filter-branch --prune-empty --subdirectory-filter apps master

2 - Split libs

git clone original_repo libs-repo
cd libs-repo
git filter-branch --prune-empty --subdirectory-filter libs master

Continue in the same way if you have more than two folders. You should now have two new, temporary Git repositories.
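
A small sketch of the split step on a throwaway repository, assuming a Git version that still ships filter-branch. The file names are invented for the demo. It shows that --subdirectory-filter makes the contents of apps the new repository root, so the apps/ prefix itself disappears from the paths.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's deprecation pause
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for original_repo with one file under apps and one under libs.
git init -q original_repo
cd original_repo
git symbolic-ref HEAD refs/heads/master
mkdir -p apps/AAA libs/XXX
echo a > apps/AAA/a.txt
echo x > libs/XXX/x.txt
git add .
git commit -q -m "add apps and libs"
cd ..

# Step 1 from above, run against the stand-in repository.
git clone -q original_repo apps-repo
cd apps-repo
git filter-branch --prune-empty --subdirectory-filter apps master >/dev/null

apps_root=$(git ls-tree -r --name-only HEAD)
echo "$apps_root"
```

Because the prefix is stripped here, it has to be added back when the split repositories are combined.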

Conquer by Merging apps and libs

3 - Prepare the brand new repo:

mkdir my-desired-repo
cd my-desired-repo
git init

You will also need to make at least one commit. If you skip the following three lines, the contents of your first repo will appear directly under the new repo's root:

touch a_file_and_make_a_commit # see user's feedback
git add a_file_and_make_a_commit
git commit -am "at least one commit is needed for it to work"

With the temporary file committed, the merge command in the later steps will stop as expected.

Following a user's feedback: instead of adding an arbitrary file like a_file_and_make_a_commit, you can add a .gitignore, a README.md, or similar.

4 - Merge apps repo first:

git remote add apps-repo ../apps-repo
git fetch apps-repo
git merge -s ours --no-commit apps-repo/master # see below note.
git read-tree --prefix=apps -u apps-repo/master
git commit -m "import apps"

Now you should see the apps directory inside your new repository. git log should show all the relevant historical commit messages.

Note: as Chris noted in the comments, for newer versions of Git (>= 2.9) you need to pass --allow-unrelated-histories to git merge.

5 - Merge libs repo next in the same way:

git remote add libs-repo ../libs-repo
git fetch libs-repo
git merge -s ours --no-commit libs-repo/master # see above note.
git read-tree --prefix=libs -u libs-repo/master
git commit -m "import libs"

Continue if you have more than 2 repos to merge.

Reference: Merge a subdirectory of another repository with git
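
The import step can be tried end to end on throwaway repositories. The sketch below assumes Git 2.9 or later (hence --allow-unrelated-histories, per the note above) and uses a made-up apps-repo containing a single file; it writes the prefix with a trailing slash, which older Git versions require.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for an already split apps-repo (contents are hypothetical).
git init -q apps-repo
cd apps-repo
git symbolic-ref HEAD refs/heads/master
echo hello > hello.txt
git add hello.txt
git commit -q -m "apps: add hello"
cd ..

# The new repository, with the required initial commit.
git init -q my-desired-repo
cd my-desired-repo
git symbolic-ref HEAD refs/heads/master
echo "# demo" > README.md
git add README.md
git commit -q -m "initial commit"

git remote add apps-repo ../apps-repo
git fetch -q apps-repo
# Record a merge with apps-repo's history while keeping our own tree...
git merge -q -s ours --no-commit --allow-unrelated-histories apps-repo/master
# ...then read its tree into the index and working tree under apps/.
git read-tree --prefix=apps/ -u apps-repo/master
git commit -q -m "import apps"

imported=$(git ls-tree -r --name-only HEAD)
echo "$imported"
git log --oneline --graph
```

The resulting merge commit has the imported history as its second parent, which is why git log shows the old commit messages.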

Solution 4

An easy solution: git-filter-repo

I had a similar issue and, after reviewing the various approaches listed here, I discovered git-filter-repo. The official Git documentation recommends it as an alternative to git-filter-branch.

To create a new repository from a subset of directories in an existing repository, you can use the command:

git filter-repo --path <file_to_keep>

Filter multiple files/folders by chaining them:

git filter-repo --path keepthisfile --path keepthisfolder/

So, to answer the original question, with git-filter-repo you would just need the following command:

git filter-repo --path apps/AAA/ --path libs/XXX/
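
A hedged sketch of the whole workflow on a throwaway repository. It assumes git-filter-repo is installed separately (it is not bundled with Git) and skips the filtering step if it is missing. The file names are hypothetical. Working in a fresh clone is an assumption based on filter-repo's safety check, which by default refuses to rewrite anything that does not look like one.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the original repository.
git init -q orig
cd orig
mkdir -p apps/AAA apps/BBB libs/XXX
echo a > apps/AAA/a.txt
echo b > apps/BBB/b.txt
echo x > libs/XXX/x.txt
git add .
git commit -q -m "add everything"
cd ..

# Rewrite a fresh clone rather than the original; --no-local makes a
# local clone go through the normal transport, like a network clone.
git clone -q --no-local orig new
cd new

if git filter-repo --version >/dev/null 2>&1; then
  git filter-repo --path apps/AAA/ --path libs/XXX/
  kept=$(git ls-tree -r --name-only HEAD)
else
  kept="(git-filter-repo not installed; skipped)"
fi
echo "$kept"
```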

Solution 5

Why would you want to run filter-branch more than once? You can do it all in one sweep, so no need to force it (note that you need extglob enabled in your shell for this to work):

git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch $(ls -xd apps/!(AAA) libs/!(XXX))" --prune-empty -- --all

This should get rid of all the changes in the unwanted subdirectories and keep all your branches and commits (unless they only affect files in the pruned subdirectories, by virtue of --prune-empty) - no issue with duplicate commits etc.

After this operation the unwanted directories will be listed as untracked by git status.

The $(ls ...) is necessary so that the extglob is evaluated by your shell instead of the index filter, which uses the sh builtin eval (where extglob is not available). See How do I enable shell options in git? for further details on that.

Author: prisonerjohn

I code and animate.

Updated on November 20, 2020

Comments

  • prisonerjohn
    prisonerjohn over 3 years

    This question is based on Detach subdirectory into separate Git repository

    Instead of detaching a single subdirectory, I want to detach a couple. For example, my current directory tree looks like this:

    /apps
      /AAA
      /BBB
      /CCC
    /libs
      /XXX
      /YYY
      /ZZZ
    

    And I would like this instead:

    /apps
      /AAA
    /libs
      /XXX
    

    The --subdirectory-filter argument to git filter-branch won't work because it gets rid of everything except for the given directory the first time it's run. I thought using the --index-filter argument for all unwanted files would work (albeit tedious), but if I try running it more than once, I get the following message:

    Cannot create a new backup.
    A previous backup already exists in refs/original/
    Force overwriting the backup with -f
    

    Any ideas? TIA

  • Phil Hord
    Phil Hord over 13 years
    Neat tools you found here. Insight on "checkout": "git pull" is the same as "git fetch && git merge". The "fetch" part is innocuous since you are "fetching locally". So I think this checkout command is the same as "git merge master-B", which is a bit more self-evident. See kernel.org/pub/software/scm/git/docs/git-pull.html
  • manol
    manol over 12 years
    Interesting idea. I have a similar problem but could not get it to work, see stackoverflow.com/questions/8050687/…
  • notlesh
    notlesh over 12 years
    This is pretty much what I needed, though I had sprinkling of both files and folders across my repo... Thanks :)
  • robdodson
    robdodson over 11 years
    hm. even with extglob turned on I'm getting an error near my parenthesis: syntax error near unexpected token `(' my command looks like: git filter-branch -f --index-filter "git rm -r -f --cached --ignore-unmatch src/css/themes/!(some_theme*)" --prune-empty -- --all an ls with src/css/themes/!(some_theme*) returns all the other themes so extglob does appear to be working...
  • kynan
    kynan over 11 years
    You can get your shell to evaluate the glob as described in stackoverflow.com/a/8079852/396967
  • Aditya Sinha
    Aditya Sinha about 11 years
    Unfortunately the git-stitch-repo tool is broken due to bad dependencies nowadays.
  • Mike Graf
    Mike Graf about 11 years
    all i had to do was properly escape the characters. (eg "\!", "(", "\|", ")" .
  • kynan
    kynan about 11 years
    @MikeGraf I don't think that will give the desired result: escaping would match a literal "!" etc. in your path.
  • Mike Graf
    Mike Graf about 11 years
    Kynan, you're right, it didn't actually work. It ran, but didn't give the desired result. ("I don't understand why it doesn't work, it compiled!!" :P )
  • Admin
    Admin about 11 years
    @Henrik What problem were you experiencing exactly? It works for me, although I had to add export PERL5LIB="$PERL5LIB:/usr/local/git/lib/perl5/site_perl/" to my bash config so that it could find Git.pm. Then I installed it with cpan.
  • Mahmoud Adam
    Mahmoud Adam over 10 years
    thanks David this solution works with me unlike git stitch which duplicated each commit more than once
  • Bae
    Bae about 10 years
    On Ubuntu, I'm finding that ls -x doesn't put this on one line. I had to use $(ls -md apps/!(AAA) libs/!(XXX) | sed -e 's/, /,/g' | sed -e 's/ /\\ /g' | sed -e 's/&/\\&/g' | sed -e 's/,/ /g' | tr -d '\r\n') The -m separated the output by commas instead of the column based output I was getting. Ands seds to deal with spaces, ampersands in file names before replacing the commas with spaces and getting rid of newlines.
  • Bae
    Bae about 10 years
    Kynan pointed out (stackoverflow.com/a/8079852/396967) that ls -w <num> would get the ls back on one line.
  • Pontomedon
    Pontomedon over 9 years
    additionally, the --ignore-unmatch flag should be passed to git rm, it failed for the very first commit for me otherwise (the repository was created with git svn clone in my case)
  • umbreonben
    umbreonben about 9 years
    All I get from this command is lots of "duplicate parent" errors.
  • Yonatan
    Yonatan almost 9 years
    Assuming you have tags in the mix, you should probably add --tag-name-filter cat to your parameters
  • Alex
    Alex almost 7 years
    It does not seem to be possible to add files to the split and update them later, right?
  • Chris
    Chris over 6 years
    Since git 2.9 you need to use --allow-unrelated-histories on the merge commands. Otherwise this appears to have worked well for me.
  • Shinta Smith
    Shinta Smith over 6 years
    This seems too slow to run on my repo with tons of commits
  • Jostein Kjønigsen
    Jostein Kjønigsen about 6 years
    git-split seems to use git --index filter which is extremely slow compared to --subdirectory-filter. For some repos it may still be a viable option, but for big repos (multiple gigabytes, 6-digit commits) --index-filter effectively takes weeks to run, even on dedicated cloud hardware.
  • shuttsy
    shuttsy almost 6 years
    Genius! Thank you so much for this. The initial answers I'd looked at, using a tree filter on a very large repository, had git predicting taking over 26hrs to complete the git rewrites. Much happier with this simple, but repeatable approach and have successfully moved 4 sub folders into a new repo with all expected commit history.
  • Mariusz Jamro
    Mariusz Jamro over 5 years
    Are there any downsides of doing it in two steps as described above?
  • chfw
    chfw over 5 years
    One downside is: there is an extra merge commit, because you will need to make at least one commit otherwise this approach won't work.
  • Jack Miller
    Jack Miller over 5 years
    You can use the first commit for a "Initial commit" which adds .gitignore and README.md files.
  • chfw
    chfw over 5 years
    @JackMiller, yours too.
  • Burhan Ali
    Burhan Ali over 5 years
    Could you add some more information explaining what this lengthy command is doing?
  • Dai
    Dai over 5 years
    Unfortunately this approach seems to break tracking-history for the files added in the git merge .. git read-tree step, as it records them as newly-added files and all of my git guis don't make the connection to their earlier commits.
  • Dai
    Dai over 5 years
    I'm pleasantly surprised that this works perfectly on Windows using git bash, phew!
  • jschober
    jschober over 5 years
    This works well. However, is there a way I can merge all branches (simply create new ones in the new repo?)
  • jschober
    jschober over 5 years
    Reading through all the other comments got me on the right track. However, your solution just works. It imports all branches, and works with multiple directories! Great!
  • void.pointer
    void.pointer about 5 years
    @BurhanAli For every commit in history, it is deleting all files except the ones you want to keep. When everything is done, you are left with only the portion of the tree you specified, along with only that history.
  • void.pointer
    void.pointer about 5 years
    FYI: If you use ; instead of && in the index filter command, you do not need the --ignore-unmatch option.
  • aruku7230
    aruku7230 about 5 years
    The folder name is case sensitive even on Windows. If you specify a path name whose case differs from the actual one, an unexpected result will occur (that folder will be deleted instead).
  • Quolonel Questions
    Quolonel Questions almost 5 years
    If one of the paths does not exist in a given commit, I get fatal: bad revision <path> and the command aborts. To be clear, I am specifying files rather than directories.
  • Martin Delille
    Martin Delille almost 5 years
    what is the purpose of the --cached attribute?
  • retroj
    retroj over 4 years
    I was getting errors like: "/usr/lib/git-core/git-filter-branch: line 415: foo.txt: command not found" It turned out that this was caused by multi-line output from ls, and adding -w 1000 solved the problem.
  • retroj
    retroj over 4 years
    Another issue that I ran into was that I wanted to preserve files which matched the patterns: foo* and quux/foo*. The pattern !(foo*) was matching 'quux' so all of its contents were being deleted despite the second pattern. The solution was: !(foo*|quux) quux/!(foo*)
  • kynan
    kynan over 4 years
    As the name suggests, index-filter operates on the index, so you need --cached for git rm to also operate on the index (see git help rm).
  • ksadjad
    ksadjad over 4 years
    @chfw do you have any idea about how to fix newly-added files issue? It seems like in merging command somehow we lost track for relative path of the directories
  • chfw
    chfw over 4 years
    @ksadjad, No idea, to be honest. The central point of the manual merge is to select the directories to form the new repo and keep their commit histories. I am not sure how to handle such situation where a commit put files into dirA, dirB, dirDrop and only dirA and dirB are chosen for the new repo, how should the commit history relate to the original one.
  • AlanSE
    AlanSE over 4 years
    I modified this and used it for 1 folder and 2 files where it had apps/AAA libs/XXX. Worked like a charm. No idea how, but that's what makes Stack Overflow great.
  • laconbass
    laconbass over 4 years
    It's possible to use git subtree add to perform this task. See stackoverflow.com/a/58253979/1894803
  • colin
    colin over 4 years
    You should mention in your answer that the environment variable $GIT_COMMIT is to get the folders from a specific commit and if unset/empty come from HEAD
  • Jeremy Caney
    Jeremy Caney over 4 years
    If one must use git-filter-branch, and cannot use --subdirectory-filter because they want to operate against multiple folders, as asked here, then this is almost certainly the best approach of the options listed both here and elsewhere. --index-filter is much faster than operating against the working tree. And, perhaps more importantly, the nested script operates exclusively off git commands, and thus should behave similarly across different operating environments.
  • Jeremy Caney
    Jeremy Caney over 4 years
    That said, it’s worth noting that this will rewrite your history, of course, but won’t update any commit references in your commit messages. So if your team uses #refs in your commit messages, and it’s important to you that those continue to work, you’ll want to be aware of that. And, of course, as with any history rewrite, you’ll probably want to push to a new repository, so you don’t end up with potential problems when other collaborators push commits from now-orphaned histories. (Alternatively, ensure that everyone on your team reclones their copy of the repository on each device.)
  • Jeremy Caney
    Jeremy Caney over 4 years
    @david-smiley’s (more recent) answer uses a very similar approach, but has the advantage of relying exclusively on git commands, and thus isn’t as susceptible to differences in how the ls is interpreted across operating systems, as @Bae discovered.
  • Jeremy Caney
    Jeremy Caney over 4 years
    The for loop is worth acknowledging, since other similar answers don’t include it. If you don’t have a local copy of each branch in your clone, then filter-branch won’t account for them as part of its rewrite, which could potentially exclude files introduced in other branches, but not yet merged with your current branch. (Though it‘s also worth doing a git fetch on any branches you have previously checked out to ensure that they remain current.)
  • Sohail Si
    Sohail Si about 4 years
    Could not delete refs/tags/v0.2.3 I receive this error.
  • jherb
    jherb almost 4 years
    I got this error message: Cannot create a new backup. A previous backup already exists in refs/original/ Force overwriting the backup with -f. So this worked: git filter-branch --prune-empty -f ...
  • Teodoro
    Teodoro almost 4 years
    This is definitely a great answer. The problem with all other solutions is that I couldn't manage to extract the contents of ALL branches of a directory. However, git filter-repo retrieved the folder from all branches and rewrote history perfectly, like cleaning the whole tree of everything I didn't need.
  • Ahmad Ismail
    Ahmad Ismail over 3 years
    I used your answer. It is a great solution. Then somehow it was lost within many open tabs. I had to literally search from my browser history to find you and say thanks.
  • CervEd
    CervEd about 3 years
    git filter-branch is terribly slow and deprecated, I recommend the git filter-repo alternative stackoverflow.com/a/61410689/1507124 for anything non trivial