Remove folder and its contents from git/GitHub's history
Solution 1
WARNING: git filter-branch is no longer officially recommended
If you are here to copy-paste code:
This is an example which removes node_modules from history:
git filter-branch --tree-filter "rm -rf node_modules" --prune-empty HEAD
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
echo node_modules/ >> .gitignore
git add .gitignore
git commit -m 'Removing node_modules from git history'
git gc
git push origin master --force
What git actually does:
The first line rewrites every commit reachable from HEAD (your current branch). For each commit, --tree-filter checks out that commit's tree, runs the command rm -rf node_modules in it, and records the result as the rewritten commit. The rm command deletes the node_modules folder (-r is required, because without it rm won't delete folders), with no prompt given to the user (-f). The added --prune-empty drops commits that no longer change anything after the rewrite.
The second line deletes the backup references to the old history, which filter-branch saves under refs/original/.
The rest of the commands are relatively straightforward.
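To see the effect before touching a real repository, the commands above can be tried in a throwaway one. This is only a minimal sketch, not part of the original answer: the temporary directory, file names, and commit messages are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to silence the deprecation notice that newer git versions print.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is illustrative.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# First commit accidentally includes node_modules.
mkdir node_modules
echo "dependency" > node_modules/lib.js
echo "console.log('app')" > app.js
git add .
git commit -q -m "Add app and node_modules"

# Second commit touches only real code.
echo "// more code" >> app.js
git commit -q -am "Update app"

# Rewrite every commit reachable from HEAD without node_modules.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --tree-filter "rm -rf node_modules" --prune-empty HEAD >/dev/null 2>&1

# Drop the backup refs that filter-branch leaves under refs/original/.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

# Count the commits that still touch node_modules and the commits kept.
remaining=$(git log --all --oneline -- node_modules | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "commits touching node_modules: $remaining, commits kept: $commits"
```

If the first count is not zero, some ref still points at the old history and a force push would not shrink anything.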
Solution 2
I find that the --tree-filter option used in other answers can be very slow, especially on larger repositories with lots of commits.
Here is the method I use to completely remove a directory from the git history using the --index-filter option, which runs much quicker:
# Make a fresh clone of YOUR_REPO
git clone YOUR_REPO
cd YOUR_REPO
# Create tracking branches of all branches
for remote in `git branch -r | grep -v /HEAD`; do git checkout --track $remote ; done
# Remove DIRECTORY_NAME from all commits, then remove the refs to the old commits
# (repeat these two commands for as many directories as you want to remove)
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch DIRECTORY_NAME/' --prune-empty --tag-name-filter cat -- --all
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
# Ensure all old refs are fully removed
rm -Rf .git/logs .git/refs/original
# Perform a garbage collection to remove commits with no refs
git gc --prune=all --aggressive
# Force push all branches to overwrite their history
# (use with caution!)
git push origin --all --force
git push origin --tags --force
You can check the size of the repository before and after the gc with:
git count-objects -vH
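The tracking-branch loop above parses the output of git branch -r and checks out each branch in turn, so it can trip over colored output and leaves you on whichever branch came last. A possible alternative, sketched here under assumptions (the remote is named origin, and the demo repository and branch names are invented), reads the remote refs with git for-each-ref and creates the local branches without checking anything out:

```shell
#!/bin/sh
set -e

# Throwaway "remote" and clone; all names here are illustrative.
work=$(mktemp -d)
cd "$work"
git init -q origin-repo
cd origin-repo
git symbolic-ref HEAD refs/heads/main   # fix the branch name for the demo
git config user.email "demo@example.com"
git config user.name "Demo"
echo "one" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch feature
cd ..
git clone -q origin-repo clone
cd clone

start=$(git symbolic-ref --short HEAD)

# Create a local tracking branch for every remote branch without switching
# branches. for-each-ref output is plain text, so color settings do not matter.
git for-each-ref --format='%(refname)' refs/remotes/origin |
while read -r ref; do
  name=${ref#refs/remotes/origin/}
  if [ "$name" = "HEAD" ]; then
    continue
  fi
  # Skip branches that already exist locally (e.g. the one cloned into).
  if ! git show-ref --verify --quiet "refs/heads/$name"; then
    git branch --track "$name" "origin/$name" >/dev/null
  fi
done

branch_count=$(git for-each-ref refs/heads | wc -l | tr -d ' ')
current=$(git symbolic-ref --short HEAD)
echo "local branches: $branch_count, still on: $current (started on: $start)"
```

Because nothing is checked out, the current branch is the same before and after the loop.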
Solution 3
It appears that the up-to-date answer to this is to not use filter-branch directly (at least git itself does not recommend it anymore), and defer that work to an external tool. In particular, git-filter-repo is currently recommended. The author of that tool provides arguments on why using filter-branch directly can lead to issues.
Most of the multi-line scripts above to remove dir from the history could be re-written as:
git filter-repo --path dir --invert-paths
The tool is more powerful than just that, apparently. You can apply filters by author, email, refname and more (full manpage here). Furthermore, it is fast. Installation is easy - it is distributed in a variety of formats.
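As a rough illustration of what --path and --invert-paths do together, the sketch below removes two directories in one pass from a throwaway repository. The directory names and the demo repository are invented for the example, the script skips itself when git-filter-repo is not installed, and --force is passed only because the demo repository is not a fresh clone.

```shell
#!/bin/sh
set -e

# git-filter-repo is a separate tool; skip the demo if it is not installed.
if ! git filter-repo --version >/dev/null 2>&1; then
  echo "git-filter-repo is not installed; skipping"
  result="skipped"
else
  demo=$(mktemp -d)
  cd "$demo"
  git init -q .
  git config user.email "demo@example.com"
  git config user.name "Demo"

  # Illustrative layout: two unwanted directories next to real code.
  mkdir -p node_modules vendor/cache src
  echo "dep" > node_modules/lib.js
  echo "blob" > vendor/cache/pkg.bin
  echo "code" > src/main.js
  git add .
  git commit -q -m "Add code and generated directories"

  # --path selects paths; --invert-paths flips the selection, so the listed
  # paths are dropped from every commit and everything else is kept.
  git filter-repo --force --path node_modules --path vendor/cache --invert-paths >/dev/null 2>&1

  # Commits still touching the removed paths, and commits touching src.
  leftover=$(git log --all --oneline -- node_modules vendor/cache | wc -l | tr -d ' ')
  kept=$(git log --all --oneline -- src | wc -l | tr -d ' ')
  if [ "$leftover" -eq 0 ] && [ "$kept" -eq 1 ]; then
    result="clean"
  else
    result="dirty"
  fi
fi
echo "result: $result"
```

A directory that is not at the top level has to be given with its full path from the repository root, as vendor/cache is here.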
Solution 4
In addition to the popular answer above, I would like to add a few notes for Windows systems. The command
git filter-branch --tree-filter 'rm -rf node_modules' --prune-empty HEAD
works perfectly without any modification! Therefore, you must not use Remove-Item, del or anything else instead of rm -rf.
If you need to specify a path to a file or directory, use forward slashes, like ./path/to/node_modules
Solution 5
The best and most accurate method I found was to download the bfg.jar file: https://rtyley.github.io/bfg-repo-cleaner/
Then run the commands:
git clone --bare https://project/repository project-repository
cd project-repository
java -jar bfg.jar --delete-folders DIRECTORY_NAME
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push --mirror https://project/new-repository
If you want to delete files then use the delete-files option instead:
java -jar bfg.jar --delete-files *.pyc
Kartik
Updated on July 08, 2022
Comments
-
Kartik almost 2 years
I was working on a repository on my GitHub account and this is a problem I stumbled upon.
- Node.js project with a folder with a few npm packages installed
- The packages were in the node_modules folder
- Added that folder to the git repository and pushed the code to GitHub (wasn't thinking about the npm part at that time)
- Realized that you don't really need that folder to be a part of the code
- Deleted that folder, pushed it
At that instance, the size of the total git repo was around 6MB where the actual code (all except that folder) was only around 300 KB.
Now what I am looking for in the end is a way to get rid of details of that package folder from git's history so if someone clones it, they don't have to download 6mb worth of history where the only actual files they will be getting as of the last commit would be 300KB.
I looked up possible solutions for this and tried these 2 methods
- Remove file from git repository (history)
- http://help.github.com/remove-sensitive-data/
- https://gist.github.com/1588371
The Gist seemed like it worked: after running the script, it showed that it got rid of that folder, and after that it showed that 50 different commits were modified. But it didn't let me push that code. When I tried to push it, it said Branch up to date but showed 50 commits were modified upon a git status. The other 2 methods didn't help either.
Now even though it showed that it got rid of that folder's history, when I checked the size of that repo on my localhost, it was still around 6MB. (I also deleted the refs/original folder but didn't see the change in the size of the repo).
What I am looking to clarify is, if there's a way to get rid of not only the commit history (which is the only thing I think happened) but also those files git is keeping assuming one wants to rollback.
Let's say a solution is presented for this and is applied on my localhost but can't be reproduced on that GitHub repo. Is it possible to clone that repo, roll back to the first commit, perform the trick and push it (or does that mean that git will still have a history of all those commits, i.e. 6MB)?
My end goal here is to basically find the best way to get rid of the folder contents from git so that a user doesn't have to download 6MB worth of stuff and still possibly have the other commits that never touched the modules folder (that's pretty much all of them) in git's history.
How can I do this?
-
Davide Icardi almost 9 years
Just a side note: I used git count-objects -v to check if the files were actually removed, but the size of the repository remained the same until I cloned the repository again. Git maintains a copy of all the original files, I think.
-
knocte over 8 years
Could you explain why this is much faster?
-
Lee Netherton over 8 years
@knocte: from the docs (git-scm.com/docs/git-filter-branch). "--index-filter: ... is similar to the tree filter but does not check out the tree, which makes it much faster"
-
Mad Physicist over 8 years
Why is this not the accepted answer? It is so thorough.
-
Petah over 8 years
And how do we prevent others pushing that directory back?
-
Mohsen over 8 years
@Petah adding the folder to .gitignore should do it
-
Petah over 8 years
@Mohsen But the next time they do a push, it pushes the commits back as they are still in their history.
-
Griwes about 8 years
With a non-ancient git, this should probably read --force-with-lease, not --force.
-
David over 7 years
None of these commands work on Windows, or at least not on Windows 10. Please post the OS that the "cut and paste" works on.
-
Kim T over 7 years
There is a good guide on how to do it here: help.github.com/articles/…
-
Corneliu Serediuc about 7 years
This will not work on Windows if the directory contains a . (dot) in the name.
-
Corneliu Serediuc about 7 years
And I found the solution. Use double inverted commas for the rm command, like this: "rm -rf node.modules".
-
Kris Morness almost 7 years
If doing this in Windows, you need double quotes instead of single quotes.
-
Kris Morness almost 7 years
And if you run into an xargs problem, you'll need to add it to your path (program files\git\usr\bin).
-
DevAnimal almost 7 years
stuck at git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d echo node_modules/ >> .gitignore
-
DevAnimal almost 7 years
~\Documents\GitHub\fpo-patutu2\fpo-oms-new [master ↓2634 ↑2633]> git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d .lock': Invalid argumentUsers/viavych/Documents/GitHub/fpo-patutu2/fpo-oms-new/.git/refs/original/refs/heads/master ~\Documents\GitHub\fpo-patutu2\fpo-oms-new [master ↓2634 ↑2633]>
-
Andrej Kyselica almost 7 years
For Windows 10 users, this works nicely under Bash for Windows (I used Ubuntu)
-
emjay over 6 years
Very easy :) If you want to make sure that only a specific folder is removed, this will help: stackoverflow.com/questions/21142986/…
-
Prashant Pokhriyal over 6 years
The git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d command is not working in Ubuntu.
-
Mohy Eldeen over 6 years
I tried it with the Windows shell and with Git Bash, and it did not work. The first command passes, the second command fails!
-
rkeet over 6 years
@David This works perfectly on Windows using Git Bash. Used in Bash Terminal in PhpStorm. Bash shell installed from here
-
ctusch about 6 years
Passing --quiet to the git rm above sped up my rewrite by at least a factor of 4.
-
Dinuka Salwathura almost 6 years
This worked! Not the above solution mentioned by @mohsen
-
Amit Singh almost 6 years
I followed the same steps for a Bitbucket account, but git count-objects -v still gives the old size. Does anyone know what I am missing? stackoverflow.com/questions/51761520/…
-
madhu131313 over 5 years
Not sure why this didn't work for me, but the answer below by Lee (stackoverflow.com/a/32886427/1579374) worked
-
Rafael Vega over 5 years
This is the only solution that worked flawlessly for me. Thanks!
-
Nikos Alexandris over 5 years
Is this method restricted to work on the current branch only? I.e., if this method is applied from within a branch other than master, will it clean only that other branch and leave master intact?
-
sanjay patel over 5 years
Add double quotes for Windows cmd.exe: git filter-branch --tree-filter "rm -rf node_modules" --prune-empty HEAD
-
Adamski about 5 years
Why would you then remove node_modules from .gitignore? So that they could be accidentally committed again??
-
Danny Tuppeny about 5 years
It doesn't get removed from gitignore, it's added to gitignore. The commit message says "git history", not "gitignore" :)
-
Bryce Meyer about 5 years
This answer was definitely much more useful.
-
jvriesem about 5 years
Would love some commentary about what each of the commands does before I copy-paste them.
-
itinance about 5 years
The third line ("for remote ... ") leaves the working directory on the last branch found, probably not master. A "git checkout master" is required if you don't want to operate on a feature branch.
-
Idemax almost 5 years
It has also generated many .git-rewrite files ... should I commit them?
-
Patrick over 4 years
Unfortunately, it did not seem to delete a "node_modules" directory that was 6 levels deep into the git tree. Not sure why.
-
tritium_3 over 4 years
This answer is key. Without doing this step: for remote in `git branch -r | grep -v /HEAD`; do git checkout --track $remote ; done the git push origin --all will not work correctly. This is a critical step. I would recommend following this entire answer instead of the other ones.
-
zavr over 4 years
But the comment says that you can then remove node_modules from .gitignore.
-
Arglanir over 4 years
If you get the message Did you intend to checkout 'origin/xxxxx' which can not be resolved as commit? after line 3, it's probably because of colors. Add --no-color to the branch listing: git branch -r --no-color
-
Zobayer Hasan almost 4 years
This worked for me, partially. I was able to clean up my local git repository. First I deleted unnecessary branches from local and remote to make the process a bit cleaner. Then I ran the git rm, cleared refs, and gc. However, it did not fix the issue for the Bitbucket remote. I had to recreate the entire repo with individual branches and tags.
-
kubanczyk almost 4 years
Nice tool! Works well on Ubuntu 20.04, you can just pip3 install git-filter-repo since it's stdlib-only and doesn't install any dependencies. On Ubuntu 18 it's incompatible with the distro's git version (Error: need a version of git whose diff-tree command has the --combined-all-paths option), but it's easy enough to run it in a docker run -ti ubuntu:20.04
-
Tom Tang almost 4 years
It just works, simple and elegant! Thanks for the recommendation!
-
baruchiro almost 4 years
You're right! But please, if you can, separate the answer from the information about filter-repo. I mean, maybe write all the information about filter-repo replacing filter-branch, then write a ------- operator, and then give us more info about the command itself, for example what the --invert-paths is for. Thanks!
-
Isaac Corbrey almost 4 years
This worked perfectly for me! Make sure you have ForcePush permissions
-
RJVB over 3 years
Can DIR be a list of directories and/or files, in the git rm expression above? Given how long the history rewrite can take it might be useful to let it do the entire rewrite at once, if you want to throw away multiple directories.
-
Ed Randall over 3 years
I received a warning from git filter-branch that it does not recommend itself: WARNING: git-filter-branch has a glut of gotchas generating mangled history rewrites. Hit Ctrl-C before proceeding to abort, then use an alternative filtering tool such as 'git filter-repo' (https://github.com/newren/git-filter-repo/) instead. I am therefore preferring the git filter-repo answer below: stackoverflow.com/a/61544937/482828
-
stef over 3 years
Important: if your directory is not at the top level, you have to provide the full path to it: dir/subdirectory
-
Alec Jacobson over 3 years
If I add back the folder then I can still see the history on GitHub (is this a git thing or a GitHub thing?). Suppose I have sensitive information in this folder and would like to really scrub the history clean?
-
jtr13 over 3 years
This worked for me but it force pushed master into open PR branches and then closed them. Luckily there were only two. In short... watch out.
-
Hari Honor over 3 years
Re: --invert-paths, the filter is an include one. So you want to include all paths NOT matching dir
-
JFlo over 3 years
Like the OP, I had a large directory I could eliminate. I tried filter-repo first because of git's built-in warning when using filter-branch. However, I didn't see the expected reduction in size using git clone thereafter. Using filter-branch as described here did work, however. Maybe the reason lies elsewhere, like the --aggressive option to the GC.
-
Chris over 3 years
Although I also agree things should be understood, I think they only ought to be worked for understanding when relevant to the fellow engineer/developer's current (and personal) mission, whatever it happens to be. So, I mean, yeah, mastering the internals of a tool written 15 years ago might yield fruit. But, I don't need to know how a hard drive works to use it, and that is kind of the point.
-
alper about 3 years
I keep getting the following warning message: WARNING: git-filter-branch has a glut of gotchas generating mangled history rewrites.
-
gzh about 3 years
It seems this solution does not need the working tree files, so how about adding a --mirror option to the git clone command?
-
gzh about 3 years
But using BFG may cause trouble when there are several folders that have the same name as the specific one you want to delete, i.e., BFG cannot accept a path name for --delete-folders.
-
alper almost 3 years
I am having a syntax error near unexpected token `refname' error
-
alper almost 3 years
git: 'filter-repo' is not a git command. See 'git --help'.
-
WernerCD almost 3 years
Second the question... "After this, you can remove the line "node_modules/" from .gitignore" This line in the answer (answer... not git commit message) says you can remove node_modules/... but why would you?
-
rococo almost 3 years
Thanks for this, this was fast and finished in seconds! A couple of notes on usage: 1) You may need to install a newer version of git. If you're on Ubuntu, that may require setting up a new apt repository, as e.g. the Xenial repos are still on git 2.7.4, which is too old. 2) This DOES delete the folder locally as well. Back it up if you need it. 3) You'll need to re-add the remote url and do a force push (as always, carefully!). 4) You can install the tool with pip3 easily (mentioned above). 5) You may need to run with --force if you don't want to clone a fresh repo. Seems to have gone fine for me.
-
Hugh Guiney almost 3 years
The example should read git-filter-repo.py, not git filter-repo. It is not a native Git command.
-
deadalnix over 2 years
I'm getting error: invalid path 'DIRECTORY_NAME/FILE_IN_DIRECTORY' errors while using this solution.