Correct Usage of BFG Repo Cleaner


Solution 1

Any file still present at the HEAD of the repo is protected by default: BFG keeps that content, including where it appears in the history. This protects you from making mistakes. The idea is that you explicitly delete the file, commit the deletion, and then clean up the history to remove it.
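As a rough illustration of that order of operations, the sketch below builds a throwaway repository and shows that committing the deletion removes the file from HEAD while its blob stays reachable in history. The repository name `demo` and the file name `big-file.bin` are made up for the example, and the BFG call is shown only as a comment because it assumes a `bfg.jar` in the current directory.

```shell
set -eu

# Throwaway repository in a temp dir; all names here are hypothetical.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit a file that we later decide should not be in the repo.
head -c 1048576 /dev/zero > big-file.bin
git add big-file.bin
git commit -q -m "Add big file"
blob=$(git rev-parse HEAD:big-file.bin)

# Step 1: delete the file explicitly and commit the deletion,
# so it is no longer part of the protected HEAD commit.
git rm -q big-file.bin
git commit -q -m "Remove big file"

# The file is gone from HEAD, but its blob is still reachable through history.
in_head=$(git ls-tree -r --name-only HEAD | grep -c -x 'big-file.bin' || true)
in_history=$(git rev-list --objects --all | grep -c "^$blob" || true)
echo "in HEAD: $in_head, in history: $in_history"

# Step 2 (not run here, assumes bfg.jar is available): clean the history, e.g.
#   java -jar bfg.jar --delete-files big-file.bin .
```

Only after the second step, followed by the reflog expiry and `git gc` from the BFG instructions, would the blob actually disappear from the repository.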

TFS does not gc its repos; your colleague is correct. See the question "Team Foundation Server 2015 (tfs2015) run git gc --prune=now on orgin/remote" for confirmation.

Solution 2

I recently used BFG Repo Cleaner as well, to delete some folders from a Git repo on TFS.

If you also want to modify the HEAD commit, use the --no-blob-protection parameter.

Obviously, the files you removed are missing from the cleaned (old) commits. The commits themselves are still there, but each one no longer contains the file, so you will not be able to see the file's history.

For safety reasons, I would always rename the old repo and create a new one, probably even under a different repo name, so that my co-workers can't accidentally merge the wrong repo into their working copies.
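A minimal sketch of the "push to a new repo" route, assuming two local bare repositories stand in for the old and the newly created TFS repos (`old.git`, `new.git`, and `cleaned.git` are hypothetical names). The BFG run and the `git gc` step would happen on the mirror clone before the push; they are left out here so the example runs with plain Git.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# old.git stands in for the existing TFS repo, new.git for a freshly created one.
git init -q --bare old.git
git init -q --bare new.git

# Seed the old repo with one commit so there is something to mirror.
git clone -q old.git seed 2>/dev/null
git -C seed -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"
git -C seed push -q origin HEAD

# Mirror-clone the old repo. BFG and git gc would be run on cleaned.git here.
git clone -q --mirror old.git cleaned.git

# Push every ref to the new repo instead of overwriting the old one.
git -C cleaned.git push -q --mirror "$work/new.git"

# Compare the refs of both repos.
old_refs=$(git -C old.git for-each-ref --format='%(refname) %(objectname)')
new_refs=$(git -C new.git for-each-ref --format='%(refname) %(objectname)')
echo "$new_refs"
```

Because nothing is rewritten in this sketch, both repos end up with identical refs; after a real BFG run the new repo would carry the rewritten commits while the old one stays untouched as a fallback.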

If you really want to, you can run git push --all --force and rewrite the complete history on the TFS repo. But then the old history is gone.
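The effect of the force push can be tried out safely against a local bare repository. In the sketch below, `remote.git` is a hypothetical stand-in for the TFS repo, and an amended commit plays the role of the history rewritten by BFG. It only shows how Git treats the refs; it says nothing about what the server does with the old objects afterwards.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# A local bare repository stands in for the remote (hypothetical setup).
git init -q --bare remote.git
git clone -q remote.git work-copy 2>/dev/null
cd work-copy
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "Original commit"
git push -q origin HEAD
branch=$(git symbolic-ref --short HEAD)

# Rewrite history locally; an amend is used here in place of a BFG run.
echo "v2" > file.txt
git commit -q -a --amend -m "Rewritten commit"

# A plain push is refused because the remote ref is not an ancestor of the local one.
if git push -q --all origin 2>/dev/null; then
    plain_push=accepted
else
    plain_push=rejected
fi

# With --force the remote branch is overwritten and the old commit is no longer referenced.
git push -q --all --force origin
remote_head=$(git --git-dir=../remote.git rev-parse "refs/heads/$branch")
local_head=$(git rev-parse HEAD)
echo "plain push: $plain_push"
```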

Bill Greer
Author by

Bill Greer

BY DAY: ASP.NET, IoT, C# Software Developer for MMI. BY NIGHT: I'm a developer. I code day and night. FOR FUN: Fishing, tennis, hiking and coding.

Updated on June 04, 2022

Comments

  • Bill Greer
    Bill Greer almost 2 years

    The BFG Repo Cleaner site gives an example of using the tool as follows to clean up a repository:

    1. Clone a fresh copy of your repo.

      $ git clone --mirror git://example.com/some-big-repo.git
      
    2. Run BFG to clean up your repo.

      $ java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git
      
    3. Use git gc to strip out the unwanted dirty data

      $ cd some-big-repo.git
      $ git reflog expire --expire=now --all && git gc --prune=now --aggressive
      
    4. Push changes back up to the remote

      $ git push
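
Why step 3 matters can be seen in a small experiment. This is only a sketch with made-up names (`demo`, `junk.bin`), and a `git commit --amend` stands in for the BFG rewrite: after the rewrite the old blob is still on disk because the reflog references the old commit, and it only disappears once the reflog is expired and `git gc --prune=now` has run.

```shell
set -eu

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# One commit containing a file we want to keep and one we want gone.
echo "keep" > keep.txt
head -c 1048576 /dev/zero > junk.bin
git add keep.txt junk.bin
git commit -q -m "Add files"
blob=$(git rev-parse HEAD:junk.bin)

# Rewrite the commit without junk.bin (stand-in for what BFG does to old commits).
git rm -q junk.bin
git commit -q --amend -m "Add files (junk removed)"

# The blob is no longer reachable from any branch, but it is still stored,
# because the reflog still points at the original commit.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=missing; fi

# Step 3 from the list above.
git reflog expire --expire=now --all && git gc -q --prune=now --aggressive

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=missing; fi
echo "before gc: $before, after gc: $after"
```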
      

    I understand that the HEAD commit is protected, so any file in it that is larger than 100M will still be there. If I run this tool as described, I will lose any history of said 100M file, correct? So if there is an old version of that file in an old commit, it's gone and I will not be able to use it in its previous state, correct?

    Also, I have a coworker that stated the following and I am wondering if it is true:

    If you push back to the repository that was mirrored in TFS, the changes to your pack file won't be reflected on the remote and in future clones.

    You have to create a new repository in TFS and push the mirror there for the remote to pick up the pack file changes.

  • Bill Greer
    Bill Greer almost 6 years
    So the TFS repo should be deleted and a new one created in such a scenario?
  • Daniel Mann
    Daniel Mann almost 6 years
    I could be wrong, but I don't think a force push would handle the case of "dead" references hanging around in the remote repo, but I think specifying --mirror would.
  • milbrandt
    milbrandt almost 6 years
    According to git-scm.com/docs/git-push: "Usually, the command refuses to update a remote ref that is not an ancestor of the local ref used to overwrite it [..] This flag disables these checks, and can cause the remote repository to lose commits". I'm not sure whether it will work if all commits (including the very first commit) have been rewritten, but otherwise it should do. It's like rebasing a very old branch with a lot of commits.
  • Bill Greer
    Bill Greer almost 6 years
    I'm assuming I still do step 3 from my original post. Correct?