How can I trigger garbage collection on a Git remote repository?

Solution 1

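The remote repo should already run gc as needed after each push: by default, git receive-pack runs git gc --auto once it has received the pushed data and updated the refs (see receive.autogc). See the documentation of gc.auto in the git-gc and git-config man pages.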

However, a remote repo shouldn't need all that much garbage collection, since it will rarely have dangling (unreachable) commits. These usually result from things like branch deletion and rebasing, which typically happen only in local repos.

So gc is needed more for repacking, which is for saving storage space rather than removing actual garbage. The gc.auto variable is sufficient for taking care of this.
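As a rough sketch of what that configuration looks like on the server, the function below sets the two relevant variables in a bare repository and prints the current object counts. The function name configure_auto_gc and the example path are made up for illustration; receive.autogc and gc.auto are the real settings, and 6700 is the documented default threshold for gc.auto.

```shell
# Enable automatic gc for a bare repository on the server.
# Usage: configure_auto_gc /path/to/myproj.git [loose-object-threshold]
# (function name and path are illustrative, not part of Git)
configure_auto_gc() {
    repo=$1
    threshold=${2:-6700}   # 6700 is Git's documented default for gc.auto

    # Have receive-pack run "git gc --auto" after each push (on by default).
    git -C "$repo" config receive.autogc true
    # Repack once roughly this many loose objects have accumulated.
    git -C "$repo" config gc.auto "$threshold"
    # Report loose-object and pack counts so the effect can be checked later.
    git -C "$repo" count-objects -v
}
```

Run on the machine that hosts the repository, for example configure_auto_gc /srv/git/myproj.git 1000. A lower threshold makes repacking happen sooner, at the cost of more frequent gc runs after pushes.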

Solution 2

"after many pushes, there are many files under myproj.git/objects"

There won't be as many with Git 2.11+ (Q4 2016) and a pre-receive hook.
In that scenario, objects from rejected pushes never reach the repository, so you won't have to trigger a git gc to get rid of them.

See commit 62fe0eb, commit e34c2e0, commit 722ff7f, commit 2564d99, commit 526f108 (03 Oct 2016) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 25ab004, 17 Oct 2016)

receive-pack: quarantine objects until pre-receive accepts

In order for the receiving end of "git push" to inspect the received history and decide to reject the push, the objects sent from the sending end need to be made available to the hook and the mechanism for the connectivity check, and this was done traditionally by storing the objects in the receiving repository and letting "git gc" to expire it.

Instead, store the newly received objects in a temporary area, and make them available by reusing the alternate object store mechanism to them only while we decide if we accept the check, and once we decide, either migrate them to the repository or purge them immediately.

That temporary area is exposed to hooks through the new environment variable GIT_QUARANTINE_PATH (the constant is named GIT_QUARANTINE_ENVIRONMENT in Git's source code).

That way, if a (big) push is rejected by a pre-receive hook, those big objects won't be lying around as unreachable garbage, waiting for a later git gc to prune them once the grace period (gc.pruneExpire, two weeks by default) has passed.
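To illustrate the kind of pre-receive check that benefits from the quarantine, here is a minimal sketch that rejects a push if it introduces a blob above a size limit. The function name check_push_size, the MAX_BYTES variable, and the 10 MiB default are assumptions made for this example; the git rev-list and git cat-file invocations are standard. While the hook runs, the pushed objects sit in the quarantine area but are still visible to these commands.

```shell
# Sketch of a size check for a pre-receive hook (names are illustrative).
# Reads "<old> <new> <ref>" lines on stdin, as pre-receive hooks do.
check_push_size() {
    limit=${MAX_BYTES:-10485760}   # assumed default: 10 MiB
    while read -r old new ref; do
        # An all-zero new value means the ref is being deleted: nothing to check.
        case $new in *[!0]*) ;; *) continue ;; esac

        # List objects that are new in this push, with their type and size.
        git rev-list --objects "$new" --not --all </dev/null |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        while read -r type size path; do
            if [ "$type" = blob ] && [ "$size" -gt "$limit" ]; then
                echo "rejected $ref: $path is $size bytes (limit $limit)" >&2
                exit 1   # in sh/bash this only leaves the pipeline's subshell
            fi
        done || return 1
    done
    return 0
}
```

To use it, save the function in hooks/pre-receive of the bare repository with a shell shebang, add a final line such as check_push_size || exit 1, and make the file executable. When the hook exits non-zero, Git 2.11+ discards the quarantined objects immediately instead of leaving them in myproj.git/objects.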

Solution 3

This question should shed some light on how often you should run garbage collection.

The easiest option would be to use a scheduled task on Windows or a cron job on Unix to run git gc periodically on the machine that hosts the repository. This way you don't even need to think about it.
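A minimal sketch of such a periodic job, assuming all bare repositories live directly under one directory and end in .git. The function name gc_all_repos, the /srv/git root, the script path, and the schedule in the crontab comment are all hypothetical.

```shell
# Run "git gc" in every bare repository directly under a root directory.
# Usage: gc_all_repos /srv/git
gc_all_repos() {
    root=$1
    for repo in "$root"/*.git; do
        # Skip the unexpanded pattern when no repository matches.
        [ -d "$repo" ] || continue
        git -C "$repo" gc --quiet || echo "gc failed in $repo" >&2
    done
}

# Example crontab entry for a script that calls gc_all_repos
# (hypothetical path and schedule: Sundays at 03:00):
# 0 3 * * 0  /usr/local/bin/git-gc-all.sh
```

The job should run as the user that owns the repositories, so that the pack files it writes keep the right ownership and permissions.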

Author: peterwang

Updated on December 05, 2020

Comments

  • peterwang
    peterwang over 3 years

    As we know, we can periodically run git gc to pack objects under .git/objects.

    In the case of a remote central Git repository (bare or not), though, after many pushes, there are many files under myproj.git/objects; each commit seems to create a new file there.

    How can I pack that many files? (I mean the ones on the remote central bare repository, not on local clone repository.)

  • peterwang
    peterwang almost 14 years
    Thanks for your suggestions, but my question is how to run `git gc` on a remote bare repository, not on a local cloned repository.
  • Igor Zevaka
    Igor Zevaka almost 14 years
    Pretty sure you can't invoke git gc remotely, that's why you have to schedule it on the machine containing the bare repository.
  • Ryuu
    Ryuu about 7 years
    Not necessarily. If we push a bugfix branch for several people to collaborate on, then rebase it when merging to master, we are essentially rebasing on the remote. Even if we don't rebase, the remote repo should still pack things up frequently so that new folks can clone much faster.
  • Neil Mayhew
    Neil Mayhew about 7 years
    @Ryuu, yes, you make a good point. This is one of the ways in which a remote repo could end up with loose objects. However, I did say "typically" when talking about rebasing. Maybe this scenario is becoming more common as people's use of git becomes more sophisticated. Even so, automatic garbage collection should take care of it eventually.
  • Sнаđошƒаӽ
    Sнаđошƒаӽ about 5 years
    I think loose commits also result when doing a force push to overwrite the last commit push (?), but I could be wrong. Any thoughts?
  • Neil Mayhew
    Neil Mayhew about 5 years
    @Sнаđошƒаӽ Yes, that creates loose objects. It comes under the heading of rebasing, and although that's not as common in remote repos, it still happens, especially with a pull-request workflow or any other workflow that allows users to have private branches on a remote repo that they can modify in a non-fast-forward way. When I wrote this in 2010, GitHub was not as dominant as it is now.