Do I ever need to run git gc on a bare repo?

Solution 1

As Jefromi commented on Dan's answer, git gc should be called automatically during "normal" use of a bare repository.

I just ran git gc --aggressive on two bare, shared repositories that have been actively used; one with about 38 commits the past 3-4 weeks, and the other with about 488 commits over roughly 3 months. Nobody has manually run git gc on either repository.

Smaller repository

$ git count-objects
333 objects, 595 kilobytes

$ git count-objects -v
count: 333
size: 595
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0

$ git gc --aggressive
Counting objects: 325, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (323/323), done.
Writing objects: 100% (325/325), done.
Total 325 (delta 209), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 8
size: 6
in-pack: 325
packs: 1
size-pack: 324
prune-packable: 0
garbage: 0

$ git count-objects
8 objects, 6 kilobytes

Larger repository

$ git count-objects
4315 objects, 11483 kilobytes

$ git count-objects -v
count: 4315
size: 11483
in-pack: 9778
packs: 20
size-pack: 15726
prune-packable: 1395
garbage: 0

$ git gc --aggressive
Counting objects: 8548, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8468/8468), done.
Writing objects: 100% (8548/8548), done.
Total 8548 (delta 7007), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 0
size: 0
in-pack: 8548
packs: 1
size-pack: 8937
prune-packable: 0
garbage: 0

$ git count-objects
0 objects, 0 kilobytes

I wish I had thought of it before I gc'ed these two repositories: I should have run git gc without the --aggressive option first, to see the difference. Luckily, I have a medium-sized active repository left to test (164 commits over nearly 2 months).

$ git count-objects -v
count: 1279
size: 1574
in-pack: 2078
packs: 6
size-pack: 2080
prune-packable: 607
garbage: 0

$ git gc
Counting objects: 1772, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1073/1073), done.
Writing objects: 100% (1772/1772), done.
Total 1772 (delta 1210), reused 1050 (delta 669)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 0
size: 0
in-pack: 1772
packs: 1
size-pack: 1092
prune-packable: 0
garbage: 0

$ git gc --aggressive
Counting objects: 1772, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1742/1742), done.
Writing objects: 100% (1772/1772), done.
Total 1772 (delta 1249), reused 0 (delta 0)

$ git count-objects -v
count: 0
size: 0
in-pack: 1772
packs: 1
size-pack: 1058
prune-packable: 0
garbage: 0

Running git gc clearly made a large dent in count-objects, even though we regularly push to and fetch from this repository. But upon reading the manpage for git config, I noticed that the default loose object limit (gc.auto) is 6700, which we apparently had not yet reached.

So it appears that the conclusion is no, you don't need to run git gc manually on a bare repo;* but with the default setting for gc.auto, it might be a long time before garbage collection occurs automatically.


* Generally, you shouldn't need to run git gc. But sometimes you might be strapped for space and you should run git gc manually or set gc.auto to a lower value. My case for the question was simple curiosity, though.
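If you do want automatic collection to kick in sooner, something along these lines should work. This is only a sketch: the repository is a throwaway bare repo created in a temp directory (a stand-in for the real shared one), and 1000 is an arbitrary example threshold, not a recommendation.

```shell
#!/bin/sh
set -e

# Hypothetical throwaway bare repository, standing in for the shared one.
repo="$(mktemp -d)/central.git"
git init --quiet --bare "$repo"

# If gc.auto is not configured, git uses its built-in default (6700 loose objects).
git -C "$repo" config --get gc.auto || echo "gc.auto not set, default applies"

# Lower the threshold for this repository only (1000 is an example value).
git -C "$repo" config gc.auto 1000
threshold=$(git -C "$repo" config --get gc.auto)
echo "gc.auto is now $threshold"

# Let git decide whether housekeeping is needed right now.
git -C "$repo" gc --auto
```

On a real repository you would skip the mktemp/init lines and point repo at the existing bare directory.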

Solution 2

From the git-gc man page:

Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.

Emphasis mine. Bare repositories are repositories too!

Further explanation: one of the housekeeping tasks that git-gc performs is packing and repacking of loose objects. Even if you never have any dangling objects in your bare repository, you will -- over time -- accumulate lots of loose objects. These loose objects should periodically get packed, for efficiency. Similarly, if a large number of packs accumulate, they should periodically get repacked into larger (fewer) packs.
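To see the loose-object build-up for yourself, here is a rough sketch. It assumes a scratch bare repo and a scratch working clone in a temp directory (both hypothetical, as are the file name, identity and branch name); small pushes normally land in the bare repo as loose objects, and git gc then packs them.

```shell
#!/bin/sh
set -e

# Scratch directories (hypothetical), so nothing real is touched.
tmp=$(mktemp -d)
bare="$tmp/central.git"
work="$tmp/work"
git init --quiet --bare "$bare"
git init --quiet "$work"

# Push three small commits, one at a time, into the bare repository.
for i in 1 2 3; do
  echo "change $i" > "$work/file.txt"
  git -C "$work" add file.txt
  git -C "$work" -c user.name="Example" -c user.email="example@example.com" \
      -c commit.gpgsign=false commit --quiet -m "commit $i"
  git -C "$work" push --quiet "$bare" HEAD:refs/heads/master
done

# Helpers that pull single fields out of `git count-objects -v`.
loose_count() { git -C "$bare" count-objects -v | awk '/^count:/ {print $2}'; }
pack_count()  { git -C "$bare" count-objects -v | awk '/^packs:/ {print $2}'; }

loose_before=$(loose_count)
git -C "$bare" gc --quiet
loose_after=$(loose_count)
packs_after=$(pack_count)

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after (packs: $packs_after)"
```

The same two helper functions can be pointed at a real bare repository to decide whether a manual gc is worth the trouble.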

Solution 3

The issue with git gc --auto is that it can be blocking.

But with the new (Git 2.0 Q2 2014) setting gc.autodetach, you now can do it without any interruption:

See commit 4c4ac4d and commit 9f673f9 (Nguyễn Thái Ngọc Duy, aka pclouds):

gc --auto takes time and can block the user temporarily (but not any less annoyingly).
Make it run in background on systems that support it.
The only thing lost with running in background is printouts. But gc output is not really interesting.
You can keep it in foreground by changing gc.autodetach.
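For illustration, keeping gc --auto in the foreground for one repository could look like the following. The bare repo here is a scratch one created just for the example; the only setting taken from the commit message above is gc.autodetach.

```shell
#!/bin/sh
set -e

# Scratch bare repository (hypothetical stand-in for the shared one).
repo="$(mktemp -d)/central.git"
git init --quiet --bare "$repo"

# Keep `gc --auto` in the foreground so its output stays visible.
git -C "$repo" config gc.autodetach false
autodetach=$(git -C "$repo" config --get gc.autodetach)
echo "gc.autodetach = $autodetach"

# With autodetach off, this call returns only once housekeeping has finished.
git -C "$repo" gc --auto
```

Leaving the setting alone keeps the background behavior described in the commit message.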


Note: only Git 2.7 (Q4 2015) makes sure not to lose the error message.
See commit 329e6e8 (19 Sep 2015) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 076c827, 15 Oct 2015)

gc: save log from daemonized gc --auto and print it next time

While commit 9f673f9 (gc: config option for running --auto in background - 2014-02-08) helps reduce some complaints about 'gc --auto' hogging the terminal, it creates another set of problems.

The latest in this set is, as the result of daemonizing, stderr is closed and all warnings are lost. This warning at the end of cmd_gc() is particularly important because it tells the user how to avoid "gc --auto" running repeatedly.
Because stderr is closed, the user does not know, naturally they complain about 'gc --auto' wasting CPU.

Daemonized gc now saves stderr to $GIT_DIR/gc.log.
Following gc --auto will not run and gc.log printed out until the user removes gc.log.
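A small sketch of how one might check for that file on a bare repository, where $GIT_DIR is the repository directory itself. The repo path is a scratch one created for the example, so no gc.log is expected to exist there; on a real server you would point repo at the shared repository instead.

```shell
#!/bin/sh
set -e

# Scratch bare repository (hypothetical); for a bare repo, $GIT_DIR is the repo directory.
repo="$(mktemp -d)/central.git"
git init --quiet --bare "$repo"
log="$repo/gc.log"

if [ -f "$log" ]; then
  # Show what the last daemonized gc complained about, then clear it
  # so that later `gc --auto` runs are allowed again.
  cat "$log"
  rm "$log"
  status="removed"
else
  status="absent"
fi
echo "gc.log was $status"
```

It is worth reading the log before deleting it, since it is the only place the warning was kept.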

Solution 4

Some operations run git gc --auto automatically, so there should never be a need to run git gc manually; git should take care of this by itself.

Contrary to what bwawok said, there actually is (or might be) a difference between your local repo and that bare one: what operations you do with it. For example, dangling objects can be created by rebasing, but it may be that you never rebase the bare repo, so maybe you never need to remove them (because there are never any). And thus you may not need to use git gc that often. But then again, like I said, git should take care of this automatically.
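If you are curious whether a bare repo actually holds any dangling objects, git fsck will list them. The sketch below uses a scratch bare repo and deliberately writes one unreferenced blob to have something to find; the blob content and paths are made up for the example. Note that --prune=now is used only because this is a throwaway repo; on a shared repository that people are pushing to, the default grace period is the safer choice.

```shell
#!/bin/sh
set -e

# Scratch bare repository (hypothetical).
repo="$(mktemp -d)/central.git"
git init --quiet --bare "$repo"

# Write a blob that no ref points to, so it shows up as dangling.
echo "orphaned content" | git -C "$repo" hash-object -w --stdin > /dev/null

# Count the "dangling ..." lines reported by fsck.
dangling_before=$(git -C "$repo" fsck 2>/dev/null | grep -c '^dangling' || true)
echo "dangling objects before gc: $dangling_before"

# Prune unreachable objects immediately (acceptable here: throwaway repo only).
git -C "$repo" gc --quiet --prune=now

dangling_after=$(git -C "$repo" fsck 2>/dev/null | grep -c '^dangling' || true)
echo "dangling objects after gc:  $dangling_after"
```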

Author by

Ralph Sinsuat

Updated on October 16, 2020

Comments

  • Ralph Sinsuat
    Ralph Sinsuat over 3 years

    man git-gc doesn't have an obvious answer in it, and I haven't had any luck with Google either (although I might have just been using the wrong search terms).

    I understand that you should occasionally run git gc on a local repository to prune dangling objects and compress history, among other things -- but is a shared bare repository susceptible to these same issues?

    If it matters, our workflow is multiple developers pulling from and pushing to a bare repository on a shared network drive. The "central" repository was created with git init --bare --shared.

    • VonC
      VonC about 10 years
      Note: setting gc.autodetach (Git 2.0 Q2 2014) can help running git gc --auto without blocking the user. See my answer below.
  • Ralph Sinsuat
    Ralph Sinsuat over 13 years
    +1 Thanks for clarifying one of the reasons that gc might be necessary on a bare repo.
  • Cascabel
    Cascabel over 13 years
    It's definitely true that gc needs to be run on all repos, bare or not. It's also true that enough commands run it automatically that you essentially never have to. In the case of a bare repo, it's receive-pack that invokes gc --auto. (Sometimes you may want to manually run git gc --aggressive, which will "more aggressively optimize the repository at the expense of taking much more time", but you may not find that to be important.)
  • Dan Moulding
    Dan Moulding over 13 years
    @Jefromi: I agree. The problem is that it doesn't seem to be very well documented which commands run git gc --auto. I checked the git-receive-pack man page before writing my answer, and there's no mention of it there. So for the average user, I think it's difficult to know if git gc needs to be manually run. The fact that the git gc man page still recommends that users run it manually seems to only add more confusion! Perhaps this is something that should be mentioned on the mailing list.
  • Cascabel
    Cascabel over 13 years
    Yeah, git's documentation unfortunately can be a bit spotty sometimes. Maybe if I get ambitious I'll submit a patch. From a quick survey of the source: merge, receive-pack, am, rebase --interactive, and svn call gc --auto directly. That's not a complete list, though, since other commands may call those.
  • Tino
    Tino about 10 years
    git gc --help also mentions the option --no-prune and git prune, which might come in handy in bare repos, depending on the usage type.