Git - get all commits and blobs they created
Solution 1
To get all commits, with one line per commit:
git rev-list --all --pretty=oneline
Then split each line at the first space (a split with a limit of 2) to get every commit's id and message.
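A minimal shell sketch of that split, assuming the input is exactly what git rev-list --all --pretty=oneline prints (full commit id, one space, subject). The function name split_oneline is hypothetical; read gives the first word to sha and the rest of the line to subject, which amounts to a split with a limit of 2.

```shell
# split_oneline (hypothetical helper): read "<sha> <subject>" lines on stdin
# and print "<sha><TAB><subject>". `read` assigns the first word to `sha`
# and the remainder of the line (inner spacing kept) to `subject`.
split_oneline() {
    # the `|| [ -n "$sha" ]` part also handles a last line without newline
    while IFS=' ' read -r sha subject || [ -n "$sha" ]; do
        printf '%s\t%s\n' "$sha" "$subject"
    done
}

# Usage (inside a repository):
#   git rev-list --all --pretty=oneline | split_oneline
```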
To get the blobs created by a commit (recurse into subdirectories, show merge commits, detect renames and copies, and don't show the commit id on the first line):
git diff-tree -r -c -M -C --no-commit-id <commit-sha>
Parse each line and exclude some of them (such as deletions), and you get the list of new blobs and their paths for the commit.
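One way to do that parsing, sketched with awk under the assumption that the input is the raw output of the diff-tree command above. The helper name created_blobs is hypothetical, and "exclude" is taken here to mean "skip deletions" (an all-zero new id), as Solution 4 also does. Note that diff-tree prints nothing for a root commit unless --root is also given.

```shell
# created_blobs (hypothetical helper): read raw diff-tree output on stdin and
# print "<blob-sha><TAB><path>" for every entry that is not a deletion.
#
# Raw lines look like
#   :<old-mode> <new-mode> <old-sha> <new-sha> <status>\t<path>
# and, for merges shown with -c, like
#   ::<mode>... <parent-sha>... <result-sha> <status>\t<path>
# In both forms the new/result sha is the field just before the status.
# Renames/copies (-M/-C) carry two tab-separated paths; the last is the new one.
created_blobs() {
    awk 'BEGIN { FS = "\t" }
         /^:/ {
             n = split($1, meta, " ")
             sha = meta[n - 1]
             if (sha ~ /^0+$/) next      # deleted: there is no new blob
             print sha "\t" $NF
         }'
}

# Usage (inside a repository):
#   git diff-tree -r -c -M -C --no-commit-id <commit-sha> | created_blobs
```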
The last step is to get the blob sizes:
git cat-file --batch-check < <list-of-blob-shas>
And once more, a bit of parsing.
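A sketch of that final parsing step, assuming the default --batch-check format, which prints "<sha> <type> <size>" for each object and "<sha> missing" for unknown ids. The helper name sum_blob_sizes is hypothetical; it only adds up the size column of blob lines.

```shell
# sum_blob_sizes (hypothetical helper): read `git cat-file --batch-check`
# output on stdin and print the total size in bytes of all blob lines.
# Lines of other types and "<sha> missing" lines are ignored.
sum_blob_sizes() {
    awk '$2 == "blob" { total += $3 } END { print total + 0 }'
}

# Usage (inside a repository):
#   git cat-file --batch-check < <list-of-blob-shas> | sum_blob_sizes
```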
Solution 2
Relying on git rev-list is not always enough, because it
List[s] commits that are reachable by following the parent links from the given commit(s) [..]
(git help rev-list)
Thus, unless you pass --all, it does not list commits that are only on another branch, and even with --all it does not list commits that are not reachable from any branch or tag (perhaps they were created by some rebase and/or detached-HEAD actions).
Similarly, git log without arguments just follows the parent links from the currently checked-out commit. Again, you don't see commits that are referenced only by other branches or that are dangling.
You can really get all commits with a command like this:
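for i in $( { find .git/objects -type f |
              sed -n 's@^.*objects/\([0-9a-f]\{2\}\)/\([0-9a-f]\{1,\}\)$@\1\2@p'
              git verify-pack -v .git/objects/pack/*.idx |
              grep ' commit ' |
              cut -f1 -d' '
            } | sort -u )
do
  # loose objects are not only commits, so skip blobs, trees and tags
  [ "$(git cat-file -t "$i")" = commit ] || continue
  git log -1 --pretty=format:'%H %P %ai %s%n' "$i"
done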
To keep it simple, the loop body prints one line per commit containing its hash, parent hash(es), author date and subject. Note that to iterate over all commits you need to consider both packed and loose (not yet packed) objects.
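If your Git is recent enough to have git cat-file --batch-all-objects, a possibly simpler alternative to walking .git/objects by hand is to let cat-file list every object (loose or packed, reachable or not) and keep only the commits. A sketch; the helper names commit_ids and all_commits are hypothetical:

```shell
# commit_ids (hypothetical helper): read "<sha> <type> <size>" lines, as
# printed by `git cat-file --batch-check`, and print only the commit ids.
commit_ids() {
    awk '$2 == "commit" { print $1 }'
}

# all_commits (hypothetical helper): one line per commit object in the
# repository with hash, parent hash(es), author date and subject.
# Must be run inside a repository; assumes --batch-all-objects is supported.
all_commits() {
    git cat-file --batch-check --batch-all-objects | commit_ids |
        while read -r i; do
            git log -1 --pretty=format:'%H %P %ai %s%n' "$i"
        done
}
```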
You can print the referenced blobs (only the created ones) by calling git diff-tree $i (add -r to recurse into subdirectories) from the loop body and grepping for a capital A in the fifth column.
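A sketch of that filter, using awk instead of grep so that only the status column is tested and paths containing spaces survive. It assumes the default (non -z) raw output; the helper name added_blobs is hypothetical.

```shell
# added_blobs (hypothetical helper): read `git diff-tree -r <commit>` output
# and print "<new-blob-sha><TAB><path>" for entries whose status (the fifth
# whitespace-separated column) is A, i.e. newly created files.
# The first line diff-tree prints is the commit id itself; it has a single
# column, so it never matches.
added_blobs() {
    awk 'BEGIN { FS = "\t" }
         {
             n = split($1, meta, " ")
             if (n == 5 && meta[5] == "A") print meta[4] "\t" $2
         }'
}

# Usage inside the loop body (inside a repository):
#   git diff-tree -r "$i" | added_blobs
```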
Solution 3
You can get everything but size out of the box. This one is pretty close:
git log --name-status
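To get closer to what the question asks, git log can be limited to added files with --diff-filter=A and made to print a recognizable header per commit. The sketch below folds that output into one line per created file; the "commit " header prefix and the helper name created_paths are choices made for this example, and sizes are still missing.

```shell
# created_paths (hypothetical helper): read the output of
#   git log --all --diff-filter=A --name-status --pretty=format:'commit %H %s'
# and print one "<commit-sha><TAB><path>" line per created file.
created_paths() {
    awk 'BEGIN { FS = "\t" }
         /^commit / { split($0, hdr, " "); sha = hdr[2]; next }
         $1 == "A"  { print sha "\t" $2 }'
}

# Usage (inside a repository):
#   git log --all --diff-filter=A --name-status \
#       --pretty=format:'commit %H %s' | created_paths
```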
Solution 4
One solution based on tig's answer:
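#!/usr/bin/perl
use strict;
use warnings;

# For every commit reachable from any ref, add up the sizes of the blobs
# reported by diff-tree and print the commit if the total exceeds 1 MiB.
foreach my $rev (`git rev-list --all --pretty=oneline`) {
    my $tot = 0;
    my ($sha) = split ' ', $rev;    # first field is the commit id
    # --root: also report the files of root commits
    foreach my $line (`git diff-tree -r -c -M -C --root --no-commit-id $sha`) {
        # Raw line: ":<modes> <shas> <status>\t<path>". The new blob id is the
        # field just before the status, also for "::" lines of merge commits.
        my ($meta) = split /\t/, $line;
        my $blob = (split / /, $meta)[-2];
        next if $blob =~ /^0+$/;    # Deleted: all-zero id (a string, not ==)
        my $size = `echo $blob | git cat-file --batch-check`;
        $size = (split /\s/, $size)[2];
        $tot += int($size // 0);    # "missing" lines have no size field
    }
    print "$tot $rev" if $tot > 1048576;    # Show only if > 1 MiB
}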
Maybe not the best code, but it should get you most of the way.
Solution 5
Another useful command when searching for commits is
git fsck --lost-found
which will show dangling commits. I needed to use this to find a commit I wiped with an ill-timed reset --hard.
But don't take my word for it:
https://www.kernel.org/pub/software/scm/git/docs/git-fsck.html
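To feed those dangling commits into one of the loops above, something like the following sketch can pick their ids out of the fsck output, assuming lines of the form "dangling commit <sha>". The helper name dangling_commits is hypothetical. Besides printing them, --lost-found also writes the dangling commits under .git/lost-found/commit/. Note that fsck treats commits referenced from a reflog as reachable by default, so such commits may only show up as dangling when --no-reflogs is given.

```shell
# dangling_commits (hypothetical helper): read `git fsck` output on stdin
# and print the ids from "dangling commit <sha>" lines only.
dangling_commits() {
    awk '$1 == "dangling" && $2 == "commit" { print $3 }'
}

# Usage (inside a repository):
#   git fsck --lost-found | dangling_commits |
#     while read -r c; do git log -1 --pretty=format:'%H %ai %s%n' "$c"; done
```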
tig
Updated on June 05, 2022
Comments
- tig about 2 years: Is there a git command that can output, for every commit:
  - id
  - subject
  - blobs it created, with their path and size (like git ls-tree -l -r <commit>, but only for created blobs)
- Jed over 11 years: You can make this much faster by using the --stdin option to diff-tree. For example:
  git rev-list --all | git diff-tree -r --root --diff-filter=AMC --pretty=oneline --stdin
- starsinmypockets about 11 years: Umm, could you comment on the downvote? I'll remove the answer if it's erroneous or misleading!
- peterflynn almost 11 years: Doesn't work for me: git log --walk-reflogs gives a shorter list than just doing git log. Judging by the man page, the reflog is not in any way guaranteed to be an exhaustive list of all commits or all branches...
- peterflynn almost 11 years: maxschlepzig's answer below is important: if you're trying to do something like delete a large file from the repo history, you need to make sure to find every commit, even if some are on unmerged branches!
- peterflynn almost 11 years: What about using git log --all? The docs seem to imply that will include every (reachable) commit...
- maxschlepzig almost 11 years: @ytpete, I explicitly mention unreachable commits in my answer.
- pdem over 4 years: What does it do?