How to "rebase tags" in git?

git rebase git-rebase git-tag

13,683

Solution 1

In one sense, it's too late (but hang on, there's good news). The filter-branch code is able to adjust the tags because it keeps, during its filtering, a mapping of old-sha1 to new-sha1.

In fact, both filter-branch and rebase use the same basic idea, which is that each commit is copied, by expanding the original contents, making any desired changes, and then making a new commit out of the result. This means that during each copy step it's trivial to write the <old-sha1, new-sha1> pair to a file, and then once you're done, you fix up references by looking up the new-sha1 from their old-sha1. Once all the references are done, you're committed to the new numbering and you remove the mapping.

The map is gone by now, hence "in one sense, it's too late".

Luckily, it's not too late. :-) Your rebase is repeatable, or at least, the key parts of it probably are. Moreover, if your rebase was simple enough, you might not need to repeat it at all.

Let's look at the "repeat" thought. We have an original graph G of some arbitrary shape:

     o--o
    /    \
o--o--o---o--o   <-- branch-tip
 \          /
  o--o--o--o

(whoa, a flying saucer!). We've done a git rebase --root on (some part of) it, copying (some or all) commits (preserving merges or not) to get some new graph G':

    o--o--o--o   <-- branch-tip
   /
  /  o--o
 /  /    \
o--o--o---o--o
 \          /
  o--o--o--o

I've drawn this sharing only the original root node (and now it's a sailboat with a crane on it, instead of a flying saucer). There might be more sharing, or less. Some of the old nodes may have become completely unreferenced and hence been garbage-collected (probably not: the reflogs should keep all the original nodes alive for at least 30 days). But in any case, we still have tags pointing into some "old G part" of G', and those references guarantee that those nodes, and all their parents, are still in the new G'.

Thus, if we know how the original rebase was done, we can repeat it on the sub-graph of G' that is the important part of G. How hard or easy this is, and what command(s) to use to do it, depend on whether all of the original G is in G', what the rebase command was, how much G' overlays the original G, and more (since git rev-list, which is our key to getting a list of nodes, probably has no way to distinguish between "original, was-in-G" and "new to G'" nodes). But it probably can be done: it's just a Small Matter Of Programming, at this point.

If you do repeat it, this time you'd want to keep the mapping, especially if the resulting graph G'' doesn't completely overlap G', because what you need now is not the map itself, but a projection of this map, from G into G'.

We simply give each node in the original G a unique relative address (e.g., "from the tip, find parent commit #2; from that commit, find parent commit #1; from that commit...") and then find the corresponding relative address in G''. This allows us to rebuild the critical parts of the map.

Depending on the simplicity of the original rebase, we might be able to jump directly to this phase. For instance, if we know for sure that the entire graph was copied without flattening (so that we have two independent flying saucers) then the relative address for tag T in G is the relative address we want in G', and now it's trivial to use that relative address to make a new tag pointing to the copied commit.

Big update based on new information

Using the additional information that the original graph was completely linear, and that we've copied every commit, we can use a very simple strategy. We still need to reconstruct the map, but now it's easy, as every old commit has exactly one new commit, which has some linear distance (which is easy to represent as a single number) from either end of the original graph (I'll use distance-from-tip).

That is, the old graph looks like this, with just one branch:

A <- B <- C ... <- Z   <-- master

The tags simply point to one of the commits (via an annotated tag object), e.g., perhaps tag foo points to an annotated-tag object that points to commit W. We then note that W is three commits back from Z (counting Z, Y, X, W, it is Z~3).

The new graph looks exactly the same except that each commit has been replaced with its copy. Let's call these A', B', and so on, through Z'. The (single) branch points to the tip-most commit, i.e., Z'. We'll want to adjust the original tag foo so that we have a new annotated-tag object pointing to W'.

We'll need the SHA-1 ID of the original tip-most commit. This should be easy to find in the reflog for the (single) branch, and is probably simply master@{1} (although that depends on how many times you have tweaked the branch since then; and if there are new commits you added since rebasing, we need to take those into account as well). It may well also be in the special ref ORIG_HEAD, which git rebase leaves behind in case you decide you don't like the rebase result.

Let's assume that master@{1} is the correct ID and that there are no such new commits. Then:

orig_master=$(git rev-parse master@{1})

would save this ID in $orig_master.
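With the old tip saved, the distance-from-tip idea for the purely linear case can be sketched roughly as follows: count how many commits separate the old commit from the old tip, then walk the same number of steps back from the new tip. The function name remap_by_distance and its argument order are made up for illustration, and the sketch assumes every commit was copied one-for-one.

```shell
# Hypothetical helper: map an old commit to its copy by distance from the tip.
# Assumes a strictly linear history in which every commit was copied.
# usage: remap_by_distance <old-commit> <old-tip> <new-tip>
remap_by_distance() {
    local old_commit="$1" old_tip="$2" new_tip="$3" dist

    # refuse to guess if the old commit is not part of the old history
    git merge-base --is-ancestor "$old_commit" "$old_tip" || return

    # number of commits reachable from the old tip but not from old_commit
    dist=$(git rev-list --count "$old_commit..$old_tip") || return

    # the copy sits the same number of steps behind the new tip
    git rev-parse --verify "$new_tip~$dist"
}
```

For the tag foo above, this would be called as remap_by_distance "$(git rev-parse 'foo^{commit}')" "$orig_master" master.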

If we wanted to build the full map, this would do it:

$ git rev-list $orig_master > /tmp/orig_list
$ git rev-list master > /tmp/new_list
$ wc -l /tmp/orig_list /tmp/new_list

(The line counts for the two files should be the same; if not, some assumption here has gone wrong. I'll also leave out the shell $ prefix below, since the rest of this really should go into a script, even for one-time use, in case of typos and a need for tweaks.)

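# note: no space between the descriptor number and "<",
# otherwise the shell tries to run a command named "3"
exec 3< /tmp/orig_list 4< /tmp/new_list
while read orig_id; do
    read new_id <&4; echo $orig_id $new_id;
done <&3 > /tmp/mapping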

(this, quite untested, is meant to paste the two files together—sort of a shell version of Python zip on the two lists—to get the mapping). But we don't actually need the mapping, all we need is those "distance from tip" counts, so I'm going to pretend we didn't bother here.
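If the standard paste utility is available, the same zip can be sketched without juggling file descriptors. The function name zip_lists is only illustrative.

```shell
# Pair line N of the first file with line N of the second,
# separated by a single space (one "old new" pair per line).
zip_lists() {
    paste -d ' ' "$1" "$2"
}
```

It would be used as zip_lists /tmp/orig_list /tmp/new_list > /tmp/mapping.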

Now we need to iterate over all tags:

# We don't want a pipe here because it's
# not clear what happens if we update an existing
# tag while `git for-each-ref` is still running.
git for-each-ref refs/tags > /tmp/all-tags

# it's also probably a good idea to copy these
# into a refs/original/refs/tags name space, a la
# git filter-branch.
while read sha1 objtype tagname; do
    git update-ref -m backup refs/original/$tagname $sha1
done < /tmp/all-tags

# now replace the old tags with new ones.
# it's easy to handle lightweight tags too.
while read sha1 objtype tagname; do
    case $objtype in
    tag) adj_anno_tag $sha1 $tagname;;
    commit) adj_lightweight_tag $sha1 $tagname;;
    *) echo "error: shouldn't have objtype=$objtype";;
    esac
done < /tmp/all-tags

We still need to write the two adj_anno_tag and adj_lightweight_tag shell functions. First, though, let's write a shell function that produces the new ID given the old ID, i.e., looks up the mapping. If we used a real mapping file, we would grep or awk for the first entry, then print the second. Using the sleazy single-old-file method, though, what we want is the line number of the matching ID, which we can get with grep -n:

map_sha1() {
    local grep_result line

    grep_result=$(grep -n $1 /tmp/orig_list) || {
        echo "WARNING: ID $1 is not mapped" 1>&2
        echo $1
        return 1
    }
    # annoyingly, grep produces "4:matched-text"
    # on a match.  strip off the part we don't want.
    line=${grep_result%%:*}
    # now just get git to spit out the ID of the (line - 1)'th
    # commit before the tip of the current master.  the "minus
    # one" part is because line 1 represents master~0, line 2
    # is master~1, and so on.
    git rev-parse master~$((line - 1))
}

The WARNING case should never happen, and the rev-parse should never fail, but we probably should check the return status of this shell function.
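For comparison, a lookup against a real two-column mapping file (one "old-ID new-ID" pair per line, as /tmp/mapping would contain) might look roughly like this. The name map_sha1_from_file and the optional second argument are assumptions of this sketch.

```shell
# Hypothetical variant of map_sha1 that reads "old new" pairs from a file.
# usage: map_sha1_from_file <old-id> [mapping-file]
map_sha1_from_file() {
    awk -v old="$1" '
        $1 == old { print $2; found = 1; exit }
        END { exit !found }   # non-zero status when the ID is unmapped
    ' "${2:-/tmp/mapping}"
}
```

Unlike the line-number trick, this does not depend on the history being linear, only on the mapping file being correct.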

The lightweight tag updater is now pretty trivial:

adj_lightweight_tag() {
    local old_sha1=$1 new_sha1 tag=$2

    new_sha1=$(map_sha1 $old_sha1) || return
    git update-ref -m remap $tag $new_sha1 $old_sha1
}

Updating an annotated tag is more difficult, but we can steal code from git filter-branch. I'm not going to quote it all here; instead, I just give you this bit:

$ vim $(git --exec-path)/git-filter-branch

and these instructions: search for the second occurrence of git for-each-ref, and note the git cat-file piped to sed with the result passed to git mktag, which sets the shell variable new_sha1.

This is what we need to copy the tag object. The new copy must point to the object found by using $(map_sha1) on the commit to which the old tag pointed. We can find that commit the same way filter-branch does, using git rev-parse $old_sha1^{commit}.
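For an unsigned annotated tag whose target is a commit, the copy step boils down to rewriting the first header line of the tag object. Here is a minimal sketch, assuming the object text arrives on standard input; retarget_tag_text is an invented name, and this is a simplification of what filter-branch does, not its actual code.

```shell
# Hypothetical helper: read a tag object's text on stdin and print it with
# the "object" header (always the first line) pointing at a new ID.
# Only line 1 is touched, so "object ..." text in the message is left alone.
# usage: retarget_tag_text <new-id>
retarget_tag_text() {
    sed "1s/^object .*/object $1/"
}
```

It would be used as git cat-file tag "$old_sha1" | retarget_tag_text "$new_commit" | git mktag, which prints the ID of the new tag object. Any signature carried over this way would no longer verify.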

(Incidentally, writing up this answer and looking at the filter-branch script, it occurs to me that there's a bug in filter-branch, which we'll import to our post-rebase tag-fixup code: if an existing annotated tag points to another tag, we don't fix it. We only fix lightweight tags and tags pointing directly to commits.)

Note that none of the example code above is actually tested, and turning it into a more-general-purpose script (that could be run after any rebase, for instance, or better yet, incorporated into interactive rebase itself) requires a fair amount of additional work.

Solution 2

You can use git rebasetags, a third-party script rather than a built-in Git command.

You use it just as you would use git rebase:

git rebasetags <rebase args>

In case the rebase is interactive, you will be presented with a bash shell where you can make the changes. Upon exiting that shell, the tags will be restored.

From this post

Solution 3

Thanks to torek's detailed walk-through, I've pieced together an implementation.

#!/usr/bin/env bash
set -eo pipefail

orig_master="$(git rev-parse ORIG_HEAD)"

sane_grep () {
    GREP_OPTIONS= LC_ALL=C grep "$@"
}

map_sha1() {
    local result line

    # git rev-list $orig_master > /tmp/orig_list
    result="$(git rev-list "${orig_master}" | sane_grep -n "$1" || {
        echo "WARNING: ID $1 is not mapped" 1>&2
        return 1
    })"

    if [[ -n "${result}" ]]
    then
        # annoyingly, grep produces "4:matched-text"
        # on a match.  strip off the part we don't want.
        result=${result%%:*}
        # now just get git to spit out the ID of the (line - 1)'th
        # commit before the tip of the current master.  the "minus
        # one" part is because line 1 represents master~0, line 2
        # is master~1, and so on.
        git rev-parse master~$((result - 1))
    fi
}

adjust_lightweight_tag () {
    local old_sha1=$1 new_sha1 tag=$2

    new_sha1=$(map_sha1 "${old_sha1}")

    if [[ -n "${new_sha1}" ]]
    then
        git update-ref "${tag}" "${new_sha1}"
    fi
}

die () {
    echo "$1" >&2
    exit 1
}

adjust_annotated_tag () {
    local sha1t=$1
    local ref=$2
    local tag="${ref#refs/tags/}"

    local sha1="$(git rev-parse -q "${sha1t}^{commit}")"
    local new_sha1="$(map_sha1 "${sha1}")"

    if [[ -n "${new_sha1}" ]]
    then
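        # no "local" here (new_sha1 is already local): "local" always
        # succeeds, which would hide a git mktag failure from "|| die"
        new_sha1=$(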
            (
                printf 'object %s\ntype commit\ntag %s\n' \
                        "$new_sha1" "$tag"
                git cat-file tag "$ref" |
                sed -n \
                        -e '1,/^$/{
                    /^object /d
                    /^type /d
                    /^tag /d
                    }' \
                        -e '/^-----BEGIN PGP SIGNATURE-----/q' \
                        -e 'p'
            ) | git mktag
        ) || die "Could not create new tag object for $ref"

        if git cat-file tag "$ref" | \
                sane_grep '^-----BEGIN PGP SIGNATURE-----' >/dev/null 2>&1
        then
            echo "gpg signature stripped from tag object $sha1t"
        fi

        echo "$tag ($sha1 -> $new_sha1)"
        git update-ref "$ref" "$new_sha1"
    fi
}

git for-each-ref --format='%(objectname) %(objecttype) %(refname)' refs/tags |
while read sha1 type ref
do
    case $type in
    tag)
        adjust_annotated_tag "${sha1}" "${ref}" || true
        ;;
    commit)
        adjust_lightweight_tag "${sha1}" "${ref}" || true
        echo
        ;;
    *)
        echo "ERROR: unknown object type ${type}"
        ;;
    esac
done
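After running a script like this, one quick sanity check is to list any tags that are still not reachable from the rewritten branch. This is a sketch, assuming the rewritten branch is master unless another name is given; stale_tags is a name invented here.

```shell
# Hypothetical check: print tags whose commit is not reachable from the branch.
# usage: stale_tags [branch]   (defaults to master)
stale_tags() {
    git tag --no-merged "${1:-master}"
}
```

Empty output suggests every tag now points into the new history; any name printed still refers to a pre-rebase commit.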


Author by

Admin

Updated on June 11, 2022

Comments

Admin about 2 years

Suppose I have the following simple git repository: a single branch, some commits one after another, a couple of them having been tagged (with annotated tags) after committing each of them, and then one day I decide I want to change the first commit (which, by the way, is not tagged, if that changes anything). So I run git rebase --interactive --root and just mark 'edit' for the initial commit, change something in it and git rebase --continue. Now all commits in my repository have been recreated, therefore their sha1's have changed. However, the tags I created are completely unchanged, still pointing to the sha1 of the previous commits.

Is there an automatic way of updating the tags to the corresponding commits created when rebasing?

Some people suggest using git filter-branch --tag-name-filter cat -- --tags but that first warns me that each of my tags are unchanged and then says that each of my tags are changed to themselves (same tag name and same commit hash). And still, git show --tags says that the tags still point to the old commits.
Admin over 8 years

Thanks for your answer! I think it is rather general, but my particular scenario is really simple. I've updated my question to provide more details. My problem is not so much getting at the mapping of old to new sha1's -- I could even do that manually (although that would be impractical if the number of tags was large). My real problem is how to make tags point to the new commits created in the rebase process, without changing anything else in the tag (date, message, etc.). And, obviously, an automatic way of doing that would probably need to know the aforementioned mapping.
torek over 8 years

OK, it sounds like your original structure was entirely linear (no merges to worry about) and you retained all the original commits, which makes "relative addressing" trivial: the distance from old tip to tag is the same as the distance from new tip to where-tag-should-go. The main remaining issue is whether these are annotated tags, or lightweight tags.
Admin over 8 years

Yes, you nailed it. They're annotated tags. How does that matter?
torek over 8 years

Annotated tags are actual objects, so they have to be copied (or re-created) with the adjusted commit IDs. If they're signed, it's probably easiest to re-create them from scratch, rather than copying. (Then in either case there's a lightweight tag that has to be made to point somewhere, in this case, to the new annotated tag object; for plain lightweight tags, we point the lightweight tag to the rebase commit.) I'll get back to this later, I have an errand right now...
Admin over 8 years

I see. In my particular case, they're not signed. Isn't it possible to just make the existing annotated tags point to the new commit objects? (No hurries, and thanks for following up).
torek over 8 years

You can't modify the existing annotated-tag objects for the same reason that you can't modify a commit (or any other object): the identity of the object is the SHA-1 checksum of the contents of the object. This is how git automatically detects data corruption. But, if the existing annotated tags are unsigned, we can steal the code from git filter-branch that copies them, making automatic recovery much easier.
Admin over 8 years

Ok, @torek, your big update now gave me a fairly good way to achieve what I wanted. Who would've bet that it would be so cumbersome? Anyway, thanks for your thorough answer ;)
torek over 8 years

Yes, this sort of thing is a lot easier if you (or git) do it all at once. Rebase and filter-branch both make mappings of old-vs-new commit IDs, but then toss them out when done, and you still need it if you're doing a later tag adjustment.