Find the merge base between a remote and a commit


Am I going about this completely wrong?

I think so, yes. Let's rewind and review. I am sure you have seen some of this, but many explanations of this are ... not so good. (Also, you've been revising your question while I have been answering, so we might be crossing a few wires...)

Remotes, remote-tracking branches, gitrevisions, and git rev-list

A remote like origin is just a name. "A name for what" is a good question (not yours, at least not directly, but one you should ask). The main "for what" is simple enough: for a URL. The name origin stores a URL like ssh://[email protected]/path/to/repo. There is normally one URL attached to each remote. (You can set a second, different one for pushing, if you like. There are some rarely-useful cases where you can have more than one push URL as well. But mostly it's just a short name for one URL.) The key thing to remember here is not just that it's a name for a URL, but also that there is another Git at the other end of that URL. That other Git repository has its own branch and tag names.
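To make the "name for a URL" idea concrete, here is a minimal sketch. It assumes a scratch directory and two made-up paths (upstream and elsewhere) that are not part of your setup, and it needs a Git new enough to have git remote get-url.

```shell
set -eu
work=$(mktemp -d)          # scratch area; every path below is invented for the demo
cd "$work"
git init -q upstream       # stands in for "the other Git"
git init -q clone
cd clone
git remote add origin "$work/upstream"

# The remote name is just a stored URL.
fetch_url=$(git remote get-url origin)

# Optional second URL used only for pushing (hypothetical path).
git remote set-url --push origin "$work/elsewhere"
push_url=$(git remote get-url --push origin)

echo "fetch: $fetch_url"
echo "push:  $push_url"
```

Setting the push URL leaves the fetch URL alone, which is why git remote get-url origin still reports the first one.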

A remote name like origin is not a branch or tag. By itself, it does not identify commits. The things that do identify commits by name are your own (local) branch names like master; any tag names you have, such as v1.3 if you have such a tag; and remote-tracking branch names like origin/master. Each such name identifies one specific commit. In the case of a remote-tracking branch name like origin/master, it identifies the commit your Git saw, the last time your Git contacted the other Git via the URL. In other words, remote-tracking branches remember what your Git saw the last time it talked to that other Git. They are only as up-to-date as your Git's most recent full conversation. (Your Git will do partial updates on successful git push operations, but will only do full updates on git fetch operations where you have not restricted the fetch. If you use git pull, it does a restricted fetch, which leaves at least some of your remote-tracking branches out of date.)
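Here is a small sketch of the "only as up-to-date as the last conversation" point. The repositories upstream and clone and the commit helper are invented for the demo; the other Git is simulated with a local directory.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Hypothetical helper: make an empty commit in repository $1 with message $2.
commit() {
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
}
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master   # force the branch name for the demo
commit upstream "first"
git clone -q upstream clone

commit upstream "second"                 # the other Git moves on
theirs=$(git -C upstream rev-parse master)
stale=$(git -C clone rev-parse origin/master)   # what our Git last saw

git -C clone fetch -q origin             # full conversation with the other Git
fresh=$(git -C clone rev-parse origin/master)

echo "theirs=$theirs stale=$stale fresh=$fresh"
```

Until the fetch runs, origin/master in the clone still names the older commit.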

Somewhat confusingly, however, as documented in gitrevisions, the resolution path for an unqualified name X is:

  1. See if .git/X exists; if so, use that. (This makes sure that HEAD always means the HEAD commit. As a side effect, it means you shouldn't name a branch MERGE_HEAD, ORIG_HEAD, CHERRY_PICK_HEAD, or FETCH_HEAD, since various Git operations leave files with those names behind.)
  2. See if refs/X exists; if so, use that.
  3. See if refs/tags/X exists; if so, use that.
  4. See if refs/heads/X exists; if so, use that.
  5. See if refs/remotes/X exists; if so, use that.
  6. See if refs/remotes/X/HEAD exists; if so, use that.

This very last one, item 6, means that using the remote name origin by itself will, if there's a refs/remotes/origin/HEAD, resolve to that. Git now normally sets up refs/remotes/origin/HEAD as an indirect reference, pointing to the reference Git chose when you did your original git clone.[1] This is usually refs/remotes/origin/master, so origin by itself can, but does not necessarily, "mean" origin/master, which is a remote-tracking branch name.
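You can watch item 6 at work with a sketch like the following. It assumes a fresh clone of a throwaway local repository (the names upstream and clone are made up) whose only branch is master.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master   # force the branch name for the demo
git -C upstream -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git clone -q upstream clone

# The bare remote name resolves through refs/remotes/origin/HEAD ...
by_name=$(git -C clone rev-parse --verify origin)
by_head=$(git -C clone rev-parse --verify refs/remotes/origin/HEAD)
# ... which is itself an indirect reference to a remote-tracking branch.
target=$(git -C clone symbolic-ref refs/remotes/origin/HEAD)

echo "origin -> $target -> $by_name"
```

If refs/remotes/origin/HEAD is missing in your repository, the first rev-parse fails instead, which is the "not necessarily" part.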

Meanwhile, git rev-list has its own special flags, including --remotes. To understand these in all their full Git glory, you might also want to refer to git for-each-ref, but they relate to the same search we saw above with gitrevisions, and to the idea that the reference name-space is hierarchical.

All the remote-tracking branches for origin live within refs/remotes/origin/. That makes it easy for us to find them all. And in fact, they're constructed by taking the other Git's branches—the ones that are local branches for the Git at the other end of the URL associated with the remote—and simply renaming them. When you run git fetch origin, so that your Git has the full conversation with the other Git, your Git gets a list of all their branches, plus the specific commit IDs that go with those branches. Then your Git changes their refs/heads/X to your refs/remotes/origin/X so that it can update your remote-tracking branches.
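The renaming is driven by the remote's fetch refspec, which a default clone writes into the configuration. A rough sketch, assuming a throwaway upstream with two branches (master and a made-up topic):

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master   # force the branch name for the demo
git -C upstream -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git -C upstream branch topic
git clone -q upstream clone

# The mapping from their refs/heads/* to our refs/remotes/origin/*.
refspec=$(git -C clone config --get remote.origin.fetch)
echo "refspec: $refspec"

# Their branches, and our renamed copies of them.
git -C upstream for-each-ref --format='%(refname)' refs/heads
git -C clone for-each-ref --format='%(refname)' refs/remotes/origin

theirs=$(git -C upstream rev-parse refs/heads/topic)
ours=$(git -C clone rev-parse refs/remotes/origin/topic)

# Every commit reachable from any of origin's remote-tracking names.
reachable=$(git -C clone rev-list --remotes=origin | wc -l | tr -d ' ')
echo "commits reachable from origin's names: $reachable"
```

The last command is the same --remotes=origin selection used with git rev-list above: it walks everything under refs/remotes/origin/.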

(Add --prune to get your own Git to delete any left-over remote-tracking branches you have, that no longer have a corresponding regular branch in that other Git repository. Otherwise you may have a few ghosts left behind. You can ignore the special "remote HEAD" entirely since it's an indirect reference, i.e., it merely contains the name of another remote-tracking branch.)
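A sketch of such a ghost, again with invented repositories and a made-up branch named topic. It assumes you have not turned on fetch.prune in your configuration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master   # force the branch name for the demo
git -C upstream -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git -C upstream branch topic
git clone -q upstream clone

# Hypothetical helper: report whether the clone has the given reference.
has_ref() {
    git -C clone show-ref --verify --quiet "$1" && echo yes || echo no
}

git -C upstream branch -q -D topic       # the other Git deletes its branch
git -C clone fetch -q origin             # plain fetch keeps our stale copy
before=$(has_ref refs/remotes/origin/topic)

git -C clone fetch -q --prune origin     # prune removes the left-over name
after=$(has_ref refs/remotes/origin/topic)

echo "before prune: $before, after prune: $after"
```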


[1] Very old versions of Git didn't do this at all, and some older versions can't do it 100% correctly, but rather just guess. Newer Gits can set the indirect branch correctly every time. For some reason, though, it's set only on initial clone. If you want to change it, you can at any time use git remote set-head, and you can set it to point to any of your remote-tracking branches, regardless of what's on the other Git.
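For illustration, here is roughly how git remote set-head behaves, using a throwaway clone and a made-up topic branch. The --auto form asks the other Git what its HEAD is.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master   # force the branch name for the demo
git -C upstream -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git -C upstream branch topic
git clone -q upstream clone

# Point the indirect reference at any remote-tracking branch we like.
git -C clone remote set-head origin topic
manual=$(git -C clone symbolic-ref refs/remotes/origin/HEAD)

# Or let Git query the remote and follow whatever its HEAD names.
git -C clone remote set-head origin --auto >/dev/null
auto=$(git -C clone symbolic-ref refs/remotes/origin/HEAD)

echo "manual: $manual"
echo "auto:   $auto"
```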


Names that point to commits, and properties of (local) branch names

Above, I mentioned that these various names point to one specific commit. This is in fact true of all names in Git. The thing that makes branches special, different from tags for instance, is that branch names are expected to change over time, normally in an "add new commits" manner.

It's the commits themselves that point, backwards, to other (earlier / ancestor) commits. As a key side effect or implication, this means that commits are typically on many branches, and that two different names may point to the same commit.
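A tiny sketch of both effects, in a scratch repository with invented branch names: two names for one commit, and one commit that is on two branches.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master   # force the branch name for the demo
# Hypothetical helper: make an empty commit with message $1.
commit() {
    git -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
commit "first"
first=$(git rev-parse HEAD)

git branch topic                 # a second name for the same commit
same=$(git rev-parse topic)

commit "second"                  # only master moves; it points back to "first"
tip=$(git rev-parse master)

# Which branches contain the first commit? Both of them.
holders=$(git for-each-ref --contains "$first" --format='%(refname:short)' refs/heads)
count=$(printf '%s\n' "$holders" | wc -l | tr -d ' ')
echo "$holders"
```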

There's one other thing that matters a lot right here, and that is that a local branch name can have one (and only one) upstream setting. If there is an upstream, it consists of two parts: the name of a remote, and the name of a branch as seen on that remote (i.e., "undoing" the remote-tracking-branch mapping). This upstream setting tells Git what you'd like to merge with or rebase upon, when you merge or rebase. By running the branch through the remote-tracking mapping (replacing master with origin/master), your Git can also tell you how far ahead and/or behind you are, of that remote-tracking branch.

(Of course, you have to run git fetch to update all the remote-tracking branches, before that last bit is useful.)
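The ahead/behind computation can be sketched like this. The repositories and the commit helper are invented; the counts come from git rev-list --left-right --count on the symmetric difference between the branch and its upstream.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Hypothetical helper: make an empty commit in repository $1 with message $2.
commit() {
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
}
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master   # force the branch name for the demo
commit upstream "first"
git clone -q upstream clone

commit clone "local work"        # we get one commit ahead
commit upstream "their work"     # they get one commit ahead
git -C clone fetch -q origin     # refresh origin/master first

upstream_name=$(git -C clone rev-parse --abbrev-ref 'master@{u}')
counts=$(git -C clone rev-list --left-right --count 'master...master@{u}')
ahead=$(printf '%s\n' "$counts" | cut -f1)    # commits only on master
behind=$(printf '%s\n' "$counts" | cut -f2)   # commits only on the upstream

echo "upstream=$upstream_name ahead=$ahead behind=$behind"
```

Skip the fetch and the "behind" count stays at zero, because origin/master has not yet heard about the new commit.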

Now, I think we can get back to your question...

What commit will you merge with?

Given a remote and a commitish, I want to merge it into a target branch on origin and generate a diff for it.

I think what you are saying here is that sometimes, you just do not have enough information.

If you are given a branch name, you can get the target branch on origin:

$ git config --get branch.master.remote
origin
$ git config --get branch.master.merge
refs/heads/master

This tells us the same thing as:

$ git rev-parse --abbrev-ref master@{u}
origin/master

with the obvious difference that the former is not mapped through to the corresponding remote-tracking branch, while the latter is.

If you are not given a branch name, though, there is no way to tell what the user wants to merge with—unless, of course, you take additional arguments.

This is in fact how git merge itself works:

$ git merge          # look up the upstream; merge with that
$ git merge 1234567  # merge with commit 1234567

The merge base of either merge will be the commit(s) found by git merge-base when run on HEAD (the current commit, by definition) and the target commit: 1234567 if you named it explicitly, or HEAD@{u} (the current branch's upstream) by default.
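As a sketch of that computation, here is a scratch repository where two made-up branches fork from one commit. The target can be given by name or by raw hash; git merge-base does not care which.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master   # force the branch name for the demo
# Hypothetical helper: make an empty commit with message $1.
commit() {
    git -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
commit "fork point"
fork=$(git rev-parse HEAD)

git checkout -q -b topic
commit "topic work"
git checkout -q master
commit "master work"

# HEAD is master here; the target is named either way.
base=$(git merge-base HEAD topic)
by_hash=$(git merge-base HEAD "$(git rev-parse topic)")

echo "fork point: $fork"
echo "merge base: $base"
```

On a branch with an upstream, git merge-base HEAD 'HEAD@{u}' does the same job for the no-argument form of git merge.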

Edit to add note: By the way, HEAD is always the current commit. It's also usually the current branch name, because HEAD itself is usually an indirect (or symbolic) reference, naming the current branch, and the ID stored under that branch name is then the current commit. Hence if you're "on branch master", as git status would say—in other words, if you'd have * master in git branch output—then HEAD contains the string refs/heads/master, and refs/heads/master contains the commit ID. If you have a "detached HEAD", though, HEAD just has the raw commit hash in it, so that there is no current branch name. In this "detached HEAD" case, there can be no upstream either, since the upstream is attached to the branch name.
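You can see the two states of HEAD with git symbolic-ref. A minimal sketch in a scratch repository (the "none" string is just a placeholder I print when there is no current branch):

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master   # force the branch name for the demo
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"

# On a branch: HEAD is an indirect reference naming that branch.
attached=$(git symbolic-ref HEAD)
commit_id=$(git rev-parse HEAD)

# Detached: HEAD holds a raw hash, so symbolic-ref has nothing to report.
git checkout -q --detach
detached=$(git symbolic-ref -q HEAD || echo "none")
still=$(git rev-parse HEAD)

echo "attached: $attached"
echo "detached: $detached"
```

The current commit is the same in both states; only the branch name goes away.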

Author by

Will

Updated on June 04, 2022

Comments

  • Will
    Will about 2 years

    Given a remote and a commitish, I want to merge it into a target branch on origin and generate a diff for it.

    If the target branch exists in origin, it's a simple call to merge-base to check if the remote commitish is related to origin's tree and to get a diff from the merge base to the remote commit.

    If it doesn't exist, blindly creating the new branch can be risky, as I want to make sure that the history being brought in is related to origin.

    Right now, I'm using rev-list to determine if the commit has a shared history with origin. Specifically, I'm using git rev-list --max-parents=0 --remotes=origin and git rev-list --max-parents=0 committish to get the root commit(s) for origin and the remote commit and checking that they have at least one shared root (rev-list can generate the union or difference between commits/remotes but not the intersection it seems).

    Now I know that the remote commit is related, but I still need to generate a diff. Right now, I only know what root commits in origin are reachable from the new commit. How can I find the best merge base in origin if I don't know which branch is the most closely related to the commit I want to bring in? Am I going about this completely wrong?

    In other words, given origin and commitish, how do I identify the merge base X given no knowledge of the hierarchy of origin? git merge-base origin committish is not valid since origin doesn't specify a single commit.

    o----o----o master        \
          \                    }-- everything reachable from origin
           o---X---o branch2  /
                \
    o---o----o---o----o---o commitish    }-- new data from remote