How does 'git merge' work in details?

43,128

Solution 1

You might be best off looking for a description of a 3-way merge algorithm. A high-level description would go something like this:

  1. Find a suitable merge base B - a version of the file that is an ancestor of both of the new versions (X and Y), and usually the most recent such base (although there are cases where it will have to go back further, which is one of the features of gits default recursive merge)
  2. Perform diffs of X with B and Y with B.
  3. Walk through the change blocks identified in the two diffs. If both sides introduce the same change in the same spot, accept either one; if one introduces a change and the other leaves that region alone, introduce the change in the final; if both introduce changes in a spot, but they don't match, mark a conflict to be resolved manually.

The full algorithm deals with this in a lot more detail, and even has some documentation (https://github.com/git/git/blob/master/Documentation/technical/trivial-merge.txt for one, along with the git help XXX pages, where XXX is one of merge-base, merge-file, merge, merge-one-file and possibly a few others). If that's not deep enough, there's always source code...

Solution 2

How does git perform when there are multiple common bases for merging branches?

This article was very helpful: http://codicesoftware.blogspot.com/2011/09/merge-recursive-strategy.html (here is part 2).

Recursive uses diff3 recursively to generate a virtual branch which will be used as the ancestor.

E.g.:

(A)----(B)----(C)-----(F)
        |      |       |
        |      |   +---+
        |      |   |
        |      +-------+
        |          |   |
        |      +---+   |
        |      |       |
        +-----(D)-----(E)

Then:

git checkout E
git merge F

There are 2 best common ancestors (common ancestors that are not ancestors of any other), C and D. Git merges them into a new virtual branch V, and then uses V as the base.

(A)----(B)----(C)--------(F)
        |      |          |
        |      |      +---+
        |      |      |
        |      +----------+
        |      |      |   |
        |      +--(V) |   |
        |          |  |   |
        |      +---+  |   |
        |      |      |   |
        |      +------+   |
        |      |          |
        +-----(D)--------(E)

I suppose Git would just continue with the if there were more best common ancestors, merging V with the next one.

The article says that if there is a merge conflict while generating the virtual branch Git just leaves the conflict markers where they are and continues.

What happens when I merge multiple branches at once?

As @Nevik Rehnel explained, it depends on the strategy, it is well explained on man git-merge MERGE STRATEGIES section.

Only octopus and ours / theirs support merging multiple branches at once, recursive for example does not.

octopus refuses to merge if there would be conflicts, and ours is a trivial merge so there can be no conflicts.

Those commands generate a new commit will have more than 2 parents.

I did one merge -X octopus on Git 1.8.5 without conflicts to see how it goes.

Initial state:

   +--B
   |
A--+--C
   |
   +--D

Action:

git checkout B
git merge -Xoctopus C D

New state:

   +--B--+
   |     |
A--+--C--+--E
   |     |
   +--D--+

As expected, E has 3 parents.

TODO: how exactly octopus operates on a single file modifications. Recursive two-by-two 3-way merges?

How does git perform when there is no common base for merging branches?

@Torek mentions that since 2.9, merge fails without --allow-unrelated-histories.

I tried it out empirically on on Git 1.8.5:

git init
printf 'a\nc\n' > a
git add .
git commit -m a

git checkout --orphan b
printf 'a\nb\nc\n' > a
git add .
git commit -m b
git merge master

a contains:

a
<<<<<<< ours
b
=======
>>>>>>> theirs
c

Then:

git checkout --conflict=diff3 -- .

a contains:

<<<<<<< ours
a
b
c
||||||| base
=======
a
c
>>>>>>> theirs

Interpretation:

  • the base is empty
  • when the base is empty, it is not possible to resolve any modification on a single file; only things like new file addition can be resolved. The above conflict would be solved on a 3-way merge with base a\nc\n as a single line addition
  • I think that a 3-way merge without a base file is called a 2-way merge, which is just a diff

Solution 3

I'm interested too. I don't know the answer, but...

A complex system that works is invariably found to have evolved from a simple system that worked

I think git's merging is highly sophisticated and will be very difficult to understand - but one way to approach this is from its precursors, and to focus on the heart of your concern. That is, given two files that don't have a common ancestor, how does git merge work out how to merge them, and where conflicts are?

Let's try to find some precursors. From git help merge-file:

git merge-file is designed to be a minimal clone of RCS merge; that is,
       it implements all of RCS merge's functionality which is needed by
       git(1).

From wikipedia: http://en.wikipedia.org/wiki/Git_%28software%29 -> http://en.wikipedia.org/wiki/Three-way_merge#Three-way_merge -> http://en.wikipedia.org/wiki/Diff3 -> http://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf

That last link is a pdf of a paper describing the diff3 algorithm in detail. Here's a google pdf-viewer version. It's only 12 pages long, and the algorithm is only a couple of pages - but a full-on mathematical treatment. That might seem a bit too formal, but if you want to understand git's merge, you'll need to understand the simpler version first. I haven't checked yet, but with a name like diff3, you'll probably also need to understand diff (which uses a longest common subsequence algorithm). However, there may be a more intuitive explanation of diff3 out there, if you have a google...


Now, I just did an experiment comparing diff3 and git merge-file. They take the same three input files version1 oldversion version2 and mark conflicts the way same, with <<<<<<< version1, =======, >>>>>>> version2 (diff3 also has ||||||| oldversion), showing their common heritage.

I used an empty file for oldversion, and near-identical files for version1 and version2 with just one extra line added to version2.

Result: git merge-file identified the single changed line as the conflict; but diff3 treated the whole two files as a conflict. Thus, sophisticated as diff3 is, git's merge is even more sophisticated, even for this simplest of cases.

Here's the actual results (I used @twalberg's answer for the text). Note the options needed (see respective manpages).

$ git merge-file -p fun1.txt fun0.txt fun2.txt

You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:

    Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B.  Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.
<<<<<<< fun1.txt
=======
THIS IS A BIT DIFFERENT
>>>>>>> fun2.txt

The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...

$ diff3 -m fun1.txt fun0.txt fun2.txt

<<<<<<< fun1.txt
You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:

    Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B.  Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.

The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...
||||||| fun0.txt
=======
You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:

    Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B.  Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.
THIS IS A BIT DIFFERENT

The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...
>>>>>>> fun2.txt

If you are truly interested in this, it's a bit of a rabbit hole. To me, it seems as deep as regular expressions, the longest common subsequence algorithm of diff, context free grammars, or relational algebra. If you want to get to the bottom of it, I think you can, but it will take some determined study.

Solution 4

How does git detect the context of a particular non-conflicting change?
How does git find out that there is a conflict in these exact lines?

If the same line has changed on both side of the merge, it's a conflict; if they haven't, the change from one side (if existent) is accepted.

Which things does git auto-merge?

Changes that do not conflict (see above)

How does git perform when there are multiple common bases for merging branches?

By the definition of a Git merge-base, there is only ever one (the latest common ancestor).

What happens when I merge multiple branches at once?

That depends on the merge strategy (only the octopus and the ours/theirs strategies support merging more than two branches).

What is a difference between merge strategies?

This is explained in the git merge manpage.

Solution 5

Here is the original implementation

http://git.kaarsemaker.net/git/blob/857f26d2f41e16170e48076758d974820af685ff/git-merge-recursive.py

Basically you create a list of common ancestors for two commits and then recursively merge them, either fast forwarding them, or creating virtual commits that get used for the basis of a three-way merge on the files.

Share:
43,128

Related videos on Youtube

abyss.7
Author by

abyss.7

Updated on November 22, 2020

Comments

  • abyss.7
    abyss.7 over 3 years

    I want to know an exact algorithm (or near that) behind 'git merge'. The answers at least to these sub-questions will be helpful:

    • How does git detect the context of a particular non-conflicting change?
    • How does git find out that there is a conflict in these exact lines?
    • Which things does git auto-merge?
    • How does git perform when there is no common base for merging branches?
    • How does git perform when there are multiple common bases for merging branches?
    • What happens when I merge multiple branches at once?
    • What is a difference between merge strategies?

    But the description of a whole algorithm will be much better.

    • Daniel Hilgarth
      Daniel Hilgarth about 11 years
      I guess you could fill a whole book with these answers...
    • Nevik Rehnel
      Nevik Rehnel about 11 years
      Or you could just go and read the code, which would take about as long as "describing the whole algorithm"
    • abyss.7
      abyss.7 about 11 years
      @DanielHilgarth I would be glad to find out, if there is already such book somewhere. References are welcome.
    • abyss.7
      abyss.7 about 11 years
      @NevikRehnel Yes, I can. But it can get much easier, if some one already knows the theory behind this code.
    • Ciro Santilli OurBigBook.com
      Ciro Santilli OurBigBook.com almost 10 years
      1. What is "the context of a particular non-conflicting change"? Points 2. and 3. are the same but negated, let's merge those two questions?
  • abyss.7
    abyss.7 about 11 years
    What does the 'same line' means? If I insert new non-empty line between two others and merge - what lines are the same? If I delete some lines in one branch, which ones are the 'same' in another branch?
  • Nevik Rehnel
    Nevik Rehnel about 11 years
    That's a bit tricky to answer in text. Git uses [diffs](en.wikipedia.org/wiki/Diff) to express the difference between two files (or two revisions of a file). It can detect if lines have been added or removed by comparing context (by default, three lines). "Same line" then means by context, while keeping additions and deletions in mind.
  • Edward Thomson
    Edward Thomson about 11 years
    You suggest that the "same line" changing would indicate a conflict. Is the automerge engine really line based? Or is it hunk-based? Is there only ever one common ancestor? If so, why does git-merge-recursive exist?
  • Nevik Rehnel
    Nevik Rehnel about 11 years
    @EdwardThomson: Yes, resolution is line-based (hunks can be broken down to smaller hunks until only one line is left). The default merge strategy uses the latest common ancestor as reference, but there are others if you want to use something else. And I don't know what git-merge-recursive should be (there is no man page and google yields nothing). More info about this can be found on the git merge and git merge-base man pages.
  • Edward Thomson
    Edward Thomson about 11 years
    The git-merge man page and the git-merge-base man pages that you point out discuss multiple common ancestors and recursive merge. I feel that your answer is incomplete without a discussion of such.
  • torek
    torek over 7 years
    There's a new SO link to this question, so I scanned through this answer (which is quite good) and noticed that a recent Git change has outdated the last section a bit. Since Git version 2.9 (commit e379fdf34fee96cd205be83ff4e71699bdc32b18), Git now refuses to merge if there is no merge base unless you add --allow-unrelated-histories.
  • adam0101
    adam0101 almost 6 years
    Here is the follow-up article from the one @Ciro posted: blog.plasticscm.com/2012/01/…
  • Jeremy List
    Jeremy List over 5 years
    Unless the behaviour has changed since I last tried it: --allow-unrelated-histories can be omitted if there are no common file paths between the branches you're merging.
  • skan
    skan over 5 years
    The problem arises when modifying different lines also produces a conflict.
  • nekketsuuu
    nekketsuuu over 5 years
    Small correction: There is ours merge strategy, but no theirs merge strategy. recursive+theirs strategy can only resolve two branches. git-scm.com/docs/git-merge#_merge_strategies
  • Chujun Song
    Chujun Song over 3 years
    the link is down.
  • philb
    philb almost 3 years
    (suggested edit queue is full) the "trivial-merge" document is viewable formatted at git-scm.com/docs/trivial-merge