Merging: Hg/Git vs. SVN

23,740

Solution 1

I do not use Subversion myself, but from the release notes for Subversion 1.5: Merge tracking (foundational) it looks like there are the following differences from how merge tracking work in full-DAG version control systems like Git or Mercurial.

  • Merging trunk to branch is different from merging branch to trunk: for some reason merging trunk to branch requires --reintegrate option to svn merge.

    In distributed version control systems like Git or Mercurial there is no technical difference between trunk and branch: all branches are created equal (there might be social difference, though). Merging in either direction is done the same way.

  • You need to provide new -g (--use-merge-history) option to svn log and svn blame to take merge tracking into account.

    In Git and Mercurial merge tracking is automatically taken into account when displaying history (log) and blame. In Git you can request to follow first parent only with --first-parent (I guess similar option exists also for Mercurial) to "discard" merge tracking info in git log.

  • From what I understand svn:mergeinfo property stores per-path information about conflicts (Subversion is changeset-based), while in Git and Mercurial it is simply commit objects that can have more than one parent.

  • "Known Issues" subsection for merge tracking in Subversion suggests that repeated / cyclic / reflective merge might not work properly. It means that with the following histories second merge might not do the right thing ('A' can be trunk or branch, and 'B' can be branch or trunk, respectively):

    *---*---x---*---y---*---*---*---M2        <-- A
             \       \             /
              --*----M1---*---*---/           <-- B
    

    In the case the above ASCII-art gets broken: Branch 'B' is created (forked) from branch 'A' at revision 'x', then later branch 'A' is merged at revision 'y' into branch 'B' as merge 'M1', and finally branch 'B' is merged into branch 'A' as merge 'M2'.

    *---*---x---*-----M1--*---*---M2          <-- A
             \       /           / 
              \-*---y---*---*---/             <-- B
    

    In the case the above ASCII-art gets broken: Branch 'B' is created (forked) from branch 'A' at revision 'x', it is merged into branch 'A' at 'y' as 'M1', and later merged again into branch 'A' as 'M2'.

  • Subversion might not support advanced case of criss-cross merge.

    *---b-----B1--M1--*---M3
         \     \ /        /
          \     X        /
           \   / \      /
            \--B2--M2--*
    

    Git handles this situation just fine in practice using "recursive" merge strategy. I am not sure about Mercurial.

  • In "Known Issues" there is warning that merge tracking migh not work with file renames, e.g. when one side renames file (and perhaps modifies it), and second side modifies file without renaming (under old name).

    Both Git and Mercurial handle such case just fine in practice: Git using rename detection, Mercurial using rename tracking.

HTH

Solution 2

I too have been looking for a case where, say, Subversion fails to merge a branch and Mercurial (and Git, Bazaar, ...) does the right thing.

The SVN Book describes how renamed files are merged incorrectly. This applies to Subversion 1.5, 1.6, 1.7, and 1.8! I have tried to recreate the situation below:

cd /tmp
rm -rf svn-repo svn-checkout
svnadmin create svn-repo
svn checkout file:///tmp/svn-repo svn-checkout
cd svn-checkout
mkdir trunk branches
echo 'Goodbye, World!' > trunk/hello.txt
svn add trunk branches
svn commit -m 'Initial import.'
svn copy '^/trunk' '^/branches/rename' -m 'Create branch.'
svn switch '^/trunk' .
echo 'Hello, World!' > hello.txt
svn commit -m 'Update on trunk.'
svn switch '^/branches/rename' .
svn rename hello.txt hello.en.txt
svn commit -m 'Rename on branch.'
svn switch '^/trunk' .
svn merge --reintegrate '^/branches/rename'

According to the book, the merge should finish cleanly, but with wrong data in the renamed file since the update on trunk is forgotten. Instead I get a tree conflict (this is with Subversion 1.6.17, the newest version in Debian at the time of writing):

--- Merging differences between repository URLs into '.':
A    hello.en.txt
   C hello.txt
Summary of conflicts:
  Tree conflicts: 1

There shouldn't be any conflict at all — the update should be merged into the new name of the file. While Subversion fails, Mercurial handles this correctly:

rm -rf /tmp/hg-repo
hg init /tmp/hg-repo
cd /tmp/hg-repo
echo 'Goodbye, World!' > hello.txt
hg add hello.txt
hg commit -m 'Initial import.'
echo 'Hello, World!' > hello.txt
hg commit -m 'Update.'
hg update 0
hg rename hello.txt hello.en.txt
hg commit -m 'Rename.'
hg merge

Before the merge, the repository looks like this (from hg glog):

@  changeset:   2:6502899164cc
|  tag:         tip
|  parent:      0:d08bcebadd9e
|  user:        Martin Geisler 
|  date:        Thu Apr 01 12:29:19 2010 +0200
|  summary:     Rename.
|
| o  changeset:   1:9d06fa155634
|/   user:        Martin Geisler 
|    date:        Thu Apr 01 12:29:18 2010 +0200
|    summary:     Update.
|
o  changeset:   0:d08bcebadd9e
   user:        Martin Geisler 
   date:        Thu Apr 01 12:29:18 2010 +0200
   summary:     Initial import.

The output of the merge is:

merging hello.en.txt and hello.txt to hello.en.txt
0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)

In other words: Mercurial took the change from revision 1 and merged it into the new file name from revision 2 (hello.en.txt). Handling this case is of course essential in order to support refactoring and refactoring is exactly the kind of thing you will want to do on a branch.

Solution 3

Without speaking about the usual advantages (offline commits, publication process, ...) here is a "merge" example I like:

The main scenario I keep seeing is a branch on which ... two unrelated tasks are actually developed
(it started from one feature, but it lead to the development of this other feature.
Or it started from a patch, but it lead to the development of another feature).

How to you merge only one of the two feature on the main branch?
Or How do you isolate the two features in their own branches?

You could try to generate some kind of patches, the problem with that is you are not sure anymore of the functional dependencies which could have existed between:

  • the commits (or revision for SVN) used in your patches
  • the other commits not part of the patch

Git (and Mercurial too I suppose) propose the rebase --onto option to rebase (reset the root of the branch) part of a branch:

From Jefromi's post

- x - x - x (v2) - x - x - x (v2.1)
           \
            x - x - x (v2-only) - x - x - x (wss)

you can untangle this situation where you have patches for the v2 as well as a new wss feature into:

- x - x - x (v2) - x - x - x (v2.1)
          |\
          |  x - x - x (v2-only)
           \
             x - x - x (wss)

, allowing you to:

  • test each branch in isolation to check if everything compile/work as intended
  • merge only what you want to main.

The other feature I like (which influence merges) is the ability to squash commits (in a branch not yet pushed to another repo) in order to present:

  • a cleaner history
  • commits which are more coherent (instead of commit1 for function1, commit2 for function2, commit3 again for function1...)

That ensure merges which are a lot easier, with less conflicts.

Solution 4

We recently migrated from SVN to GIT, and faced this same uncertainty. There was a lot of anecdotal evidence that GIT was better, but it was hard to come across any examples.

I can tell you though, that GIT is A LOT BETTER at merging than SVN. This is obviously anecdotal, but there is a table to follow.

Here are some of the things we found:

  • SVN used to throw up a lot of tree conflicts in situations where it seemed like it shouldn't. We never got to the bottom of this but it doesn't happen in GIT.
  • While better, GIT is significantly more complicated. Spend some time on training.
  • We were used to Tortoise SVN, which we liked. Tortoise GIT is not as good and this may put you off. However I now use the GIT command line which I much prefer to Tortoise SVN or any of the GIT GUI's.

When we were evaluating GIT we ran the following tests. These show GIT as the winner when it comes to merging, but not by that much. In practice the difference is much larger, but I guess we haven't managed to replicate the situations that SVN handles badly.

GIT vs SVN Merging Evaluation

Solution 5

Others have covered the more theoretical aspects of this. Maybe I can lend a more practical perspective.

I'm currently working for a company that uses SVN in a "feature branch" development model. That is:

  • No work can be done on trunk
  • Each developer can have create their own branches
  • Branches should last for the duration of the task undertaken
  • Each task should have it's own branch
  • Merges back to trunk need to be authorized (normally via bugzilla)
  • At times when high levels of control are needed, merges may be done by a gatekeeper

In general, it works. SVN can be used for a flow like this, but it's not perfect. There are some aspects of SVN which get in the way and shape human behaviour. That gives it some negative aspects.

  • We've had quite a few problems with people branching from points lower than ^/trunk. This litters merge information records throughout the tree, and eventually breaks the merge tracking. False conflicts start appearing, and confusion reigns.
  • Picking up changes from trunk into a branch is relatively straight forward. svn merge does what you want. Merging your changes back requires (we're told) --reintegrate on the merge command. I've never truly understood this switch, but it means that the branch can't be merged into trunk again. This means it's a dead branch and you have to create a new one to continue work. (See note)
  • The whole business of doing operations on the server via URLs when creating and deleting branches really confuses and scares people. So they avoid it.
  • Switching between branches is easy to get wrong, leaving part of a tree looking at branch A, whilst leaving another part looking at branch B. So people prefer to do all their work in one branch.

What tends to happen is that an engineer creates a branch on day 1. He starts his work and forgets about it. Some time later a boss comes along and asks if he can release his work to trunk. The engineer has been dreading this day because reintegrating means:

  • Merging his long lived branch back into trunk and solving all conflicts, and releasing unrelated code that should have been in a separate branch, but wasn't.
  • Deleting his branch
  • Creating a new branch
  • Switching his working copy to the new branch

...and because the engineer does this as little as they can, they can't remember the "magic incantation" to do each step. Wrong switches and URLs happen, and suddenly they're in a mess and they go get the "expert".

Eventually it all settles down, and people learn how to deal with the shortcomings, but each new starter goes through the same problems. The eventual reality (as opposed to what I set out at he start) is:

  • No work is done on trunk
  • Each developer has one major branch
  • Branches last until work needs to be released
  • Ticketed bug fixes tend to get their own branch
  • Merges back to trunk are done when authorized

...but...

  • Sometimes work makes it to trunk when it shouldn't because it's in the same branch as something else.
  • People avoid all merging (even easy stuff), so people often work in their own little bubbles
  • Big merges tend to occur, and cause a limited amount of chaos.

Thankfully the team is small enough to cope, but it wouldn't scale. Thing is, none of this is a problem with CVCS, but more that because merges aren't as important as in DVCS they're not as slick. That "merge friction" causes behaviour which means that a "Feature Branch" model starts to break down. Good merges need to be a feature of all VCS, not just DVCS.


According to this there's now a --record-only switch that could be used to solve the --reintegrate problem, and apparently v1.8 chooses when to do a reintegrate automatically, and it doesn't cause the branch to be dead afterwards

Share:
23,740
stmax
Author by

stmax

Software engineer for fun and profit

Updated on July 13, 2022

Comments

  • stmax
    stmax almost 2 years

    I often read that Hg (and Git and...) are better at merging than SVN but I have never seen practical examples of where Hg/Git can merge something where SVN fails (or where SVN needs manual intervention). Could you post a few step-by-step lists of branch/modify/commit/...-operations that show where SVN would fail while Hg/Git happily moves on? Practical, not highly exceptional cases please...

    Some background: we have a few dozen developers working on projects using SVN, with each project (or group of similar projects) in its own repository. We know how to apply release- and feature-branches so we don't run into problems very often (i.e., we've been there, but we've learned to overcome Joel's problems of "one programmer causing trauma to the whole team" or "needing six developers for two weeks to reintegrate a branch"). We have release-branches that are very stable and only used to apply bugfixes. We have trunks that should be stable enough to be able to create a release within one week. And we have feature-branches that single developers or groups of developers can work on. Yes, they are deleted after reintegration so they don't clutter up the repository. ;)

    So I'm still trying to find the advantages of Hg/Git over SVN. I'd love to get some hands-on experience, but there aren't any bigger projects we could move to Hg/Git yet, so I'm stuck with playing with small artificial projects that only contain a few made up files. And I'm looking for a few cases where you can feel the impressive power of Hg/Git, since so far I have often read about them but failed to find them myself.

  • Anonigan
    Anonigan over 14 years
    somehow (error in Markdown parser?) the part after <pre>...</pre> block is not indented as it should be...
  • stmax
    stmax over 14 years
    i have read hginit. too bad it doesn't show more practical examples of where hg is doing better than svn.. basically it tells you to "trust joel" that hg is just better. the simple examples he's shown could probably be done with svn as well.. actually that's why i've opened this question.
  • stmax
    stmax over 14 years
    +1 for the many detailed examples. I don't yet understand why the example in the first ascii-art might cause problems. it looks like the standard way to treat feature branches: assume A is the trunk, B is a feature branch. you merge weekly from A to B and when you're done with the feature you merge everything from B to A and then delete B. that has always worked for me. have i misunderstood the diagram?
  • Anonigan
    Anonigan over 14 years
    Note that I don't know (I have not checked) that examples given above really give problems in Subversion. Renames and criss-cross merge are real problem in SVN, I think.
  • Faisal Al-Harbi
    Faisal Al-Harbi over 14 years
    +1 for detailed example one can tap into the keyboard and see for oneself what happens. As a Mercurial noob, I wonder if the hg version of this example is follows in an obvious way, line by line?
  • Faisal Al-Harbi
    Faisal Al-Harbi over 14 years
    Based on how this is said, the naive question comes to mind: what if Mercurial's merge algorithm were put into Subversion? Would svn then be as good as hg? No, because the advantage of hg is in higher level organization, not the low level text-math of merging lines from files. That's the novel idea we svn users need to grok.
  • Martin Geisler
    Martin Geisler about 14 years
    @DarenW: I've added the corresponding Mercurial commands, I hope it makes things clearer!
  • Tomislav Nakic-Alfirevic
    Tomislav Nakic-Alfirevic over 13 years
    @stmax: I can see what you mean. However, Joel's or anyone else's opinion doesn't really matter: one technology is either better than the other (for a set of use cases) or not. @DarenW and @stmax: from my own personal experience, Hg wins hands down due to it's distributed operation (I'm not connected all the time), performance (lots of local operations), extremely intuitive branching superpowered by a superior merge algorithm, hg rollback, templated log output, hg glog, single .hg folder...I could just go on, and on and on...anything other than maybe git and bazaar feels like a straightjacket.
  • gbjbaanb
    gbjbaanb over 13 years
    reintegrate merges are a special option to help you out in the most common case when merging - there is no technical difference between branches and trunk in svn either. I tend to never use it, and stick with the standard merge option. Still, the single issue with svn merge is that it treats a move/rename as a delete+addition.
  • o0'.
    o0'. over 12 years
    svn doesn't have offline commits? rofl? how can anyone even remotely consider using it if it is so?
  • Max Nanasy
    Max Nanasy over 11 years
    @Lohoris When SVN came out, there were no widely-used open-source DVCSs; at this point, I think it's mostly inertia that people still use it.
  • o0'.
    o0'. over 11 years
    @MaxNanasy a very bad kind of inertia... still, choosing it now would be simply stupid.
  • IMSoP
    IMSoP over 10 years
    @Lohoris Online (more accurately, centralised) commits are not such a big deal in a small team where the repository can simply be on a shared local server. DVCSes were mostly invented for large, geographically distributed teams (both git and mercurial were intended to manage the Linux kernel code) and open source projects (hence the popularity of GitHub). Inertia can also be seen as an evaluation of the risks vs benefits of changing a tool central to a team's workflow.
  • IMSoP
    IMSoP over 10 years
    As I understand it, the --reintegrate option tells svn that you've already resolved conflicting changes when merging onto the feature branch. Effectively, rather than treating it as a patch, it overwrites whole files with the branch version, having already checked in the merge history that all trunk revisions have been merged into the branch.
  • o0'.
    o0'. over 10 years
    @IMSoP what if the server is down? what if you want to work from home? Nonsense.
  • IMSoP
    IMSoP over 10 years
    @Lohoris For sure, it's great to be able to have a portable dev environment, but there can be much more than a VCS preventing that (e.g. databases with meaningful data, firewall rules on remote APIs, etc). I'm not disagreeing that DVCS is a better concept, though, just providing an answer to why people still feel it's worth using SVN. For a completely new project/workflow, I would probably not use SVN; for an existing setup, a business case needs to be made for how the benefits in that particular situation will outweigh the cost of a disruptive change.
  • IMSoP
    IMSoP over 10 years
    The hg comment quoted about "changesets" seems rather inaccurate to me. SVN knows perfectly well what changes it is merging (a changeset is basically the difference between two snapshots, and vice versa, right?), and can apply each of them in turn if it so wishes; certainly doesn't have to "guess" anything. If it makes "one big unholy mess" then that is an implementation problem, not anything fundamental to the design. The main problem that's hard to solve on top of the current architecture design is file move/copy/rename.
  • o0'.
    o0'. over 10 years
    @IMSoP the point about changing something running is ok, as I mentioned previously. The databases/firewall instead is exactly the kind of problem a DVCS avoids: you keep working on your local branch, you can still commit, and when you are back online/in office you can pull, merge and push.
  • IMSoP
    IMSoP over 10 years
    @Lohoris I think you misunderstood my point about DB, firewall, etc: there is little point me being able to commit on my home machine if I can't actually run that code first. I could work blind, but the fact that I can't commit things somewhere wouldn't be the main thing putting me off.
  • o0'.
    o0'. over 10 years
    @IMSoP well, in many circumstances you can have a full copy of the software, to test it on your machine. Of course, if this isn't the case, then yes, there's no point to a DVCS.
  • Paul S
    Paul S over 10 years
    @IMSoP: possibly, that makes some sense. It that doesn't explain to me why it was necessary or why it made further merges from that branch impossible though. Didn't help that the option was largely undocumented either.
  • IMSoP
    IMSoP over 10 years
    I've only ever used it via TortoiseSVN, where it was always prominently explained in the merge UI. I believe SVN 1.8 automatically chooses the right strategy, and doesn't need a separate option, but I don't know if they've fixed the normal merge algorithm to deal correctly with a branch that has been reset in this way.
  • naught101
    naught101 over 9 years
    --reintegrate is deprecated.