When to use git subtree?

Solution 1

Be explicit about what you mean when you use the term 'subtree' in the context of git, as there are actually two separate but related topics here:

git-subtree and git subtree merge strategy.

The TL;DR

Both subtree-related concepts effectively allow you to manage multiple repositories in one. This contrasts with git-submodule, where only metadata is stored in the root repository (in the form of .gitmodules) and you must manage the external repositories separately.

More Details

git subtree merge strategy is basically the more manual method using the commands you referenced.
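As a rough sketch of that manual route, the sequence below embeds one repository into another and later merges an upstream update. The repository names (lib, app), the remote name lib_remote, the branch name main, and the vendor/lib prefix are all made up for illustration, and everything runs in a throwaway temporary directory.

```shell
set -eu
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical library repository that will be embedded.
git init -q "$work/lib"
cd "$work/lib"
git symbolic-ref HEAD refs/heads/main
echo 'lib v1' > lib.txt
git add lib.txt
git commit -q -m 'lib: v1'

# Hypothetical super-project.
git init -q "$work/app"
cd "$work/app"
git symbolic-ref HEAD refs/heads/main
echo 'app' > README
git add README
git commit -q -m 'app: initial commit'

# 1. Fetch the library history through an extra remote.
git remote add lib_remote "$work/lib"
git fetch -q lib_remote

# 2. Record a merge with the library without touching the app's files yet.
git merge -q -s ours --no-commit --allow-unrelated-histories lib_remote/main

# 3. Read the library tree into vendor/lib/ and finish the merge commit.
git read-tree --prefix=vendor/lib/ -u lib_remote/main
git commit -q -m 'Merge lib into vendor/lib'

# Later: the library moves on ...
cd "$work/lib"
echo 'lib v2' > lib.txt
git commit -q -am 'lib: v2'

# ... and the app merges the update, shifted into the same prefix.
cd "$work/app"
git fetch -q lib_remote
git merge -q -X subtree=vendor/lib -m 'Update vendor/lib' lib_remote/main
```

The extra remote, the two-step initial merge, and the prefix you have to remember on every update are the "manual" parts that git-subtree wraps.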

git-subtree is a wrapper shell script to facilitate a more natural syntax. It is actually still part of contrib and not fully integrated into git with the usual man pages. The documentation is instead stored alongside the script.

Here is the usage info:

NAME
----
git-subtree - Merge subtrees together and split repository into subtrees


SYNOPSIS
--------
[verse]
'git subtree' add   -P <prefix> <commit>
'git subtree' add   -P <prefix> <repository> <ref>
'git subtree' pull  -P <prefix> <repository> <ref>
'git subtree' push  -P <prefix> <repository> <ref>
'git subtree' merge -P <prefix> <commit>
'git subtree' split -P <prefix> [OPTIONS] [<commit>]
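To make the synopsis a little more concrete, here is a minimal sketch of `add` followed by `pull`. It assumes the git-subtree script is installed alongside your git (it is not on every system), and the repository names, the main branch, and the vendor/lib prefix are hypothetical.

```shell
set -eu
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical upstream project to embed.
git init -q "$work/lib"
cd "$work/lib"
git symbolic-ref HEAD refs/heads/main
echo 'lib v1' > lib.txt
git add lib.txt
git commit -q -m 'lib: v1'

# Hypothetical super-project; git subtree needs at least one existing commit.
git init -q "$work/app"
cd "$work/app"
git symbolic-ref HEAD refs/heads/main
echo 'app' > README
git add README
git commit -q -m 'app: initial commit'

# add: import the upstream history under vendor/lib/ in a single step.
git subtree add -P vendor/lib -m 'Add lib as vendor/lib' "$work/lib" main

# Upstream changes ...
cd "$work/lib"
echo 'lib v2' > lib.txt
git commit -q -am 'lib: v2'

# pull: fetch and merge the upstream change into the same prefix.
cd "$work/app"
git subtree pull -P vendor/lib -m 'Update vendor/lib' "$work/lib" main
```

Note that the repository location has to be passed again on every `pull`, since nothing in the super-project records it.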

I have come across a pretty good number of resources on the subject of subtrees, as I was planning on writing a blog post of my own. I will update this post if I do, but for now here is some relevant information to the question at hand:

Much of what you are seeking can be found on this Atlassian blog by Nicola Paolucci; the relevant section is below:

Why use subtree instead of submodule?

There are several reasons why you might find subtree better to use:

  • Management of a simple workflow is easy.
  • Older versions of git are supported (even before v1.5.2).
  • The sub-project’s code is available right after the clone of the super project is done.
  • subtree does not require users of your repository to learn anything new, they can ignore the fact that you are using subtree to manage dependencies.
  • subtree does not add new metadata files like submodule does (i.e. .gitmodules).
  • Contents of the module can be modified without having a separate repository copy of the dependency somewhere else.

In my opinion the drawbacks are acceptable:

  • You must learn about a new merge strategy (i.e. subtree).
  • Contributing code back upstream for the sub-projects is slightly more complicated.
  • The responsibility of not mixing super and sub-project code in commits lies with you.

I would agree with much of this as well. I would recommend checking out the article as it goes over some common usage.

You may have noticed that he has also written a follow up here where he mentions an important detail that is left off with this approach...

git-subtree currently fails to include the remote!

This shortsightedness is probably due to the fact that people often add a remote manually when dealing with subtrees, but this isn't stored in git either. The author details a patch he has written to add this metadata to the commit that git-subtree already generates. Until this makes it into the official git mainline you could do something similar by modifying the commit message or storing it in another commit.

I also find this blog post very informative. The author adds a third subtree method, which he calls git-stree, to the mix. The article is worth a read as he does a pretty good job of comparing the three approaches. He gives his personal opinion of what he does and doesn't like and explains why he created the third approach.

Closing Thoughts

This topic shows both the power of git and the segmentation that can occur when a feature just misses the mark.

I personally have developed a distaste for git-submodule, as I find it more confusing for contributors to understand. I also prefer to keep ALL of my dependencies managed within my projects to facilitate an easily reproducible environment without trying to manage multiple repositories. git-submodule, however, is currently much better known, so it is good to be aware of it, and depending on your audience that may sway your decision.

Solution 2

First off: I believe your question tends to get strongly opinionated answers and may be considered off-topic here. However, I don't like that SO policy and would push the border of being on-topic a bit outward, so I'd like to answer instead and hope others do as well.

On the GitHub tutorial that you pointed to there's a link to How to use the subtree merge strategy which gives a viewpoint on advantages/disadvantages:

Comparing subtree merge with submodules

The benefit of using subtree merge is that it requires less administrative burden from the users of your repository. It works with older (before Git v1.5.2) clients and you have the code right after clone.

However if you use submodules then you can choose not to transfer the submodule objects. This may be a problem with the subtree merge.

Also, in case you make changes to the other project, it is easier to submit changes if you just use submodules.

Here's my viewpoint based on the above:

I often work with folks (=committers) who are not regular git users; some still (and will forever) struggle with version control. Educating them about how to use the subtree merge strategy is basically impossible. It involves the concepts of additional remotes, merging, and branches, and then mixing it all into one workflow. Pulling from upstream and pushing upstream is a two-stage process. Since branches are difficult for them to understand, this is all hopeless.

With submodules it's still too complicated for them (sigh) but it is easier to understand: It's just a repo within a repo (they are familiar with hierarchy) and you can do your pushing and pulling as usual.

Providing simple wrapper scripts is easier imho for the submodule workflow.

For large super-repos with many sub-repos the point of choosing not to clone data of some sub-repos is an important advantage of the submodules. We can limit this based on work requirements and disk space usage.

Access control might be different. Haven't had this issue yet, but if different repos require different access controls, effectively banning some users from some sub-repos, I wonder if that's easier to accomplish with the submodule approach.

Personally I'm undecided what to use myself. So I share your confusion :o]

Solution 3

Basically, git-subtree is an alternative to the git-submodule approach. Submodules have many drawbacks, or rather, I would say, you need to be very careful while using them. For example, say you have a repo "one" and inside "one" you have added another repo called "two" using submodules. Things you need to take care of:

  • When you change something in "two", you need to commit and push inside "two"; if you are at the top-level directory (i.e. in "one"), your changes won't get highlighted.

  • When an unknown user tries to clone your "one" repo, after cloning "one" that user needs to update the submodules to get the "two" repo.

These are some of the points; for a better understanding I would recommend watching this video: https://www.youtube.com/watch?v=UQvXst5I41I
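The cloning point can be reproduced with a small sketch like the one below. It uses throwaway local repositories named "one" and "two" as in the example above; the file names and the main branch are made up, and the protocol.file.allow setting is only there because the demo uses local paths instead of real URLs.

```shell
set -eu
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Local paths stand in for real URLs here. Recent Git versions refuse
# local-path submodules unless protocol.file.allow permits them;
# https or ssh URLs would not need this.
allow_file='protocol.file.allow=always'

# Hypothetical inner repository "two".
git init -q "$work/two"
cd "$work/two"
git symbolic-ref HEAD refs/heads/main
echo 'two' > two.txt
git add two.txt
git commit -q -m 'two: initial commit'

# Outer repository "one" with "two" added as a submodule.
git init -q "$work/one"
cd "$work/one"
git symbolic-ref HEAD refs/heads/main
echo 'one' > one.txt
git add one.txt
git commit -q -m 'one: initial commit'
git -c "$allow_file" submodule --quiet add "$work/two" two
git commit -q -m 'Add two as a submodule'

# A plain clone of "one" leaves the two/ directory empty ...
git clone -q "$work/one" "$work/clone"
before=$(ls -A "$work/clone/two")

# ... until the new user initializes and updates the submodule.
git -C "$work/clone" -c "$allow_file" submodule --quiet update --init
```

Passing `--recurse-submodules` to `git clone` does both steps at once, but the user still has to know to ask for it.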

  • To overcome such problems the subtree approach was invented. To get the basics of git-subtree, have a look at this: https://www.youtube.com/watch?v=t3Qhon7burE

  • I find the subtree approach more reliable and practical compared to submodules :) (though I am very much a beginner to be saying these things)

Cheers!

Solution 4

A real use case that we have where git subtree was a salvation:

The main product of our company is highly modular and developed as several projects in separate repositories. All modules have their own roadmap. The whole product is composed of concrete versions of all the modules.

In parallel, a concrete version of the whole product is customized for each of our clients, with separate branches for each module. Customizations sometimes have to be made in several projects at once (cross-module customization).

To have a separate product life cycle (maintenance, feature branches) for the customized product, we introduced git subtree. We have one git-subtree repository for all customized modules. Our customizations are pushed back every day with 'git subtree push' to customization branches in all the original repositories.

This way we avoid managing many repos and many branches. git-subtree increased our productivity several times!

UPDATE

More details about the solution, as posted in the comments:

We created a brand new repository. Then we added each project that had a client branch to that new repo as a subtree. We had a Jenkins job that regularly pushed master changes back to the client branch in the original repositories. We worked only with the "client repo", using a typical git flow with feature and maintenance branches.
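A rough sketch of that setup for a single module could look like this. The names module-a, client, and client-x, as well as the file names, are invented for illustration, the branch is called main rather than master, and the script assumes the git-subtree command is installed.

```shell
set -eu
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical module repository with its own roadmap on main.
git init -q "$work/module-a"
cd "$work/module-a"
git symbolic-ref HEAD refs/heads/main
echo 'core' > a.txt
git add a.txt
git commit -q -m 'module-a: core'

# Brand new "client repo" that will hold all customized modules.
git init -q "$work/client"
cd "$work/client"
git symbolic-ref HEAD refs/heads/main
echo 'client build script' > build.sh
git add build.sh
git commit -q -m 'client: initial commit'

# Add the module as a subtree (repeat for each customized module).
git subtree add -P module-a -m 'Add module-a' "$work/module-a" main

# A customization committed in the client repo only.
echo 'client tweak' >> module-a/a.txt
git commit -q -am 'module-a: client tweak'

# Push just the module-a/ history back to a client branch in the
# original repository (the step a scheduled job would run).
git subtree push -P module-a "$work/module-a" client-x
```

Only the history under module-a/ reaches the original repository; files such as the client's build script stay in the client repo.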

Our 'client' repo also had build scripts that we adapted for this particular client.

However, there is a pitfall in the presented solution.

As we moved farther and farther from the main core development of the product, a possible upgrade for that particular client became more and more difficult. In our case that was OK, as the state of the project before subtree was already far away from the main path, so subtree at least introduced order and the possibility of a default git flow.

Solution 5

To add to the above answers, an additional drawback of using subtree is the repo size compared to submodules.

I don't have any real-world metrics, but each time a push is made on a module, every repository that uses that module gets a copy of the same change in its parent repository (once the subtree is subsequently updated in those repos).

So if a code base is heavily modularised, that will add up quite quickly.

However, given storage prices are always coming down, that may not be a significant factor.

Author: Lernkurve

Updated on May 27, 2020

Comments

  • Lernkurve (almost 4 years)

    What problem does git subtree solve? When and why should I use that feature?

    I've read that it is used for repository separation. But why would I not just create two independent repositories instead of sticking two unrelated ones into one?

    This GitHub tutorial explains how to perform Git subtree merges.

    I kind of know how to use it, but not when (use cases) and why, and how it relates to git submodule. I'd use submodules when I have a dependency on another project or library.