Partial clone with Git and Mercurial
Solution 1
In Git land you are talking about three different types of partial clones:
- shallow clones: I want history from revision point X onward.
  Use git clone --depth <n> <url> for that, but please remember that shallow clones are somewhat limited in interacting with other repositories. You would be able to generate patches and send them via email.
- partial clone by filepath: I want all revision history in some directory /path.
  Not possible in Git. With modern Git, though, you can have sparse checkout, i.e. you have the whole history but you check out (have in the working area) only a subset of all files.
- cloning only selected branch: I want to clone only one branch (or a selected subset of branches).
  Possible, and
  - before git 1.7.10 not simple: you would need to do what clone does manually, i.e. git init [<directory>], then git remote add origin <url>, edit .git/config replacing * in remote.origin.fetch by the requested branch (probably 'master'), then git fetch.
  - as of git 1.7.10, git clone offers the --single-branch option, which seems like it was added just for this purpose, and seems pretty easy.
  Note however that because branches usually share most of their history, the gain from cloning only a subset of branches might be smaller than you think.
You can also do a shallow clone of only a selected subset of branches.
If you know how people will want to break things down by filepath (multiple projects in the same repository) you can use submodules (sort of like svn:externals) to pre-split the repo into separately cloneable portions.
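As a rough sketch of the two single-branch approaches described above, the script below builds a throwaway local repository and clones only one branch from it, first with --single-branch and then by hand with a narrowed fetch refspec. The repository layout, the branch name feature, the directory names, and the file:// URL are all made up for illustration; substitute a real remote URL and branch.

```shell
#!/bin/sh
set -e

# Throwaway sandbox; every name below is a hypothetical stand-in.
work=$(mktemp -d)
cd "$work"

# Build a small source repository with two branches.
git init -q source
cd source
git config user.name "Example"
git config user.email "example@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first commit"
git branch -M master            # normalise the default branch name
git checkout -q -b feature
echo two >> file.txt
git commit -q -a -m "second commit, on feature"
git checkout -q master
cd "$work"

# Git 1.7.10 and later: let clone restrict the fetch to one branch.
git clone -q --single-branch --branch feature "file://$work/source" single

# Older Git: do by hand what clone does, with a narrowed fetch refspec.
git init -q manual
cd manual
git remote add origin "file://$work/source"
git config remote.origin.fetch "+refs/heads/feature:refs/remotes/origin/feature"
git fetch -q origin
git checkout -q -b feature origin/feature
cd "$work"

# Both clones should know only about origin/feature, not origin/master.
git -C single branch -r
git -C manual branch -r
```

Because feature was forked from master, both clones still contain the first commit: only history that is unreachable from the requested branch is left out.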
Solution 2
In mercurial land you're talking about three different types of partial clones:
- shallow clones: I want the history from revision point X onward: use the remotefilelog extension
- partial clones by filepath: I want all revision history in directory /path: use the experimental narrowhg extension; or I want only files in directory /path to be in my working directory: use the experimental sparse extension (shipped since version 4.3, see hg help sparse).
- partial clones by branch: I want all revision history on branch Y: use clone -r
If you know how people will want to break things down by filepath (multiple projects in the same repo (shame on you)) you can use subrepositories (sort of like svn externals) to pre-split the repo into separately cloneable portions.
Also, as to the "so huge I'd like to only get a part of it": You really only have to do that one time ever. Just clone it while you have lunch, and then you have it forever more. Subsequently you can pull and get deltas efficiently going forward. And if you want another clone of it, just clone your first clone. Where you got a clone doesn't matter (and local clones take up no additional disk space since they're hard links under the covers).
Solution 3
The selected answer provides a good overview, but lacks a complete example.
Minimize your download and checkout footprint (a), (b):
git clone --no-checkout --depth 1 --single-branch --branch (name) (repo) (folder)
cd (folder)
git config core.sparseCheckout true
echo "target/path/1" >>.git/info/sparse-checkout
echo "target/path/2" >>.git/info/sparse-checkout
git checkout
Periodically optimize your local repository footprint (c) (optional, use with care):
git clean --dry-run # consider and tweak results then switch to --force
git gc
git repack -Ad
git prune
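To try the shallow, sparse clone recipe without touching a real remote, a self-contained sketch along these lines may help. It creates a local repository with two directories and two commits, then clones it so that only docs/ is checked out and only the latest commit is fetched. The directory names, the branch name master, and the file:// URL are assumptions made for the demonstration.

```shell
#!/bin/sh
set -e

# Throwaway sandbox; every name below is a hypothetical stand-in.
work=$(mktemp -d)
cd "$work"

# Source repository with two directories and two commits.
git init -q source
cd source
git config user.name "Example"
git config user.email "example@example.com"
mkdir docs src
echo "readme" > docs/readme.txt
echo "code" > src/main.c
git add docs src
git commit -q -m "first commit"
echo "more" >> docs/readme.txt
git commit -q -a -m "second commit"
git branch -M master            # normalise the default branch name
cd "$work"

# Shallow, single-branch clone that leaves the working tree empty for now.
git clone -q --no-checkout --depth 1 --single-branch --branch master \
    "file://$work/source" sparse
cd sparse

# Enable sparse checkout and keep only docs/ in the working tree.
git config core.sparseCheckout true
mkdir -p .git/info
echo "docs/" >> .git/info/sparse-checkout
git checkout -q master

ls                          # only docs should be listed
git rev-list --count HEAD   # a single commit, because of --depth 1
```

The file:// URL matters here: when cloning from a plain local path, Git ignores --depth and copies the full history.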
See also: How to handle big repositories with git
Solution 4
This method creates an unversioned archive without subrepositories:
hg clone -U ssh://machine//directory/path/to/repo/project projecttemp
cd projecttemp
hg archive -r tip ../project-no-subrepos
The unversioned source code without the subrepositories is in the project-no-subrepos directory.
Solution 5
Regarding Git, it might be of historical significance that Linus Torvalds answered this question from a conceptual perspective back in 2007, in a talk that was recorded and is available online.
The question is whether it is possible to check out only some files out of a Git repository.
Tech Talk: Linus Torvalds on git t=43:10
To summarize, he said that one of the design decisions of Git that sets it apart from other source management systems (he cites BitKeeper and SVN) is that Git manages content, not files. One implication is that, e.g., a diff of a subset of files between two revisions is computed by first taking the whole diff and then pruning it to only the files that were requested. Another is that you have to check out the whole history, in an all-or-nothing fashion. For this reason, he suggests splitting loosely related components among multiple repositories and mentions a then-ongoing effort to implement a user interface for managing a repository that is structured as a super-project holding smaller repositories.
As far as I know, this fundamental design decision still applies today. The super-project thing probably became what are now submodules.
pablo
I'm co-founder at plasticscm.com. We develop a Distributed Version Control System, compete with companies ten times our size on a daily basis, and beat them very often :) We try to have the best DVCS for Windows (yes, better than Git) although it also runs on Linux and MacOS X (and even on raspberry pi! :P) I love programming, C#, .NET remoting, Mono but I still miss C++ on a daily basis. I do version control for a living and enjoy talking about branching, merging and all things SCM. You can reach me at pablo at codice.es
Updated on July 26, 2022
Comments
-
pablo almost 2 years
Is it possible to clone only one branch (or from a given commit) in Git and Mercurial? I mean, I want to clone a central repo but since it's huge I'd like to only get part of it and still be able to contribute back my changes. Is it possible? Like, I only want from Tag 130 onwards or something like that?
If so, how?
-
jk. about 14 years
Also, tags aren't the same as branches, unlike in some VCS, so this comes under the first point.
-
pablo about 14 years
So, if I clone branch "XX" it will get all the parent commits from "master", right? Or only the single commit I've done on that branch?
-
Anonigan about 14 years
If you clone (fetch) only branch "XX", you would get all its commits, including those commits that branch "XX" has in common with the "master" branch. In Git, commits do not 'belong' to a branch.
-
pablo about 14 years
Ok, then it's not a partial clone anyway, since you get all the parents and hence the entire repo (ok, the biggest part, which is on master).
-
panzi over 13 years
There are the trimming history (mercurial.selenic.com/wiki/TrimmingHistory) and shallow clone (mercurial.selenic.com/wiki/ShallowClone) plugins for Mercurial. I don't know how good they are, though.
-
Ry4an Brase over 13 years
Both of those are rejected proposals without implementations.
-
Chris about 12 years
What do you mean by Git vs modern Git? A partial clone is possible in the second but not the first?
-
Anonigan about 12 years
@Chris: by saying that "sparse checkout" requires modern Git, I meant that this feature was only recently added (at the time of posting), so it was available only in the newest version.
-
Chris about 12 years
Ah, looks like version 1.7.0 added partial checkouts. Thanks.
-
Anonigan over 11 years
In 1.8.0 (or a bit earlier) making a single-branch clone is now much easier.
-
Priit over 10 years
The Convert extension rewrites the hashes, so this is not a partial clone of the existing repo but rather a new one. Meaning it will be a separate repository that cannot pull from or push to the original one.
-
pablo over 10 years
I know the post... I originally submitted it to slashdot :P
-
Mathiasdm about 10 years
* Shallow clones are now possible using 'remotefilelog': bitbucket.org/facebook/remotefilelog
* Partial clones by filepath are possible (but still experimental), see comments.gmane.org/gmane.comp.version-control.mercurial.devel/…
-
Ry4an Brase about 10 years
Yeah, that's exciting. It relies on a centralized cache for operation, so it's not for all environments, but it's a very nice bit of work to come out of Facebook.
-
Anon over 7 years
Early 2017: partial clones by filepath (aka narrow clone) still aren't in mainline Mercurial but are possible with an extension from Google - bitbucket.org/Google/narrowhg . Similarly, sparse checkout (aka narrow checkout) isn't in mainline Mercurial but is possible using the sparse.py Mercurial extension from Facebook - bitbucket.org/facebook/hg-experimental .
-
VonC over 6 years
You might add to that list "partial clone" (or "narrow clone") with Git 2.17 (Q2 2018): stackoverflow.com/a/48852630/6309
-
Mathiasdm over 5 years
2018: both narrow and sparse are now experimental extensions in Mercurial itself (no longer need to be downloaded).