How to copy directories while preserving hardlinks?
Solution 1
First answer: The GNU Way
GNU cp -a copies recursively, preserving as much structure and metadata as possible. Hard links between files in the source directory are included in that. To select hard link preservation specifically without all the other features of -a, use --preserve=links.
mkdir src
cd src
mkdir -p a/{b,c,d}/{x,y,z}
touch a/{b,c,d}/{x,y,z}/f{1,2,3,4,5}
cp -r -l a hardlinks_of_a
cd ..
cp -a src dst
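To confirm that the copy kept the links, one can compare inode numbers in the destination. The following is a minimal sketch, assuming GNU cp (for -a) and a scratch directory from mktemp; the file names one and two and the helper inode_of are made up for illustration.

```shell
# Scratch area; the names "one" and "two" are illustrative only.
work=$(mktemp -d)
mkdir "$work/src"
echo data > "$work/src/one"
ln "$work/src/one" "$work/src/two"      # two names, one inode

cp -a "$work/src" "$work/dst"           # GNU cp assumed

# Print only the inode number of a file (hypothetical helper).
inode_of() { ls -di "$1" | awk '{ print $1 }'; }

if [ "$(inode_of "$work/dst/one")" = "$(inode_of "$work/dst/two")" ]; then
    cp_links=preserved
else
    cp_links=broken
fi
echo "hard links in dst: $cp_links"
rm -rf "$work"
```

Two names that report the same inode number on the same filesystem are hard links to the same file, so equal numbers in dst mean the link survived the copy.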
Solution 2
rsync has a -H or --hard-links option for this, and has the usual rsync benefits: it can be stopped and restarted, and it can be re-run to efficiently deal with any files that changed during or after the previous run.
-H, --hard-links
This tells rsync to look for hard-linked files in
the source and link together the corresponding
files on the destination. Without this option,
hard-linked files in the source are treated as
though they were separate files. [...]
Read the rsync man page and search for -H. There is a lot more detail there about particular caveats.
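As a rough illustration of the option in use, here is a sketch that copies a tiny tree with rsync -aH and checks the result. It assumes rsync is installed (the check is skipped otherwise); the scratch paths and the variable rsync_links are made up for the example.

```shell
# Scratch tree; the names "one" and "two" are illustrative only.
rs=$(mktemp -d)
mkdir "$rs/src" "$rs/dst"
echo data > "$rs/src/one"
ln "$rs/src/one" "$rs/src/two"

rsync_links=skipped
if command -v rsync >/dev/null 2>&1; then
    # -a: archive mode, -H: preserve hard links.
    # The trailing slash on src/ copies its contents, not the directory itself.
    rsync -aH "$rs/src/" "$rs/dst/"
    a=$(ls -di "$rs/dst/one" | awk '{ print $1 }')
    b=$(ls -di "$rs/dst/two" | awk '{ print $1 }')
    if [ "$a" = "$b" ]; then rsync_links=preserved; else rsync_links=broken; fi
fi
echo "rsync -aH: hard links $rsync_links"
rm -rf "$rs"
```

Note that -a on its own does not imply -H, which is why both are given.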
Solution 3
Third answer: The POSIX Way
POSIX hasn't standardized the tar utility, although it has standardized the tar archive format. The POSIX utility for manipulating tar archives is called pax, and it has the bonus feature of being able to do the pack and unpack operations in a single process.
mkdir dst
pax -rw src dst
Solution 4
Second answer: The Ancient UNIX Way
Create a tar archive in the source directory, send it over a pipe, and unpack it in the destination directory.
# create src as before
(cd src && tar cf - .) | (mkdir -p dst && cd dst && tar xf -)
Solution 5
Source: http://www.cyberciti.biz/faq/linux-unix-apple-osx-bsd-rsync-copy-hard-links/
What you need to make an exact copy is
rsync -az -H --delete --numeric-ids /path/to/source/ /path/to/dest/
Grzegorz Wierzowiecki
Updated on September 18, 2022
Comments
-
Grzegorz Wierzowiecki almost 2 years
How to move directories that have files in common from one partition to another?
Let's assume we have a partition mounted on /mnt/X with directories sharing files via hardlinks. How can such directories be moved to another partition, say /mnt/Y, while preserving those hardlinks?
For a better illustration of what I mean by "directories sharing files in common with hardlinks", here is an example:
# let's create three directories with files
mkdir -p a/{b,c,d}/{x,y,z}
touch a/{b,c,d}/{x,y,z}/f{1,2,3,4,5}
# and copy them with hardlinks
cp -r -l a hardlinks_of_a
To be more specific, let's assume that the total size of the files is 10G and each file has 10 hardlinks. The question is how to move them to the destination using only 10G (someone might suggest copying with 100G and then running deduplication; that is not what I am asking about).
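The "10 hardlinks, stored once" situation in the question can be reproduced on a small scale by counting names against distinct inodes. This is only a sketch: the scratch directory, the single payload file, and the variable names are assumptions made for the example, not part of the question.

```shell
# One small file with ten names; all paths here are illustrative.
demo=$(mktemp -d)
mkdir "$demo/a"
echo payload > "$demo/a/f1"

# Nine extra names for the same inode, ten links in total.
i=2
while [ "$i" -le 10 ]; do
    ln "$demo/a/f1" "$demo/a/f$i"
    i=$((i + 1))
done

# Number of file names versus number of distinct inodes behind them.
names=$(find "$demo/a" -type f | wc -l | tr -d ' ')
inodes=$(find "$demo/a" -type f -exec ls -i {} + | awk '{ print $1 }' | sort -u | wc -l | tr -d ' ')
echo "names: $names, distinct inodes: $inodes"
rm -rf "$demo"
```

A copy that preserves hard links should show the same two numbers in the destination; a copy that breaks them would show as many inodes as names, which is the tenfold growth the question wants to avoid.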
-
WhyNotHugo almost 12 years
+1 on tar, -1 for using GNU-specific arguments for cp.
-
Grzegorz Wierzowiecki almost 12 years
I've checked - it works.
-
Grzegorz Wierzowiecki almost 12 years
I've checked - cp -a works! (please @AlanCurry separate answers into three)
-
Grzegorz Wierzowiecki almost 12 years
checked -> works. Hardlinks preserved.
-
Alessio almost 12 years
@Hugo: there's nothing wrong with using GNU-specific args to standard tools. GNU versions are the de-facto standard these days, and even when they weren't pre-installed, it was common practice to install GNU tools (I know I always did - they were simply better than, e.g., the Solaris and *BSD versions, and they provided consistency between different *nixes). It's probably good practice to point out GNUisms when you use them, but it's not required. Also, Grzegorz didn't say "not on Linux", so it's reasonable to assume that that's the environment he's talking about.
-
Alessio almost 12 years
Yep, I know. I've been using it for years in my backup scripts, and also to move files between filesystems as in your question.
-
WhyNotHugo almost 12 years
It's not reasonable to assume he uses the same OS as you, and it's not common practice to install GNU base tools on non-GNU systems. As a minimum, you should always clarify this. Using GNUisms DECREASES portability; POSIX is way more standard.
-
Grzegorz Wierzowiecki almost 12 years
So, I am happy to see non-GNU answers in this topic as well :). (Please remember that this answer was edited: it previously had GNU and non-GNU answers together, and now it's split into three, so you can up-vote whichever you want.)
-
peterph almost 9 years
Any insight into why this actually does preserve hardlinks?
-
Alessio almost 9 years
Because tar preserves hard links. In GNU tar, at least, you can disable this behaviour with --hard-dereference
-
msc over 6 years
rsync uses gobs of memory when building its file list. For me, after many hours of "Building file list..." it filled up my 16GB of memory and bailed having copied nothing. YMMV.
-
msc over 6 years
See my comment about rsync above.
-
msc over 6 years
In my case, attempting to copy a large directory hierarchy (a TimeMachine backup), tar preserved some hard links but replicated the file in some cases. I think this is because the tar x does not have the full file list, as files are still being piped in from the tar c. Probably if you saved the entire archive before extracting it, it would be okay. I'd be very happy if someone could confirm that theory.
-
Alessio over 6 years
From man rsync: Beginning with rsync 3.0.0, the recursive algorithm used is now an incremental scan that uses much less memory than before and begins the transfer after the scanning of the first few directories have been completed. This incremental scan only affects our recursion algorithm, and does not change a non-recursive transfer. It is also only possible when both ends of the transfer are at least version 3.0.0. Note that both --delete-before and --delete-after disable this improved algorithm.
-
Alessio over 6 years
Also, while rsync is an incredibly useful tool, it isn't always the best tool for every job. These days, I prefer to use ZFS datasets so I can snapshot and zfs send them - I mostly use rsync on non-ZFS filesystems. btrfs has a similar snapshot + send capability.
-
msc over 6 years
Thank you @cas. The rsync in macOS High Sierra is 2.6.9. I'll see if I can get 3.0+ via MacPorts or some other way.
-
hraban over 6 years
GNU is far from standard on the desktop, what with Mac OS X shipping BSD tools. This won't work on Mac.
-
Michael about 6 years
@cas I don't see why rsync doesn't think -H requires knowing the entire file list. The fact that it doesn't means -H simply doesn't work as expected in most cases!
-
Edward Falk over 5 years
I suspect this won't copy ACLs, extended attributes, and so forth. The Linux version also has the -A and -X options to preserve these, but I think you're out of luck on MacOS.
-
Johannes Overmann over 5 years
@WhyNotHugo: How is POSIX "way more standard"? POSIX is the stuff which brought us where we are. Did you know that all Windows versions since Windows NT are fully POSIX compliant? They have a path length limitation of 255 characters when using the POSIX file I/O functions, which renders them useless. Did you know that Solaris, Irix, and HP-UX are all POSIX compliant, and yet all the arguments to their tools differ (e.g. tar)? cp -a is a minimum requirement for any cp version which wants to replace GNU copy.
-
Johannes Overmann over 5 years
@hraban: Who is using the BSD tools on MacOS? :-) (SCNR)
-
jmr about 4 years
Can you explain your rationale for using the -z option (compress) when using rsync to copy between two mounted folders (since this is what was asked)?
-
Matt almost 4 years
@Michael You don't need the file list ahead of time for -H to work -- proceeding incrementally is fine. You only need the list of files transferred so far to know when to use a hardlink at the receiving end.
-
Michael almost 4 years
@Matt duh, I was such a pearhead back in 2018!
-
Brian B over 3 years
MacOS rsync will probably never go above 2.6.9 (without an Apple rewrite). Starting in version 3.0 it went to GPL v3.
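Matt's point above, that hard links can be recreated incrementally from the list of files transferred so far, can be sketched in plain shell. This is not how rsync or tar are actually implemented; copy_tree and its map file are hypothetical names, and the sketch assumes a single source filesystem, only regular files and directories, and file names without newlines or tabs.

```shell
# copy_tree SRC DST (hypothetical helper): copy regular files and directories,
# recreating hard links from a record of the inodes copied so far.
copy_tree() {
    src=$1
    dst=$2
    map=$(mktemp)                 # lines of "inode<TAB>first destination path"
    mkdir -p "$dst"
    # find lists a directory before its contents, so parents exist in time.
    ( cd "$src" && find . ! -name . ) | while IFS= read -r rel; do
        rel=${rel#./}
        if [ -d "$src/$rel" ]; then
            mkdir -p "$dst/$rel"
            continue
        fi
        ino=$(ls -di "$src/$rel" | awk '{ print $1 }')
        # Look the inode up among files already transferred (string compare).
        first=$(awk -F '\t' -v ino="$ino" '$1 "" == ino "" { print $2; exit }' "$map")
        if [ -n "$first" ]; then
            ln "$first" "$dst/$rel"          # seen before: link, don't copy
        else
            cp -p "$src/$rel" "$dst/$rel"    # first sighting: copy the data
            printf '%s\t%s\n' "$ino" "$dst/$rel" >> "$map"
        fi
    done
    rm -f "$map"
}

# Usage on a scratch tree; the names below are illustrative only.
t=$(mktemp -d)
mkdir -p "$t/src/sub"
echo data > "$t/src/one"
ln "$t/src/one" "$t/src/sub/two"
copy_tree "$t/src" "$t/dst"

x=$(ls -di "$t/dst/one" | awk '{ print $1 }')
y=$(ls -di "$t/dst/sub/two" | awk '{ print $1 }')
if [ "$x" = "$y" ]; then tree_links=preserved; else tree_links=broken; fi
tree_data=$(cat "$t/dst/sub/two")
echo "hard links in dst: $tree_links"
rm -rf "$t"
```

The receiver never needs the full file list: whichever name of an inode arrives first gets the data, and every later name for that inode becomes a link to it.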