How to copy directories while preserving hard links?

48,025

Solution 1

First answer: The GNU Way

GNU cp -a copies recursively preserving as much structure and metadata as possible. Hard links between files in the source directory are included in that. To select hard link preservation specifically without all the other features of -a, use --preserve=links.

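# build a sample tree that contains hard links
mkdir src
cd src
mkdir -p a/{b,c,d}/{x,y,z}
touch a/{b,c,d}/{x,y,z}/f{1,2,3,4,5}
cp -r -l a hardlinks_of_a    # GNU cp: link the files instead of copying them
cd ..
# copy the whole tree; the links between a and hardlinks_of_a are kept
cp -a src dst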
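
One way to confirm that the copy kept the links is to check whether two names in the destination still refer to the same inode. The following is a minimal sketch on a throwaway directory, assuming GNU cp and a shell whose `[` supports `-ef`; the file names are made up for illustration.

```shell
# Sketch: check that GNU cp -a keeps two names pointing at one inode.
# Assumptions: GNU coreutils cp; a `[` builtin that supports -ef (bash, dash, ksh).
work=$(mktemp -d)              # throwaway sandbox; all names below are illustrative
cd "$work" || exit 1
mkdir -p src/a
echo data > src/a/f1
ln src/a/f1 src/a/f1_link      # second name for the same inode
cp -a src dst

# -ef is true when both paths refer to the same device and inode
if [ dst/a/f1 -ef dst/a/f1_link ]; then
    cp_result="hard link preserved"
else
    cp_result="hard link lost"
fi
echo "$cp_result"
```

Replacing `cp -a` with `cp -r --preserve=links` in the sketch should exercise the narrower option mentioned above.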

Solution 2

rsync has a -H or --hard-links option for this, and has the usual rsync benefits of being able to be stopped and restarted, and to be re-run to efficiently deal with any files that were changed during/after the previous run.

-H, --hard-links
    This tells rsync to look for hard-linked files in
    the source and link together the corresponding
    files on the destination.  Without  this option,
    hard-linked files in the source are treated as
    though they were separate files. [...]

Read the rsync man page and search for -H. There is a lot more detail there about particular caveats.
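
The difference the option makes can be seen on a tiny tree. This is only a sketch, assuming rsync is installed; the directory names `dst_plain` and `dst_links` are hypothetical.

```shell
# Sketch: compare rsync with and without -H on a tiny tree.
# Assumptions: rsync is installed; all paths are illustrative.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src
echo data > src/f1
ln src/f1 src/f1_link            # two names, one inode

if command -v rsync >/dev/null 2>&1; then
    rsync -a    src/ dst_plain/  # no -H: each name is treated as a separate file
    rsync -a -H src/ dst_links/  # -H: corresponding files are linked together
    [ dst_plain/f1 -ef dst_plain/f1_link ] && rsync_plain=linked || rsync_plain=separate
    [ dst_links/f1 -ef dst_links/f1_link ] && rsync_hard=linked  || rsync_hard=separate
    echo "without -H: $rsync_plain, with -H: $rsync_hard"
else
    echo "rsync not found; skipping"
fi
```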

Solution 3

Third answer: The POSIX Way

POSIX hasn't standardized the tar utility, although they have standardized the tar archive format. The POSIX utility for manipulating tar archives is called pax and it has the bonus feature of being able to do the pack and unpack operation in a single process.

mkdir dst
pax -rw src dst
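
A small check of the copy mode, as a sketch only: it assumes pax is installed (many Linux distributions do not ship it by default) and uses made-up file names. Note that the copied tree ends up under `dst/src`, because pax recreates the path names it was given inside the target directory.

```shell
# Sketch: pax copy mode (-rw) on a tiny tree, followed by a link check.
# Assumptions: pax is installed; all names are illustrative.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src dst
echo data > src/f1
ln src/f1 src/f1_link            # two names, one inode

if command -v pax >/dev/null 2>&1; then
    pax -rw src dst              # the tree lands in dst/src, not directly in dst
    [ dst/src/f1 -ef dst/src/f1_link ] && pax_result=linked || pax_result=separate
    echo "pax copy: $pax_result"
else
    echo "pax not found; skipping"
fi
```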

Solution 4

Second answer: The Ancient UNIX Way

Create a tar archive in the source directory, send it over a pipe, and unpack it in the destination directory.

# create src as before
# && keeps tar from running in the wrong directory if cd or mkdir fails
(cd src && tar cf - .) | (mkdir dst && cd dst && tar xf -)
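
To see whether the links arrived intact, one option is to count the files in the destination that still have more than one link. A minimal sketch on a throwaway tree, assuming a tar that accepts `-` for stdin/stdout; the names are illustrative.

```shell
# Sketch: run the tar pipe on a tiny tree, then count multiply-linked files.
# Assumptions: tar reads/writes stdin/stdout via "-"; all names are illustrative.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src dst
echo data > src/f1
ln src/f1 src/f1_link            # two names, one inode

# && stops each side if cd fails, so tar never runs in the wrong directory
(cd src && tar cf - .) | (cd dst && tar xf -)

# -links +1 matches files whose link count is greater than one
tar_linked=$(find dst -type f -links +1 | wc -l)
echo "files in dst with more than one link: $tar_linked"
```

Running the same `find` on the source and comparing the counts gives a rough sanity check for larger trees.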

Solution 5

Source: http://www.cyberciti.biz/faq/linux-unix-apple-osx-bsd-rsync-copy-hard-links/

What you need to make an exact copy is

rsync -az -H --delete --numeric-ids /path/to/source/ /path/to/dest/
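
Because `--delete` removes anything in the destination that is missing from the source, it may be worth previewing the run first. The sketch below assumes rsync is installed and uses illustrative paths; it also leaves out `-z`, on the assumption that compression mainly pays off over a network link rather than between two local mounts.

```shell
# Sketch: preview the exact-copy command before letting --delete change anything.
# Assumptions: rsync is installed; all paths are illustrative.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src dst
echo data > src/f1
ln src/f1 src/f1_link
echo stale > dst/old_file        # exists only in dst, so --delete would remove it

if command -v rsync >/dev/null 2>&1; then
    # --dry-run reports the planned changes without touching dst
    rsync -a -H --delete --numeric-ids --dry-run -v src/ dst/
    [ -e dst/old_file ] && dry_state=untouched || dry_state=modified
    echo "dst after dry run: $dry_state"
else
    echo "rsync not found; skipping"
fi
```

Dropping `--dry-run` performs the real copy.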

Author: Grzegorz Wierzowiecki

Updated on September 18, 2022

Comments

  • Grzegorz Wierzowiecki
    Grzegorz Wierzowiecki almost 2 years

    How can I move directories that share files from one partition to another?

    Assume a partition mounted on /mnt/X holds directories that share files through hard links. How can such directories be moved to another partition, say /mnt/Y, while preserving those hard links?

    To illustrate what I mean by "directories sharing files through hard links", here is an example:

    # create a tree of directories and files
    mkdir -p a/{b,c,d}/{x,y,z}
    touch a/{b,c,d}/{x,y,z}/f{1,2,3,4,5}
    # and copy it with hardlinks
    cp -r -l a hardlinks_of_a
    

    To be more specific, assume the files total 10G and each file has 10 hard links. The question is how to move them to the destination using only 10G (copying them as 100G and then running deduplication is not what I am asking about).

  • WhyNotHugo
    WhyNotHugo almost 12 years
    +1 on tar, -1 for using gnu-specific arguments for cp.
  • Grzegorz Wierzowiecki
    Grzegorz Wierzowiecki almost 12 years
    I've checked - it works.
  • Grzegorz Wierzowiecki
    Grzegorz Wierzowiecki almost 12 years
    I've checked - cp -a works ! (please @AlanCurry separate answers into three)
  • Grzegorz Wierzowiecki
    Grzegorz Wierzowiecki almost 12 years
    checked -> works. Hardlinks preserved.
  • Alessio
    Alessio almost 12 years
    @Hugo: there's nothing wrong with using GNU-specific args to standard tools. GNU versions are the de-facto standard these days, and even when they weren't pre-installed, it was common practice to install GNU tools (I know I always did - they were simply better than, e.g, solaris and *bsd versions, and they provided consistency between different *nixes). It's probably good practice to point out GNUisms when you use them but not required. Also Grzegorz didn't say "not on linux" so it's reasonable to assume that that's the environment he's talking about.
  • Alessio
    Alessio almost 12 years
    yep, i know. I've been using it for years in my backup scripts. also to move files between filesystems as in your question.
  • WhyNotHugo
    WhyNotHugo almost 12 years
    It's not reasonable to assume he uses the same OS as you, and it's not common practice to install GNU base tools on non-GNU systems. As a minimum, you should always clarify this. Using GNUisms DECREASES portability; POSIX is way more standard.
  • Grzegorz Wierzowiecki
    Grzegorz Wierzowiecki almost 12 years
    So, I am happy to see non-GNU answers in this topic as well :). (Please remember that this answer was edited: it previously held both GNU and non-GNU answers, and now it's split into three, so you can up-vote whichever you want)
  • peterph
    peterph almost 9 years
    Any insight into why this actually does preserve hardlinks?
  • Alessio
    Alessio almost 9 years
    Because tar preserves hard-links. In GNU tar, at least, you can disable this behaviour with --hard-dereference
  • msc
    msc over 6 years
    rsync uses gobs of memory when building its file list. For me after many hours of "Building file list..." it filled up my 16GB of memory and bailed having copied nothing. YMMV.
  • msc
    msc over 6 years
    See my comment about rsync above.
  • msc
    msc over 6 years
    In my case, attempting to copy a large directory hierarchy (a TimeMachine backup), tar preserved some hard links but replicated the file in some cases. I think this is because the tar x does not have the full file list as files are still being piped in from the tar c. Probably if you saved the entire archive before extracting it, it would be okay. I'd be very happy if someone could confirm that theory.
  • Alessio
    Alessio over 6 years
    From man rsync: Beginning with rsync 3.0.0, the recursive algorithm used is now an incremental scan that uses much less memory than before and begins the transfer after the scanning of the first few directories have been completed. This incremental scan only affects our recursion algorithm, and does not change a non-recursive transfer. It is also only possible when both ends of the transfer are at least version 3.0.0. Note that both --delete-before and --delete-after disable this improved algorithm.
  • Alessio
    Alessio over 6 years
    Also, while rsync is an incredibly useful tool, it isn't always the best tool for every job. These days, I prefer to use ZFS datasets so I can snapshot and zfs send them - I mostly use rsync on non-ZFS filesystems. btrfs has a similar snapshot + send capability.
  • msc
    msc over 6 years
    Thank you @cas. The rsync in macOS High Sierra is 2.6.9. I'll see if I can get 3.0+ via MacPorts or some other way.
  • hraban
    hraban over 6 years
    GNU is far from standard on the desktop, what with Mac OS X shipping BSD tools. This won't work on Mac.
  • Michael
    Michael about 6 years
    @cas I don't see why rsync doesn't think -H requires knowing the entire file list. The fact that it doesn't means -H simply doesn't work as expected in most cases!
  • Edward Falk
    Edward Falk over 5 years
    I suspect this won't copy ACLs, extended attributes, and so forth. The Linux version also has the -A and -X options to preserve these, but I think you're out of luck on MacOS.
  • Johannes Overmann
    Johannes Overmann over 5 years
    @WhyNotHugo: How is POSIX "way more standard"? POSIX is the stuff which brought us where we are. Did you know that all Windows versions since Windows NT are fully POSIX compliant? They have a path length limitation of 255 characters when using the POSIX file I/O functions, which renders them useless. Did you know that Solaris, Irix, HP-UX are all POSIX compliant, and yet all the arguments to their tools differ (e.g. tar)? cp -a is a minimum requirement for any cp version which wants to replace GNU copy.
  • Johannes Overmann
    Johannes Overmann over 5 years
    @hraban: Who is using the BSD tools on MacOS? :-) (SCNR)
  • jmr
    jmr about 4 years
    Can you explain your rationale of using the -z option (compress) when using rsync to copy between two mounted folders (since this is what was asked) ?
  • Matt
    Matt almost 4 years
    @Michael You don't need the file list ahead of time for -H to work -- proceeding incrementally is fine. You only need the list of files transferred so far to know when to use a hardlink at the receiving end.
  • Michael
    Michael almost 4 years
    @Matt duh, i was such a pearhead back in 2018!
  • Brian B
    Brian B over 3 years
    MacOS rsync will probably never go above 2.6.9 (without an Apple rewrite). Starting in version 3.0 it went to GPL v3.