How to copy a directory recursively using hardlinks for each file

Solution 1

On Linux (more precisely with the GNU and busybox implementations of cp as typically found on systems that have Linux as a kernel) and recent FreeBSD, this is how:

cp -al dirA dirB
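
To check that the result really consists of hard links rather than copies, something like the following sketch can be used. It rebuilds the layout from the question in a scratch directory (the use of mktemp and the file contents are assumptions made for illustration) and relies on the -ef test operator, which is true when two names refer to the same file.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Rebuild the layout from the question.
mkdir -p dirA/x dirA/y
echo one > dirA/file1
echo two > dirA/x/file2
echo three > dirA/y/file3

# dirB must not exist yet; if it does, cp creates dirB/dirA instead.
# Assumes a cp that supports -a and -l (GNU, busybox, recent FreeBSD).
cp -al dirA dirB

# -ef is true when both names refer to the same inode on the same device.
for f in file1 x/file2 y/file3; do
  if [ "dirA/$f" -ef "dirB/$f" ]; then
    echo "$f: hard link"
  else
    echo "$f: separate copy"
  fi
done
```

Directories cannot be hard-linked, so dirB, dirB/x and dirB/y are ordinary new directories; only the regular files share inodes with the originals.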

For a more portable solution, see the answer using pax and cpio by Stéphane Chazelas (Solution 2).

Solution 2

POSIXly, you'd use pax in read+write mode with the -l option:

pax -rwlpe -s /A/B/ dirA .

(-pe preserves all possible attributes of files (in this case only directories) that are copied, like GNU cp's -a does).

Now, though standard, that command is not necessarily very portable.

First, many GNU/Linux-based systems don't include pax by default (even though that's a non-optional POSIX utility).

Then, bugs and non-conformances in a few implementations cause problems with that command:

  • because of a bug, Solaris 10 pax (at least) doesn't work when using -rwl in combination with -s. For some reason, it seems it applies the substitution to both the original and copied path. So above, it would attempt to do some link("dirB/file", "dirB/file") instead of link("dirA/file", "dirB/file").
  • on FreeBSD, pax doesn't create hardlinks for files of type symlink (a behaviour allowed by POSIX). Not only that, but it also applies the substitution to the targets of the symlinks (a behaviour not allowed by POSIX). So for instance if there's a foo -> AA symlink in dirA, it will become foo -> BA in dirB.

Also, if you want to do the same but with arbitrary file paths whose content is stored in $src and $dst, it's important to realise that pax -rwl -- "$src" "$dst" creates the full directory structure of $src inside $dst (that has to exist and be a directory). For instance, if $src is foo/bar, then, $dst/foo/bar is created.

If instead, you want $dst to be a copy of $src, the easiest is probably to do it as:

absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) &&
(cd -P -- "$src" && pax -rwlpe . "$absolute_dst")

(which would also work around most of the problems mentioned above but would fail if the absolute path of $dst ends in newline characters).

Now that won't help on GNU/Linux systems where there's no pax.

It's interesting to note that pax was created by POSIX to merge the features of the tar and cpio commands.

cpio is a historical Unix command (from 1977), as opposed to a POSIX invention, and GNU provides an implementation of it (but not of pax). So even though it is no longer a standard command (it was in SUSv2), it is still very common, and there's a core set of features you can usually rely on.

The equivalent of pax -rwl would be cpio -pl. However:

  1. cpio takes the list of input files on stdin as opposed to arguments (newline-delimited, which means file names containing newline characters are not supported).
  2. All files have to be specified (typically you feed it the output of find; find and cpio were developed jointly by the same people).
  3. Metadata are not preserved (some cpio implementations have options to preserve some, but nothing portable).

So with cpio:

absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) &&
(cd -P -- "$src" && find . | cpio -pl "$absolute_dst")
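
The three approaches discussed so far can be combined into one helper that uses whichever tool is installed. This is only a sketch: the function name link_tree and the demo paths are made up for illustration, and the last branch assumes that cp is the GNU or busybox implementation. It keeps -pe as in the pax command above; drop it if preserving ownership causes permission errors.

```shell
#!/bin/sh
# link_tree SRC DST (hypothetical helper): make DST a hard-linked copy of SRC.
link_tree() {
  src=$1 dst=$2
  # Create DST if needed and resolve it to an absolute path before we cd away.
  absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) || return
  if command -v pax >/dev/null 2>&1; then
    (cd -P -- "$src" && pax -rwlpe . "$absolute_dst")
  elif command -v cpio >/dev/null 2>&1; then
    # find lists each directory before its contents, so cpio can create it first.
    (cd -P -- "$src" && find . | cpio -pl "$absolute_dst")
  else
    # Assumption: cp supports -a and -l (GNU or busybox).
    # "SRC/." copies the contents of SRC into the existing DST.
    cp -al -- "$src/." "$absolute_dst/"
  fi
}

# Demo in a scratch directory, with spaces in the names on purpose.
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/src dir/x"
echo data > "$demo/src dir/x/file2"
link_tree "$demo/src dir" "$demo/dst dir"
```

As with the commands above, the command substitution strips trailing newlines, so a destination whose absolute path ends in a newline character is not handled.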

Solution 3

Short answer:

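cd -- "$source_folder" &&
pax -rwlpe . "$dest_folder"

($dest_folder has to exist already and, because of the cd, has to be an absolute path or a path relative to $source_folder.)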

Solution 4

rsync -av --link-dest="$PWD/dirA" dirA/ dirB

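If you already have rsync installed, this is a quick, simple command. Note the absolute path given to --link-dest: a relative path there is interpreted relative to the destination directory, not the current directory. To cope with symlinks, you may want to choose among --links, --copy-links, --copy-unsafe-links or --safe-links.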

From the rsync man page:

     --link-dest=DIR         hardlink to files in DIR when unchanged
 -l, --links                 copy symlinks as symlinks
 -L, --copy-links            transform symlink into referent file/dir
     --copy-unsafe-links     only "unsafe" symlinks are transformed
     --safe-links            ignore symlinks that point outside the tree
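
When the directory names are held in variables, the same idea might look like the sketch below. The helper name rsync_link_tree and the demo paths are hypothetical; the only rsync options used are -a and --link-dest, as in the command above.

```shell
#!/bin/sh
# rsync_link_tree SRC DST (hypothetical helper name).
rsync_link_tree() {
  # An absolute path is needed: a relative --link-dest is interpreted
  # relative to the destination directory, not the current directory.
  absolute_src=$(cd -P -- "$1" && pwd -P) || return
  # Trailing slash: copy the contents of SRC into DST, not SRC itself.
  rsync -a --link-dest="$absolute_src" -- "$absolute_src/" "$2"
}

# Demo in a scratch directory, skipped when rsync is not installed.
if command -v rsync >/dev/null 2>&1; then
  demo=$(mktemp -d) || exit 1
  mkdir -p "$demo/a/x"
  echo xxx > "$demo/a/x/c"
  rsync_link_tree "$demo/a" "$demo/b"
  ls -lR "$demo/a" "$demo/b"
fi
```

A link count of 2 in the ls output for each regular file indicates that the file was hard-linked rather than copied.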

Edit:

  • Fixed the command after the comment by @MichaelR. Thank you!
  • Tested as follows on macOS using rsync 2.6.9:
$ cd /tmp && rm -rf a b; mkdir a && touch a/c && echo "xxx" > a/c && rsync -av --link-dest="$PWD/a" a/ b
building file list ... done
created directory b
./

sent 74 bytes  received 26 bytes  200.00 bytes/sec
total size is 4  speedup is 0.04
$ ls -lR a b
a:
total 8
-rw-r--r--  2 user  wheel  4 Aug 26 16:09 c

b:
total 8
-rw-r--r--  2 user  wheel  4 Aug 26 16:09 c

Solution 5

If you are looking for that copy-with-hardlinks feature in order to make snapshots or backups of (all or part of) your files, have a look at rsnapshot.

Author by

Gudmundur Orn

Updated on September 18, 2022

Comments

  • Gudmundur Orn
    Gudmundur Orn almost 2 years

    I want to create a "copy" of a directory tree where each file is a hardlink to the original file

    Example: I have a directory structure:

    dirA/
    dirA/file1
    dirA/x/
    dirA/x/file2
    dirA/y/
    dirA/y/file3
    

    Here is the expected result, a "copy" of the directory tree where each file is a hardlink to the original file:

    dirB/            #  normal directory
    dirB/file1       #  hardlink to dirA/file1
    dirB/x/          #  normal directory
    dirB/x/file2     #  hardlink to dirA/x/file2
    dirB/y/          #  normal directory
    dirB/y/file3     #  hardlink to dirA/y/file3
    
  • Gudmundur Orn
    Gudmundur Orn about 9 years
    That's interesting. But I guess hard-links are only a good snapshot mechanism if the files will not be modified. Right?
  • Janis
    Janis about 9 years
    @Gudmundur Orn; This is correct. The tool mentioned in my answer will create a new snapshot in a way that files are unique; i.e. existing (unmodified) files will be created as hardlinks and new files (or modified versions of existing files) will be created as new files. So in consequence you will have the least redundancy.
  • Gudmundur Orn
    Gudmundur Orn about 9 years
    Seems that -s/A/B/ is specific to my example. How would you do this if the source directory name and target directory name were variables $sourcedir and $targetdir?
  • Stéphane Chazelas
    Stéphane Chazelas about 9 years
    @GudmundurOrn, see edit.
  • Stéphane Chazelas
    Stéphane Chazelas about 9 years
    Note that like pax, on FreeBSD, cp -a doesn't hardlink symlinks.
  • Dave
    Dave over 8 years
    Be aware that hard links do not work across separate filesystem mounts.
  • Michel
    Michel almost 8 years
    I ran this command on OS X and just received the error message "pax: Unable to link file ./a.txt to itself". I used your command literally, just replacing the source directory with the actual name, leaving /A/B and the final dot as is. Am I misunderstanding something?
  • Vincent Pazeller
    Vincent Pazeller almost 4 years
    Note that the pe toggles can cause privilege issues (Operation not permitted) because pax is calling chown. For my use case, having hardlinks attributed to the executing user was fine, so I ended up using simply pax -rwl
  • Kelly Bang
    Kelly Bang over 3 years
    If dirB exists, dirB will CONTAIN the new dirA. If dirB does not exist, dirB will BE the new dirA. But it probably depends on your OS as to the exact behavior.
  • endolith
    endolith over 3 years
    -a = --archive = "same as -dR --preserve=all" = "never follow symbolic links in SOURCE", --preserve=links, and "copy directories recursively", while -l means "hard link files instead of copying"
  • endolith
    endolith over 3 years
    You mean cp -a?
  • Michael R
    Michael R almost 3 years
    --link-dest doesn't appear to work for me using rsync v2.6.9 on macOS. I'm running rsync -av --link-dest=a a/ b/ and directory b/ contains file copies, not hardlinks.
  • Adan Cortes
    Adan Cortes almost 3 years
    @MichaelR, you are right. I'm sorry! I tested the following on MacOS and it works: rsync -av --link-dest="$PWD/a" a/ b
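
On the snapshot question raised in the comments above: whether a hard-linked "copy" changes along with the original depends on how the original is modified. The sketch below (scratch directory and file contents are assumptions for illustration) contrasts an in-place write, which is visible through both names, with replacing the file by renaming a new one over it, which leaves the other name pointing at the old data.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir dirA dirB
echo original > dirA/file1
ln dirA/file1 dirB/file1

# In-place write: the redirection truncates and rewrites the same inode,
# so the change is visible through both names.
echo changed > dirA/file1
cat dirB/file1

# Replace by rename: dirA/file1 now names a new inode,
# while dirB/file1 still refers to the previous one.
echo replaced > dirA/file1.tmp && mv dirA/file1.tmp dirA/file1
cat dirB/file1
cat dirA/file1
```

Both cat dirB/file1 calls print "changed", while the final cat dirA/file1 prints "replaced". Programs that save by writing a new file and renaming it therefore leave the snapshot intact, whereas programs that write in place alter it.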