How to copy a directory recursively using hardlinks for each file
Solution 1
On Linux (more precisely, with the GNU and busybox implementations of cp, as typically found on systems that have Linux as a kernel) and on recent FreeBSD, this is how:
cp -al dirA dirB
For a more portable solution, see the answer using pax and cpio by Stéphane Chazelas.
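To check that cp -al really produced hard links rather than copies, one can rebuild the example tree from the question in a scratch directory and compare paths with test's -ef operator, which is true when two names refer to the same file (same device and inode). This is a minimal sketch assuming GNU or busybox cp and a shell whose test supports -ef (bash, dash, ksh and busybox sh do). The mktemp -d scratch directory and the file contents are illustrative choices, not part of the answer.

```shell
#!/bin/sh
# Scratch area (mktemp -d is not POSIX, but Linux, the BSDs and macOS have it).
tmp=$(mktemp -d) || exit 1

# Recreate the example tree from the question.
mkdir -p "$tmp/dirA/x" "$tmp/dirA/y"
echo one   > "$tmp/dirA/file1"
echo two   > "$tmp/dirA/x/file2"
echo three > "$tmp/dirA/y/file3"

# -a recurses and preserves attributes; -l hard-links files instead of copying.
cp -al "$tmp/dirA" "$tmp/dirB"

# Each file in dirB should be the same inode as its dirA counterpart,
# while the directories themselves are new, distinct objects.
for f in file1 x/file2 y/file3; do
  if [ "$tmp/dirA/$f" -ef "$tmp/dirB/$f" ]; then
    echo "hard link: dirB/$f"
  else
    echo "NOT linked: dirB/$f"
  fi
done
[ "$tmp/dirA/x" -ef "$tmp/dirB/x" ] || echo "dirB/x is a separate directory"
```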
Solution 2
POSIXly, you'd use pax in read+write mode with the -l option:
pax -rwlpe -s /A/B/ dirA .
(-pe preserves all possible attributes of the files that are copied (in this case only directories, as the other files are hard-linked), like GNU cp's -a does.)
Now, though standard, that command is not necessarily very portable.
First, many GNU/Linux-based systems don't include pax by default (even though it's a non-optional POSIX utility).
Then, bugs and non-conformances in a few implementations cause issues with that code:
- Because of a bug, Solaris 10 pax (at least) doesn't work when using -rwl in combination with -s. For some reason, it seems to apply the substitution to both the original and the copied path. So above, it would attempt something like link("dirB/file", "dirB/file") instead of link("dirA/file", "dirB/file").
- On FreeBSD, pax doesn't create hardlinks for files of type symlink (a behaviour allowed by POSIX). Not only that, but it also applies the substitution to the targets of the symlinks (a behaviour not allowed by POSIX). So, for instance, if there's a foo -> AA symlink in dirA, it will become foo -> BA in dirB.
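Whether a given pax behaves like the FreeBSD one described above can be probed directly. The sketch below is an illustration rather than a conformance test: it creates a dirA/foo -> AA symlink, copies dirA with the same -rwlpe -s /A/B/ options, and reports whether dirB/foo is a hard link to the original symlink or a newly created symlink, and which target it got. It assumes readlink is available (common, though historically not POSIX) and that ls -di reports a symlink's own inode without following it; the helper name inode_of is made up.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/dirA"
# Dangling on purpose: only the symlink itself matters here.
ln -s AA "$tmp/dirA/foo"

# Hypothetical helper: inode of a path without following symlinks
# (with -d, ls does not dereference its operands).
inode_of() { ls -di -- "$1" | awk '{ print $1 }'; }

if command -v pax >/dev/null 2>&1; then
  (cd "$tmp" && pax -rwlpe -s /A/B/ dirA .)
  if [ ! -L "$tmp/dirB/foo" ]; then
    echo "no dirB/foo symlink was created"
  elif [ "$(inode_of "$tmp/dirA/foo")" = "$(inode_of "$tmp/dirB/foo")" ]; then
    echo "symlink was hard-linked (target kept)"
  else
    echo "symlink was recreated, target: $(readlink "$tmp/dirB/foo")"
  fi
else
  echo "pax is not installed here"
fi
```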
Also, if you want to do the same but with arbitrary file paths stored in the variables $src and $dst, it's important to realise that pax -rwl -- "$src" "$dst" creates the full directory structure of $src inside $dst (which has to exist and be a directory). For instance, if $src is foo/bar, then $dst/foo/bar is created.
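The difference is easy to see on a throwaway tree. In this sketch, with src=foo/bar (a relative path, as in the example above) and an existing dst directory, pax -rwl recreates foo/bar underneath dst instead of making dst itself the copy. The pax part only runs when pax is installed, and the directory names are illustrative.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/foo/bar" "$tmp/dst"   # $dst has to exist beforehand
echo data > "$tmp/foo/bar/file"

if command -v pax >/dev/null 2>&1; then
  (
    cd "$tmp" || exit
    src=foo/bar
    dst=dst
    pax -rwl -- "$src" "$dst"
  )
  # The whole foo/bar path is reproduced under dst: dst/foo/bar/file.
  find "$tmp/dst"
else
  echo "pax is not installed here"
fi
```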
If instead you want $dst to be a copy of $src, the easiest way is probably:
absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) &&
(cd -P -- "$src" && pax -rwlpe . "$absolute_dst")
(which would also work around most of the problems mentioned above, but would fail if the absolute path of $dst ends in newline characters).
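For repeated use, the recipe above can be wrapped in a function. This is a sketch: the name pax_hardlink_copy is made up, the src, dst and absolute_dst variables it sets are global (POSIX sh has no local), and the newline limitation just mentioned still applies.

```shell
#!/bin/sh
# Hypothetical helper: make directory $2 a hard-linked copy of directory $1.
pax_hardlink_copy() {
  src=$1 dst=$2
  # Create $dst privately (umask 077), then get its physical absolute path
  # so that it stays valid after the cd into $src below.
  absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) &&
    (cd -P -- "$src" && pax -rwlpe . "$absolute_dst")
}

# Example use on a scratch tree.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/dirA/x"
echo two > "$tmp/dirA/x/file2"
if command -v pax >/dev/null 2>&1; then
  pax_hardlink_copy "$tmp/dirA" "$tmp/dirB"
else
  echo "pax is not installed here"
fi
```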
Now, that won't help on GNU/Linux systems where there's no pax.
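Where neither pax nor a cp with -l is available, one fallback (not from the answer above, and a different technique using only find, mkdir and ln) is to recreate the directories and hard-link every other file one by one. It is a sketch with clear limits: directory permissions, ownership and timestamps are not copied (pax's -pe handles those), the function name find_hardlink_copy is hypothetical, and how ln treats a symlink given as its source may vary between implementations.

```shell
#!/bin/sh
# Hypothetical helper: hard-link every non-directory under $1 into a
# mirror directory tree rooted at $2, using only find, mkdir and ln.
find_hardlink_copy() {
  src=$1 dst=$2
  absolute_dst=$(mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) || return
  (
    cd -P -- "$src" || exit
    # Recreate every subdirectory ("." itself is already $dst).
    find . ! -name . -type d -exec sh -c '
      for d do mkdir -p -- "$0/$d" || exit; done' "$absolute_dst" {} + || exit
    # Hard-link everything that is not a directory.
    find . ! -type d -exec sh -c '
      for f do ln -- "$f" "$0/$f" || exit; done' "$absolute_dst" {} +
  )
}

tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/dirA/x" "$tmp/dirA/empty"
echo one > "$tmp/dirA/file1"
echo two > "$tmp/dirA/x/file2"
find_hardlink_copy "$tmp/dirA" "$tmp/dirB"
```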
It's interesting to note that pax was created by POSIX to merge the features of the tar and cpio commands.
cpio is a historical Unix command (from 1977) rather than a POSIX invention, and there is a GNU implementation of it (whereas GNU has no pax). So even though it is no longer a standard command (it was in SUSv2, though), it is still very common, and there's a core set of features you can usually rely on.
The equivalent of pax -rwl would be cpio -pl. However:
- cpio takes the list of input files on stdin rather than as arguments, newline-delimited, which means file names with newline characters are not supported.
- All files have to be specified (typically you feed it the output of find; find and cpio were developed jointly by the same people).
- Metadata are not preserved (some cpio implementations have options to preserve some, but nothing portable).
So with cpio:
absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) &&
(cd -P -- "$src" && find . | cpio -pl "$absolute_dst")
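Because cpio reads newline-delimited names, it can be worth refusing to run when such a name is present rather than getting a partial copy. Below is a sketch of the recipe above wrapped that way; cpio_hardlink_copy is a hypothetical name, and the check relies on the fact that a newline can only occur within a single path component, so testing each name with find -name is enough.

```shell
#!/bin/sh
nl='
'
# Hypothetical helper: hard-linked copy of directory $1 into $2 with cpio.
cpio_hardlink_copy() {
  src=$1 dst=$2
  # Refuse names containing a newline: cpio would split them in two.
  if [ -n "$(cd -P -- "$src" && find . -name "*$nl*" -print)" ]; then
    echo "cpio_hardlink_copy: a file name under $src contains a newline" >&2
    return 1
  fi
  absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) &&
    (cd -P -- "$src" && find . | cpio -pl "$absolute_dst")
}

tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/dirA/x" "$tmp/odd"
echo two > "$tmp/dirA/x/file2"
touch "$tmp/odd/line1${nl}line2"

# Rejected before anything is created under $tmp/oddcopy.
if cpio_hardlink_copy "$tmp/odd" "$tmp/oddcopy"; then odd_status=0; else odd_status=1; fi

if command -v cpio >/dev/null 2>&1; then
  cpio_hardlink_copy "$tmp/dirA" "$tmp/dirB"
fi
```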
Solution 3
Short answer:
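cd -- "$source_folder" &&
pax -rwlpe . "$dest_folder"
($dest_folder must already exist, and a relative $dest_folder is interpreted relative to $source_folder, since pax runs after the cd.)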
Solution 4
rsync -av --link-dest="$PWD/dirA" dirA/ dirB
If you happen to have rsync already installed, this one is a quick, simple command. To cope with symlinks, you may want to choose among --links (already implied by -a), --copy-links, --copy-unsafe-links or --safe-links.
From the rsync man page:
--link-dest=DIR hardlink to files in DIR when unchanged
-l, --links copy symlinks as symlinks
-L, --copy-links transform symlink into referent file/dir
--copy-unsafe-links only "unsafe" symlinks are transformed
--safe-links ignore symlinks that point outside the tree
Edit:
- Fixed the command after the comment by @MichaelR. Thank you!
- Tested as follows on macOS using rsync 2.6.9:
$ cd /tmp && rm -rf a b; mkdir a && touch a/c && echo "xxx" > a/c && rsync -av --link-dest="$PWD/a" a/ b;
building file list ... done
created directory b
./
sent 74 bytes received 26 bytes 200.00 bytes/sec
total size is 4 speedup is 0.04
$ ls -lR a b
a:
total 8
-rw-r--r-- 2 user wheel 4 Aug 26 16:09 c
b:
total 8
-rw-r--r-- 2 user wheel 4 Aug 26 16:09 c
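The same rsync approach can be wrapped for arbitrary source paths. --link-dest is given an absolute path here because rsync resolves a relative --link-dest directory against the destination directory, not the current one. This is a sketch: rsync_hardlink_copy is a hypothetical name, and, like the pax recipe, it would fail if the source's absolute path ended in newline characters.

```shell
#!/bin/sh
# Hypothetical helper: make $2 a hard-linked copy of directory $1 with rsync.
rsync_hardlink_copy() {
  src_abs=$(cd -P -- "$1" && pwd -P) || return
  # Trailing slash on the source: copy the contents of $1 into $2.
  rsync -a --link-dest="$src_abs" "$src_abs/" "$2"
}

tmp=$(mktemp -d) || exit 1
mkdir "$tmp/a"
echo xxx > "$tmp/a/c"
if command -v rsync >/dev/null 2>&1; then
  rsync_hardlink_copy "$tmp/a" "$tmp/b"
else
  echo "rsync is not installed here"
fi
```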
Solution 5
In case you are looking for that copy-with-hardlinks feature to make snapshots or backups of (all or part of) your files have a look at rsnapshot
.
Gudmundur Orn
Updated on September 18, 2022
Comments
-
Gudmundur Orn almost 2 years
I want to create a "copy" of a directory tree where each file is a hardlink to the original file
Example: I have a directory structure:
dirA/
dirA/file1
dirA/x/
dirA/x/file2
dirA/y/
dirA/y/file3
Here is the expected result, a "copy" of the directory tree where each file is a hardlink to the original file:
dirB/          # normal directory
dirB/file1     # hardlink to dirA/file1
dirB/x/        # normal directory
dirB/x/file2   # hardlink to dirA/x/file2
dirB/y/        # normal directory
dirB/y/file3   # hardlink to dirA/y/file3
-
Gudmundur Orn about 9 years
That's interesting. But I guess hard links are only a good snapshot mechanism if the files will not be modified. Right?
-
Janis about 9 years
@Gudmundur Orn: This is correct. The tool mentioned in my answer creates each new snapshot so that files are stored only once: existing (unmodified) files become hardlinks, and new files (or modified versions of existing files) are stored as new files. As a consequence, you get the least redundancy.
-
Gudmundur Orn about 9 years
It seems that -s /A/B/ is specific to my example. How would you do this if the source and target directory names were in the variables $sourcedir and $targetdir?
-
Stéphane Chazelas about 9 years
@GudmundurOrn, see edit.
-
Stéphane Chazelas about 9 years
Note that, like pax, cp -a on FreeBSD doesn't hardlink symlinks.
-
Dave over 8 years
Be aware that hard links do not work across separate filesystem mounts.
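Before relying on any of the commands above, one can probe whether a hard link between two directories is possible at all. This is a sketch with a hypothetical helper name: it creates a temporary file in the source directory (so that directory must be writable), tries ln into the destination and cleans up afterwards. It assumes mktemp accepts a template argument, as the GNU and BSD versions do.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a file in directory $1 can be
# hard-linked into directory $2 (same filesystem/mount, links permitted).
can_hardlink() {
  probe=$(mktemp "$1/.hlprobe.XXXXXX") || return 2
  name=${probe##*/}
  if ln -- "$probe" "$2/$name" 2>/dev/null; then
    rm -f -- "$2/$name" "$probe"
    return 0
  fi
  rm -f -- "$probe"
  return 1
}

tmp=$(mktemp -d) || exit 1
mkdir "$tmp/src" "$tmp/dst"
if can_hardlink "$tmp/src" "$tmp/dst"; then
  echo "hard links possible"
else
  echo "no hard links here: a real copy is needed"
fi
```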
-
Michel almost 8 years
I ran this command on OS X and just received an error message "pax: Unable to link file ./a.txt to itself". I used your command literally, just replacing the source directory with the actual name, leaving /A/B and the final dot as is. Am I misunderstanding something?
-
Vincent Pazeller almost 4 years
Note that the pe flags can cause privilege issues (Operation not permitted) because pax calls chown. For my use case, having the hardlinks attributed to the executing user was fine, so I ended up simply using pax -rwl.
-
Kelly Bang over 3 years
If dirB exists, dirB will CONTAIN the new dirA. If dirB does not exist, dirB will BE the new dirA. But the exact behavior probably depends on your OS.
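The behaviour described here is the usual cp rule for an existing directory target. With GNU cp it can be seen, and worked around, as in this sketch: copying dirA/. (rather than dirA) fills an existing directory with dirA's contents. The directory names are illustrative, and -T (--no-target-directory), mentioned in a comment, is a GNU extension.

```shell
#!/bin/sh
# Assumes GNU cp (cp -al as in the first answer).
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/dirA/x"
echo two > "$tmp/dirA/x/file2"

# Target does not exist: it becomes the copy of dirA.
cp -al "$tmp/dirA" "$tmp/new"

# Target exists: the copy is created inside it, as exists/dirA.
mkdir "$tmp/exists"
cp -al "$tmp/dirA" "$tmp/exists"

# Existing target, but merge dirA's contents into it instead.
# (GNU cp -alT dirA merge would also treat merge as the destination itself.)
mkdir "$tmp/merge"
cp -al "$tmp/dirA/." "$tmp/merge"
```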
-
endolith over 3 years
-a = --archive = "same as -dR --preserve=all" = "never follow symbolic links in SOURCE", --preserve=links, and "copy directories recursively", while -l means "hard link files instead of copying".
-
endolith over 3 years
You mean cp -a?
-
Michael R almost 3 years
--link-dest doesn't appear to work for me using rsync v2.6.9 on macOS. I'm running rsync -av --link-dest=a a/ b/ and directory b/ contains file copies, not hardlinks.
-
Adan Cortes almost 3 years
@MichaelR, you are right. I'm sorry! I tested the following on macOS and it works:
rsync -av --link-dest="$PWD/a" a/ b