Rsync -avzHP follows hardlinks instead of copying them as hardlinks

Solution 1

The rsync command's -H (or --hard-links) option will, in theory, do what you are trying to accomplish: create a copy of your filesystem that preserves the hard-link structure of the original. As I mentioned in my answer to another similar question, though, this option is doomed to fail once your source filesystem grows beyond a certain threshold of hard-link complexity.

The precise location of that threshold may depend on your RAM and the total number of hard links (and probably a number of other things), but I have found that there's no point in trying to define it precisely. What really matters is that the threshold is all too easy to cross in real-world situations, and you won't know that you have crossed it until the day you run an rsync -aH or a cp -a that struggles and eventually fails.

What I recommend is this: Copy your heavily hard linked filesystem as one unit, not as files. That is, copy the entire filesystem partition as one big blob. There are a number of tools available to do this, but the most ubiquitous is dd.

With stock firmware, your QNAP NAS should have dd built in, as well as fdisk. With fdisk, create a partition on the destination drive that is at least as large as the source partition. Then, use dd to create an exact copy of your source partition on the newly created destination partition.

While the dd copy is in progress, you must ensure that nothing changes in the source filesystem, lest you end up with a corrupted copy on the destination. One way to do that is to umount the source before starting the copying process; another way is to mount the source in read-only mode.
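The copy step described above could be sketched roughly as follows. This is only an illustration: clone_partition is a helper name made up here, and /dev/sdXN and /dev/sdYN are placeholders, not real device names on a QNAP. Check your actual partitions with fdisk -l before running anything like it.

```shell
#!/bin/sh
# Sketch only: clone_partition is a hypothetical helper, not part of any tool.
# It copies SRC to DST block for block, then flushes buffered writes to disk.
clone_partition() {
    src=$1
    dst=$2
    # bs=4M only reduces the number of read/write calls; the data is the same.
    dd if="$src" of="$dst" bs=4M || return 1
    sync
}

# Example (commented out because it overwrites the destination):
#   mount -o remount,ro /share/backup      # keep the source from changing
#   clone_partition /dev/sdXN /dev/sdYN    # placeholders: source, destination
```

Because dd overwrites whatever is on the destination, it is worth double-checking which argument is which before running it.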

Solution 2

This is a long shot, but if you cannot find another solution, I would suggest trying to format the USB drive as EXT4. This might be the problem: https://bugzilla.samba.org/show_bug.cgi?id=7670

Given enough hard links in a source folder and a small enough destination volume, copying with rsync --hard-links can fail. Rsync fails by exhausting the maximum number of hard links on the destination <...> the real issue isn't rsync but instead the underlying file system.
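To see whether the link limit described in that bug report could be relevant, one option is to compare the highest link count in the source tree with the limit the destination file system reports. The sketch below assumes a stat that accepts -c (GNU coreutils or BusyBox); max_link_count is a helper name invented for this example.

```shell
#!/bin/sh
# Sketch only: max_link_count is a hypothetical helper.
# It prints the largest hard-link count found among regular files under DIR.
# Assumes `stat -c '%h'` (GNU/BusyBox syntax; BSD stat uses different flags).
max_link_count() {
    find "$1" -type f -exec stat -c '%h' {} + | sort -n | tail -n 1
}

# Example, using the paths from the question:
#   max_link_count /share/backup
#   getconf LINK_MAX /share/eSATADisk1   # limit reported for the destination, if getconf is installed
```

If the first number is close to the second, the file system limit is a plausible suspect; if it is far below, the cause is probably elsewhere.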

Solution 3

-l is for symlinks, why would it do anything for hardlinks?

(Sorry this is an answer and not a comment, I don't have comment rights yet and this answer needed a response)

Another note that should be a comment: is this all native hardware, or are you on a VM or a network mount?

Edit

Ignore my earlier comment about why you are using hardlinks; I missed the rsnapshot comment.

It would be helpful to have a test that first runs rsync between two directories on a local disk, and then against your remote disk. This little test shows the -H option works as expected. The -i option for ls shows the inode numbers: both files in dest share a single inode, so the link has been preserved, with no extra copies.

$ rsync -avzHP src/ dest
sending incremental file list
created directory dest
./
file111_prime.txt
           9 100%    0.00kB/s    0:00:00 (xfer#1, to-check=0/3)
file111.txt => file111_prime.txt

sent 156 bytes  received 59 bytes  430.00 bytes/sec
total size is 18  speedup is 0.08

$ ls -liR
.:
total 8
414044 drwxrwxr-x. 2 nhed nhed 4096 Feb 25 09:58 dest
414031 drwxrwxr-x. 2 nhed nhed 4096 Feb 25 09:58 src

./dest:
total 8
414046 -rw-rw-r--. 2 nhed nhed 9 Feb 25 09:57 file111_prime.txt
414046 -rw-rw-r--. 2 nhed nhed 9 Feb 25 09:57 file111.txt

./src:
total 8
414032 -rw-rw-r--. 2 nhed nhed 9 Feb 25 09:57 file111_prime.txt
414032 -rw-rw-r--. 2 nhed nhed 9 Feb 25 09:57 file111.txt

A subsequent test, rsync -avzHP src/ host:/tmp, to a remote host still maintained the hardlinks.
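A scripted version of the same kind of check might look like the sketch below. The file names and both helper functions are made up for illustration. It builds a two-file hard-linked source in a scratch directory, copies it with rsync -aH, and uses the shell's -ef test (same device and inode) instead of reading ls -li output by eye.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names for this example.

# same_inode: succeeds when both paths refer to the same file
# (same device and inode), i.e. they are hard links to each other.
same_inode() {
    [ "$1" -ef "$2" ]
}

# hardlink_roundtrip: create two hard-linked files, copy them with
# rsync -aH, and report whether the copies are still linked together.
hardlink_roundtrip() {
    work=$(mktemp -d) || return 1
    mkdir "$work/src"
    echo "payload" > "$work/src/a.txt"
    ln "$work/src/a.txt" "$work/src/b.txt"
    rsync -aH "$work/src/" "$work/dest/" || { rm -rf "$work"; return 1; }
    if same_inode "$work/dest/a.txt" "$work/dest/b.txt"; then
        echo "hard link preserved"
        status=0
    else
        echo "hard link NOT preserved"
        status=1
    fi
    rm -rf "$work"
    return $status
}
```

Running hardlink_roundtrip on the NAS itself, and then a variant that writes to the external drive, would help separate an rsync problem from a file system problem.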

Author: Hossein Nazarnejad

Updated on September 18, 2022

Comments

  • Hossein Nazarnejad
    Hossein Nazarnejad almost 2 years

    I use rsnapshot to create hourly/daily/weekly/monthly backups of my "work"-share. Now I'm trying to copy the whole backup-directory onto an external drive using rsync.

    I used this command/parameters within a screen session (yes, the rsync-exclude.txt lies in the dir I run the command from)

    rsync -avzHP --exclude-from 'rsync-exclude.txt' /share/backup/ /share/eSATADisk1/backup/;
    

    The whole thing is running on a QNAP TS-439. The internal drive is a single disk (no RAID) formatted EXT4; the external drive is formatted EXT3.

    What happens is: Rsync follows every hardlink and copies the actual file instead of recreating the hardlink on the external drive. I didn't notice this right away, so the external drive ended up filled with xxx copies of the same files.

    What I want to achieve is: Copying the whole file structure generated by rsnapshot to the external drive while keeping the hardlinks to save space. Note: This does not necessarily have to be done using rsync.

    Thanks for your ideas and time. I'd appreciate your help, big time.

    Update: I learned that rsnapshot isn't using symlinks, it's using hardlinks, so I now use the -H option, which should preserve the hardlink structure according to "Rsnapshot to multiple destinations (or maintain hard links structure)", but it still won't work... what am I missing here?

    Update 2: I found another opinion on this topic here: "rsync with --hard-links freezes". Steven Monday suggests not trying to rsync big file structures containing hardlinks, since it soaks up a lot of memory and is a hard task for rsync. So a better solution would probably be making an .img of the data structure I'm trying to back up. What do you think?

    • mmalmeida
      mmalmeida over 6 years
      I am doing the exact same as you! +1. Will try the dd approach
  • Hossein Nazarnejad
    Hossein Nazarnejad over 12 years
    You're totally right, after some further research I discovered that rsnapshot isn't using symlinks but hardlinks. I updated my question accordingly. So the solution should be using -H and copying the whole directory (as I do) to preserve the hardlink structure built by rsnapshot, but it still doesn't work. When I begin to copy, everything in daily.0 gets copied, not just the changed files. // And yes, I'm using a QNAP TS-439 and an external LaCie drive for this operation.
  • nhed
    nhed over 12 years
    Can you reduce this problem down by having a test source dir and a test destination dir with just 2 files in the source, hardlinked together? Also, how are you determining that the link wasn't handled correctly? And lastly, why use hardlinks? If you read the long text for -H in the manpage, you can see that there are several caveats, which to me says: try to stay away from hardlinks...
  • Hossein Nazarnejad
    Hossein Nazarnejad over 12 years
    Thanks for your participation on my problem! Looks like this is samba related. My drive is directly attached to the NAS.
  • Hossein Nazarnejad
    Hossein Nazarnejad over 12 years
    I'll set up a test case and keep you updated. Thank you so much for your ideas so far.
  • Motsel
    Motsel over 12 years
    Hi there, no, this problem is not Samba related. That is just the home of the rsync website: rsync.samba.org
  • Sridhar Sarnobat
    Sridhar Sarnobat almost 7 years
    Suppose I never use hard links outside the rsnapshot backups directory, will I still get in trouble? I'm really short of hard disk space but want to make rsnapshot backups. Currently my disk gets full.
  • Guangliang
    Guangliang about 5 years
    I think I hit the situation you pointed out. I have a backup directory with many snapshots created with rsync. It has many files with many hard links. The total disk usage is about 200G. I'm copying it to another partition using 'rsync -avH'. But after 4 (or 5?) days and nights, the copying process is still running. I guess rsync is thoroughly confused by the total numbers of hard links in the source directory.
  • Brent Bradburn
    Brent Bradburn over 4 years
    In Ubuntu 18.04 it's --hard-links (with an 's').