`cp -al` snapshot whose hard links get directed to a new file when edited

Solution 1

That's how hard links work. But there are ways around it.

A few options come to mind:

  • Use a filesystem with support for copy-on-write files, like btrfs. Of course, were you using btrfs, you'd just use its native snapshots... If your filesystem supports it, you can use cp --reflink=always (see the sketch after this list). Unfortunately, ext4 doesn't support this.
  • Only share hard links across your snapshots, not with the original. That is, the first time you see a given version of a file, copy it into the snapshot; the next time that version appears, link it to the copy in the previous snapshot. (I'm not sure which program I used to do this, roughly a decade ago, but searching turns up dirvish, obnam, storebackup, and rsnapshot.)
  • Depending on how your files are being changed, you might be able to guarantee that they are changed via a write-to-temporary-then-rename. That breaks the hard link, so the version in the snapshot stays pristine. This is less safe, though, as bugs could corrupt your snapshot.
  • Take LVM snapshots of the entire filesystem (also sketched below).
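
For the reflink and LVM options above, here is a minimal sketch of the commands involved (the file paths, volume group, and mount point are hypothetical):

# Reflink copy: instant, and blocks are shared until one copy is modified.
# Requires a filesystem with reflink support, such as btrfs or XFS; ext4 lacks it.
cp --reflink=always /data/bigfile /data/snapshots/bigfile

# LVM snapshot of an entire logical volume (needs free space in the volume group):
lvcreate --size 1G --snapshot --name home-snap /dev/vg0/home
mount -o ro /dev/vg0/home-snap /mnt/home-snap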

Of course, there is the other option: use a proper backup system. Almost all of them can manage to back up only changed files.

Solution 2

What you're looking for is a form of copy-on-write, where multiple files that have the same content use the same space on the disk until one of them is modified. Hard links only implement copy-on-write if the application that does the writing deletes the file and creates a new file by the same name (which is usually done by creating a new file by a different name, then moving it into place). The application you're using is evidently not doing this: it's overwriting the existing file.

Some applications can be configured to use the replacement strategy. Some applications use the replacement strategy by default, but use the overwrite strategy when they see a file with multiple hard links, precisely so as not to break the hard links. Your current snapshot technique will work if you can configure your application to replace instead of overwriting.
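
To make the difference concrete, here is a small shell demonstration (the file names are arbitrary): overwriting in place changes the content seen through every hard link, while writing to a temporary file and renaming it leaves the other link untouched.

echo v1 > original
ln original snapshot        # "snapshot" and "original" now share one inode
echo v2 > original          # overwrite in place: "cat snapshot" also shows v2
echo v3 > original.tmp
mv original.tmp original    # replace: "original" now points at a new inode
cat snapshot                # still shows v2; the snapshot is no longer affected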

Fl-cow modifies programs to systematically use the replacement strategy on files with multiple hard links.

Alternatively, you can store your files on a filesystem that performs copy-on-write or deduplication, or that has a snapshot feature, and not worry about hard links: Btrfs or ZFS, for example. Depending on your partitioning scheme, using LVM snapshots may also be an option.
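
For illustration, native snapshots on those filesystems are one command each (the subvolume, pool, and snapshot names here are made up):

# Btrfs: read-only snapshot of a subvolume (assumes /data is a subvolume)
btrfs subvolume snapshot -r /data /data/.snapshots/monday

# ZFS: snapshot of a dataset
zfs snapshot tank/home@monday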

My recommendation is to use a proper snapshot tool. Making reliable backups is surprisingly difficult. You probably want rsnapshot.
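
If you go the rsnapshot route, a minimal configuration might look like the sketch below (the paths and retention counts are placeholders; note that rsnapshot requires tab characters, not spaces, between fields in its configuration file):

# /etc/rsnapshot.conf (fields must be separated by tabs)
snapshot_root   /data/backup/
retain          daily   7
retain          weekly  4
backup          /home/flakrat/  localhost/

# Typical cron entries to drive it:
# 30 3 * * 1   /usr/bin/rsnapshot weekly
# 0  4 * * *   /usr/bin/rsnapshot daily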

Solution 3

The following is a Ruby script I wrote that wraps "cp -al" and rsync into a single tool that can be run manually or via cron. The destination can be local or remote (via ssh):

Ghetto Timemachine

The basic answer to your question, as mentioned in a previous comment, is that the source needs to be kept apart from the hard links. For example, assume a daily backup of your home directory:

Source:

  • /home/flakrat

Destination:

  • /data/backup/daily
    • /monday
    • /tuesday
    • /wednesday
    • /thursday
    • ...

The hard links are created by running "cp -al" against yesterday's backup. Say it's Tuesday morning when you run it:

cd /data/backup/daily
rm -rf tuesday                 # throw away last week's tuesday snapshot
cp -al monday tuesday          # recreate it as hard links to monday's files
rsync -a --delete /home/flakrat /data/backup/daily/tuesday/

Because rsync (without --inplace) writes each changed file to a temporary name and then renames it into place, the updated copies in tuesday get new inodes, so Monday's hard-linked versions keep the old contents.
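
As a rough sketch, the same idea can be generalized so the directory names come from the current weekday (this assumes the same paths as above and GNU date; it is not the Ruby script mentioned earlier):

#!/bin/sh
# Hypothetical daily snapshot wrapper around the commands above.
set -e
src=/home/flakrat
dest=/data/backup/daily
today=$(date +%A | tr '[:upper:]' '[:lower:]')               # e.g. "tuesday"
yesterday=$(date -d yesterday +%A | tr '[:upper:]' '[:lower:]')
cd "$dest"
rm -rf "$today"
if [ -d "$yesterday" ]; then
    cp -al "$yesterday" "$today"
fi
rsync -a --delete "$src" "$dest/$today/"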

Comments

  • coder
    coder over 1 year

    I am trying to take snapshots of a massive folder regularly.

    I have read here: http://www.mikerubel.org/computers/rsync_snapshots/#Incremental that cp -al takes a snapshot of a folder by creating hard links instead of copying the data.

    That is all great, but the problem is that if I change a file, it changes in all snapshots. What I would like instead is for the system to create a new file when one is changed and link to that instead, so that each snapshot does not become invalid when the original file is edited.

    How can I achieve that?

    p.s. I tried rsync -a --delete --link-dest=../backup.1 source_directory/ backup.0/, but it has the same problem.

  • coder
    coder about 11 years
    What do you recommend as a way to back up a massive folder?
  • derobert
    derobert about 11 years
    @HermannIngjaldsson well, it depends on how you do your backups. Personally, I'd just add it to my Bacula setup—but I wouldn't recommend that unless you have a bunch of machines to back up, or already know Bacula. So, I guess I'd suggest you try rsnapshot first.
  • developerbmw
    developerbmw almost 9 years
    rsnapshot is good