Selecting a Linux filesystem with snapshots for file backups on a VM


It sounds like you've already got the basic options, but there is another option I think you should consider (more on that below). You've found the two reasonably common filesystems that support snapshots (btrfs and ZFS), plus device-mapper/LVM snapshots.

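  • btrfs snapshots work much like the ZFS ones you're already familiar with. The source must be a btrfs subvolume (created with btrfs subvolume create); you make a read-only snapshot with btrfs subvolume snapshot -r /mountpoint/data "/mountpoint/snapshots/$(date -Is)" or similar, and it then appears as an ordinary directory under /mountpoint/snapshots/, named with the timestamp. You can also snapshot the top level of the filesystem (/mountpoint), which works properly; note that btrfs snapshots are not recursive, so any nested subvolumes (including earlier snapshots) show up as empty directories inside the new snapshot. In my experience, btrfs is stable with this usage. It also supports trim, which (if everything else in the stack supports it; I've personally never used Hyper-V, so I can't say) will allow space that was used but has since been freed to be returned to your hypervisor's thin pool.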

  • LVM (device-mapper) snapshots are different: they snapshot the block device. Traditional LVM snapshots cost performance (because of copy-on-write), which may or may not be a problem for backup use. There are also thin-pool snapshots, which are newer and largely avoid that problem. Because they operate at the block-device level, each snapshot is a new block device, which you then have to mount to access the snapshotted files.
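For comparison, a rough sketch of the thin-snapshot workflow in bash. It assumes the backup disk has already been set up as a thin pool in a volume group called backupvg containing a thin LV called data with ext4 on it (hypothetical names, as are the function names below), and it mainly shows the extra activate-and-mount step that block-level snapshots need.

```shell
#!/usr/bin/env bash
# Sketch: dated thin snapshots of an LVM thin volume, mounted read-only.
# Hypothetical names: volume group "backupvg", thin LV "data" in a thin pool.
VG=${VG:-backupvg}
ORIGIN=${ORIGIN:-data}

# Create a thin snapshot named data-YYYY-MM-DD and print its name.
# With no size given, snapshotting a thin LV yields a thin snapshot that
# shares blocks with the origin and allocates from the pool only on change.
lvm_thin_snapshot() {
  local name
  name="$ORIGIN-$(date +%F)"
  lvcreate -s -n "$name" "$VG/$ORIGIN" >&2   # keep stdout for the name only
  printf '%s\n' "$name"
}

# Thin snapshots get the "activation skip" flag by default, so -K is needed
# to activate one. Mount read-only (ext4 assumed; XFS would also need
# -o nouuid because the snapshot carries the origin's filesystem UUID).
lvm_mount_snapshot() {
  local name=$1 mnt=$2
  lvchange -ay -K "$VG/$name" || return
  mkdir -p "$mnt"
  mount -o ro "/dev/$VG/$name" "$mnt"
}

# Unmount and delete a snapshot once it is no longer needed.
lvm_drop_snapshot() {
  local name=$1 mnt=$2
  umount "$mnt" || return
  lvremove -y "$VG/$name"
}
```

For example, name=$(lvm_thin_snapshot) followed by lvm_mount_snapshot "$name" "/mnt/snap/$name" would expose that day's files under /mnt/snap/. All of these commands need root.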

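With both methods you can keep snapshots as long as you like (disk space permitting), remove them in any order, and so on. I'd also suggest using rsync --inplace to reduce snapshot size: by default rsync writes each changed file to a temporary copy and renames it over the original, so the new file shares no blocks with the snapshot, whereas with --inplace only the changed parts of the file are rewritten. As for choosing between them, I think all of these will work fine, and you should probably pick whichever you or your team is most familiar with.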
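Putting the btrfs approach and rsync --inplace together, a nightly cycle might look like the sketch below. It assumes /mountpoint/data is a btrfs subvolume and /mountpoint/snapshots is a directory on the same filesystem (placeholders, as are the host names, the /srv/ source path, and all function names), and it relies on GNU date. It names snapshots YYYY-MM-DD instead of using date -Is, so that a plain string comparison doubles as a date comparison and pruning needs no timestamp parsing.

```shell
#!/usr/bin/env bash
# Sketch of a nightly cycle: rsync --inplace into a btrfs subvolume, take a
# read-only dated snapshot, then prune snapshots older than KEEP_DAYS.
# All paths, hosts and function names here are hypothetical placeholders.
DATA=${DATA:-/mountpoint/data}            # must be a btrfs subvolume
SNAPDIR=${SNAPDIR:-/mountpoint/snapshots} # same filesystem as $DATA
KEEP_DAYS=${KEEP_DAYS:-10}

# Mirror one host's /srv/ into $DATA/<host>/. --inplace rewrites only the
# changed parts of each file instead of writing a new copy and renaming it,
# so unchanged blocks stay shared with older snapshots. --delete mirrors
# deletions; the snapshots still hold the old copies.
backup_host() {
  local host=$1
  mkdir -p "$DATA/$host"
  rsync -a --delete --inplace "$host:/srv/" "$DATA/$host/"
}

# Take today's read-only snapshot; refuse if one with today's name exists.
take_snapshot() {
  local target
  target="$SNAPDIR/$(date +%F)"
  if [[ -e $target ]]; then
    echo "snapshot $target already exists" >&2
    return 1
  fi
  btrfs subvolume snapshot -r "$DATA" "$target"
}

# Print snapshot paths in DIR whose YYYY-MM-DD name sorts before CUTOFF.
snapshots_older_than() {
  local dir=$1 cutoff=$2 path name
  for path in "$dir"/*; do
    [[ -d $path ]] || continue
    name=${path##*/}
    [[ $name =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || continue  # skip other entries
    if [[ $name < $cutoff ]]; then
      printf '%s\n' "$path"
    fi
  done
}

# Delete snapshots older than KEEP_DAYS days.
prune_snapshots() {
  local cutoff path
  cutoff=$(date -d "$KEEP_DAYS days ago" +%F)
  snapshots_older_than "$SNAPDIR" "$cutoff" | while IFS= read -r path; do
    btrfs subvolume delete "$path"
  done
}

# Back up every host given as an argument, then snapshot and prune.
nightly() {
  local host failed=0
  for host in "$@"; do
    backup_host "$host" || failed=1   # keep going; report at the end
  done
  take_snapshot || return 1
  prune_snapshots
  return "$failed"
}
```

Run as root (for example from cron) with something like nightly web1 db1. With KEEP_DAYS=10 it keeps today's snapshot plus the ten before it, and yesterday's files stay browsable under /mountpoint/snapshots/ with no mounting needed.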

The other approach: you're currently writing your own backup system. A lot of backup systems already exist, including ones intended to do space-efficient backups to a hard disk like this. Examples include BackupPC, Bacula/Bareos (more focused on tape, but handles disk too), BorgBackup, restic, ZBackup, and many more. I'd recommend taking a look at the Arch Wiki's list of synchronization and backup programs.



Author: maloitpro

Updated on September 18, 2022

Question:

    I'm trying to get a backup system working as efficiently as I can. Most of the systems I need backed up are some flavor of Linux; we currently dump them to an Ubuntu 16.04.3 server and store them on the / disk. The Ubuntu VM runs within Hyper-V and has a .vhdx for the root disk. The Ubuntu OS runs rsync to connect to each production server.

    Anyway, instead of storing them on the root disk, I'd like to store the backup files on a new disk with a new filesystem that supports daily snapshots. I've created a 900GB volume in Hyper-V (currently thin-provisioned) and attached it to the VM, so the disk shows up in Ubuntu as /dev/sdd, unformatted, with 900GB of capacity.

    Looking for suggestions on how to support the following requirements:

    • Allow backups to be copied to the filesystem via rsync from a number of production servers, totaling about 60GB
    • Allow basic volume or filesystem snapshots to run daily, so that we can retain about 7-10 days' worth of backup files. The deltas of the production files from the previous day usually total about 30-35GB
    • Allow simple access (such as a simple mount point in Ubuntu) to any one of the backup snapshots, in case we need to retrieve a random file from X days ago
    • Automatically remove snapshots older than 10 days.

    What I don't need:

    • Physical or RAID volume management - the new disk (the 900GB .vhdx) is already stored on a Windows Storage Spaces volume that handles physical disk anomalies
    • Scripts (to mount/unmount or merge snapshots) that are not part of the filesystem's standard package.

    I've used ZFS in the past, in the form of NexentaStor, and it was pretty slick. Besides the RAID management, the snapshots were automatically available to me as "/primary_volume/.zfs/snapshot/snapshot_name", which made it easy to go and grab a file from X days ago.

    Am I looking at a BTRFS implementation, or perhaps an LVM implementation here? Or are there other packaged, ready-to-fly solutions that will fill this void for me?