Entries I can safely exclude when doing backups

Solution 1

First off, you should read up a little on rsync's include/exclude syntax. I get the feeling that what you want to do is better done using ** globs than * globs. (* matches any run of characters within a single path component and stops at a slash, whereas ** also matches across slashes and so covers any number of directory levels. The details are in man rsync under Include/Exclude Pattern Rules.)
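To make the difference between the two globs concrete, here is a minimal sketch in Python. It is a simplified model and not rsync's actual matcher: it only handles anchored patterns containing * and **, and the helper names (rsync_glob_to_regex, matches, is_excluded) are made up for illustration. The is_excluded helper also models the fact that rsync does not descend into a directory once that directory itself has been excluded.

```python
import re


def rsync_glob_to_regex(pattern):
    """Translate a simplified rsync-style pattern into a regex.

    Only '*' and '**' are modelled. '?', '[...]', trailing slashes and
    unanchored patterns are deliberately left out of this sketch.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # '**' may cross '/' boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # '*' stops at the next '/'
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(pattern, path):
    """True if the whole path matches the pattern directly."""
    return rsync_glob_to_regex(pattern).fullmatch(path) is not None


def is_excluded(pattern, path):
    """True if the path or any of its ancestors matches the pattern.

    This mimics recursion: once a directory is excluded, nothing
    below it is ever visited.
    """
    parts = path.strip("/").split("/")
    prefixes = ["/" + "/".join(parts[:n]) for n in range(1, len(parts) + 1)]
    return any(matches(pattern, prefix) for prefix in prefixes)
```

In this model, /dev/* does not match /dev/pts/0 directly while /dev/** does, yet is_excluded reports the file as skipped under either pattern because /dev/pts is already excluded. Neither pattern matches /dev itself, so the directory that serves as the mount point is still copied.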

That said, if you want to be able to restore the system to a known working state from the backup with a minimum of hassle, you should be careful about excluding files or directories. I use rsnapshot myself and have actually taken the opposite approach: include everything except for a few carefully selected directories.

So my rsnapshot.conf actually states (with tabs to make rsnapshot's configuration file parser happy):

interval backup NNN # pick your poison
one_fs 0
exclude /backup/**
exclude /dev/**
exclude /proc/**
exclude /run/**
exclude /sys/**
exclude /tmp/**
backup / ./

and very little else. Yes, it means I might copy a bit more than what is strictly needed, but it ensures that anything not intended as ephemeral is copied. Because rsnapshot uses rsync's hard-link deduplication, the only real cost is during the first run; after that, assuming your backup target is reasonably sized compared to your total data set, it takes very little extra time or disk space. I exclude the contents of /backup because that's where I mount the backup target file system; not excluding it would mean copying the backup into itself. However, for simplicity if I ever need to restore onto bare metal, I want to keep the mount point!
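Since the fields in that configuration have to be separated by tabs, a quick pre-check can catch lines that were typed with spaces. The following is a minimal sketch in Python; lines_missing_tabs is a hypothetical helper, and the heuristic (a non-comment line that contains spaces but no tab at all) is an assumption that will miss lines mixing tabs and spaces.

```python
def lines_missing_tabs(conf_text):
    """Return (line_number, line) pairs that look space-separated.

    Heuristic only: a non-comment line with spaces but no tab is
    reported; lines that mix tabs and spaces are not detected.
    """
    problems = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments carry no fields
        if "\t" not in stripped and " " in stripped:
            problems.append((number, line))
    return problems
```

The authoritative check is still rsnapshot's own parser, which can be run with rsnapshot configtest; the sketch above only points at the most likely offending lines.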

In my case I also cannot reasonably use one_fs 1; I run ZFS with currently ~40 file systems. Listing all of those explicitly would be a maintenance nightmare and make working with ZFS file systems a lot more involved than it needs to be.

Pretty much anything else you might want to exclude beyond this list is going to depend on the distribution, so it's virtually impossible to give a generic answer. That said, you're likely to find some candidates under /var.

Solution 2

Most of what you are trying to do can probably be accomplished simply by using the one_fs setting. Set the filesystems you want to include in your backups, then use that setting to ignore the rest (proc, sys, dev, etc.). I'd include /lost+found because that directory should always be empty unless you've backed up a corrupted filesystem, in which case you probably want a backup of anything that fsck recovered. Also, the patterns /*.pyc and /*.pyo only match files directly in the root directory, where such files should not really be in the first place, so I'd remove those lines too. /tmp and /var/tmp are about the only remaining paths on a "generic" system which contain data that can be reliably excluded from backups. So maybe try something like:

one_fs 1

exclude /tmp/
exclude /var/tmp/
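Before relying on one_fs 1, it helps to see which paths currently sit on mounts other than the root, since those are the ones that need their own backup lines if you want them. Here is a minimal sketch in Python that reads text in the /proc/mounts layout; other_filesystems is a hypothetical helper, and it does not decode the escaped spaces (\040) that /proc/mounts uses in mount point names.

```python
def other_filesystems(mounts_text, root="/"):
    """List (mount_point, fs_type) for every mount other than `root`.

    `mounts_text` is expected in the /proc/mounts layout:
    device mount_point fs_type options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if mount_point != root:
            found.append((mount_point, fs_type))
    return found
```

On a Linux system you could pass it the contents of /proc/mounts. Treat the result as a list of candidates rather than an exact prediction: as far as I know, one_fs maps to rsync's one-file-system option, which decides by device, so a bind mount from the same filesystem may still be traversed.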

Solution 3

I find it is better to have a package list, the contents of /etc, /home, and any user/system data from /var and elsewhere. It is usually faster to reinstall the packages and copy back the working config.
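As a rough illustration of the "copy the config and data" half of this approach, here is a minimal sketch that packs a list of directories into a single compressed archive using Python's tarfile module. Using tar here is my own substitution for illustration, not something the approach prescribes, and archive_paths is a hypothetical helper. The package list itself has to come from your distribution's package manager (for example dpkg --get-selections on Debian-based systems or rpm -qa on RPM-based ones) and is not covered by the sketch.

```python
import tarfile


def archive_paths(paths, archive_file):
    """Write the given files and directories into one .tar.gz archive.

    Directories are added recursively. Member names are stored without
    the leading '/' so the archive can be unpacked under any directory.
    """
    with tarfile.open(archive_file, "w:gz") as tar:
        for path in paths:
            path = str(path)
            tar.add(path, arcname=path.lstrip("/"))
    return archive_file
```

A call such as archive_paths(["/etc", "/home"], "/backup/config-and-home.tar.gz") would need to run as root to read everything, and the target path here is only an example.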

Author: Paolo

Updated on September 18, 2022

Comments

  • Paolo
    Paolo almost 2 years

    I'm planning a backup strategy based on rsnapshot.

    I want to do a full system backup, excluding files and directories that are not needed to restore a working system. So far I have excluded:

    # System:
    exclude /dev/*
    exclude /proc/*
    exclude /sys/*
    exclude /tmp/*
    exclude /run/*
    exclude /mnt/*
    exclude /media/*
    exclude /lost+found
    
    # Application:
    exclude /*.pyc
    exclude /*.pyo
    

    I wonder which other entries I can add to the exclude list without compromising the restored system. For a "generic" Linux system, can you suggest further glob patterns, temporary directories, caches, etc. that I can safely exclude?

  • Paolo
    Paolo over 10 years
    I didn't really mean /*.pyc and /*.pyo but system-wide *.pyc and *.pyo; I fixed that. I'm not sure whether setting one_fs to 1 might exclude something I want to keep, though.
  • depquid
    depquid over 10 years
    What if a system package uses such files?
  • Paolo
    Paolo over 10 years
    You are right, but I'm almost sure that every .py file will be recompiled automatically sooner or later.
  • depquid
    depquid over 10 years
    Perhaps, but on my system such files are installed by vendor packages. Which means that if the system is restored from backup, files that the package manager thinks are there will be missing. You asked about a solution for a "generic" Linux system, and I don't think it's safe to always assume that such files can be lost without causing problems.
  • Paolo
    Paolo over 10 years
    One thing worth noting that I forgot to mention in the question: bind mounts should be excluded as well, to avoid duplicating data.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 10 years
    @Guandalino Definitely set one_fs 1, and add explicit include directives if you have more than one data filesystem (e.g. if you have a separate /home partition). Apart from a small, known set of filesystems (which may be just the root), any mount point is a remote filesystem, a removable drive, system files, etc., none of which you should attempt to back up.
  • depquid
    depquid over 10 years
    Why would installing packages, which includes writing all system files as well as processing configuration and metadata, be faster than simply copying files?
  • Sean Perry
    Sean Perry over 10 years
    It has been my experience that when a real backup is needed you also find out that you had not properly stored and documented all of the bits about a system. Focusing instead on recreation rather than restoration makes it easier, faster, and more often done. Obviously YMMV.
  • Martin von Wittich
    Martin von Wittich over 9 years
    exclude /somepath/* is perfectly fine in this case; it excludes everything in /somepath/, just as expected. You don't need ** because there's no need to look deeper when everything in /somepath/ is already excluded.
  • Frank Kusters
    Frank Kusters over 9 years
    Or just use exclude /somepath and ignore these directories altogether - not just their contents.
  • user
    user over 9 years
    @spaceknarf That breaks mounting when you restore onto bare metal, because then the mount point doesn't exist.
  • topher217
    topher217 almost 4 years
    @aCVn I know this was years ago now, but do you still use this method, and if so, can you comment on how you'd restore and verify such a backup? Would you simply copy the most recent backup (e.g. alpha.0) to some empty ext4-formatted partition, or would you first need a working, bootable, bare-bones Linux distro on which to copy/overwrite these files?