Backup - rsync or tar

Solution 1

Definitely rsync.

The advantage of rsync is that it will copy only the files which have changed.

If you have 100GB+ of relatively small files, you don't want to copy them all each time.

Note: the first backup with rsync will be slow because all files are copied. Subsequently only the changed files are copied, and they can be compressed during the copy.

Be sure to familiarise yourself with all the options of rsync ... there are many.
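
For illustration, a weekly sync might look like the following minimal sketch (paths are hypothetical; -a preserves permissions and timestamps, -z compresses data in transit, and --delete removes files from the backup that were deleted at the source):

    rsync -az --delete /data/ /mnt/backup/data/

The trailing slash on the source tells rsync to copy the directory's contents rather than the directory itself.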

Tar is an archive utility. You could conceivably create a tar file for the entire 100 GB+, but you don't want to transfer it all each time.

Solution 2

I would like to add that, although in general I agree with pavium's reply and I would choose rsync, there are options in tar for incremental backups. From the man page:

    -g, --listed-incremental F
        create/list/extract new GNU-format incremental backup

    -G, --incremental
        create/list/extract old GNU-format incremental backup
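
As a sketch of how the listed-incremental mode is typically used (the archive and snapshot file names are made up for illustration): the first run against a fresh snapshot file produces a full backup, and later runs against the same snapshot file archive only what changed since then.

    # Full (level-0) backup; the snapshot file records file metadata
    tar -czg /backup/data.snar -f /backup/data-full.tar.gz /data

    # A later run with the same snapshot file archives only the changes
    tar -czg /backup/data.snar -f /backup/data-incr1.tar.gz /data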

EDIT: Following a recent comment, I will further expand on how both backups work:

tar initially creates a large file, possibly compressed (the -z gzip flag), containing all backed-up files. Each incremental backup then creates a new file with only the modified files, in which it also records which files have been deleted.

rsync, on the other hand, initially creates a second mirror directory with the exact tree and files of the source directory, uncompressed. With every incremental backup (the -b/--backup flag combined with --backup-dir), it continues to maintain a mirror copy of the source, while keeping all changed files (both modified and deleted) in another directory, typically named by date.
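
A minimal sketch of that pattern (directory names are assumptions; --backup-dir should be an absolute path so the displaced files do not end up inside the mirror):

    # Keep /backup/mirror identical to /data; files changed or deleted
    # since the last run are moved into a dated directory instead
    rsync -a --delete -b --backup-dir=/backup/changes/$(date +%F) /data/ /backup/mirror/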

Each method, therefore, has its pros and cons. A tar backup is more difficult to maintain on a medium with limited capacity, as with any classic incremental scheme. rsync is not considered a classic backup solution: it requires more disk space for the mirror, since it is uncompressed, and more time to reconstruct a full backup of a previous date.

UPDATE: Since March 2016 a newer alternative has been available: Borg backup. I very strongly recommend it. It uses deduplication. More information at the link provided above.
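
As a rough sketch of the Borg workflow (the repository path is hypothetical; check the Borg documentation for details):

    # One-time repository setup
    borg init --encryption=repokey /backup/borg-repo

    # Each run creates a deduplicated, optionally compressed archive
    borg create --stats --compression lz4 /backup/borg-repo::'{hostname}-{now}' /data

    # Keep only the last eight weekly archives
    borg prune --keep-weekly=8 /backup/borg-repo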

Solution 3

rsync can be somewhat painful if you have a very large number of files, especially if your rsync version is lower than 3. On the other hand, if you use tar, you would generate a very big resulting tar file (unless the data compresses well). Personally, I would look at rdiff-backup, but make sure that you test your restore procedure: rdiff-backup can be very memory-demanding when restoring.
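
For reference, a minimal rdiff-backup sketch with hypothetical paths (classic pre-2.1 command syntax):

    # Destination becomes a current mirror plus reverse diffs for history
    rdiff-backup /data /backup/rdiff

    # Restore the state of the data as it was seven days ago
    rdiff-backup --restore-as-of 7D /backup/rdiff /restore/data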

Solution 4

If your files do not change much, I would vote for rsync.

Solution 5

Do you need history (multiple backups), or just a plain copy of your data to some other disk? Backing up 100 GB of 10 KB files would take ages if you don't use a block-level backup. Think about making block-level snapshots, or some other block-level solution, if you really need a fast approach.
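
One hedged sketch of a block-level approach, assuming the data lives on an LVM logical volume (the volume group, volume, and target names are made up):

    # Take a temporary, consistent snapshot of the data volume
    lvcreate --snapshot --size 5G --name data-snap /dev/vg0/data

    # Copy the snapshot block-for-block to an image on the backup disk
    dd if=/dev/vg0/data-snap of=/mnt/backup/data.img bs=4M

    # Drop the snapshot once the copy is done
    lvremove -f /dev/vg0/data-snap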


Comments

  • Admin
    Admin almost 2 years

We're looking to back up about 100 GB+ of data consisting of small files (10 KB+ each). The backup needs to be done as fast as possible to another hard drive, weekly. Which is the better (especially speed-wise) way to back up in such a scenario: rsync, or tar?

    • dasdasd
      dasdasd over 11 years
      Information about the files would be interesting. Are the existing ones static and only new ones are added or are all files prone to changes?
  • Christopher Batey
    Christopher Batey over 14 years
Do not need a history, just a plain copy of data to a secondary hard disk mounted on the server. Any suggestion for a faster solution?
  • pfo
    pfo over 14 years
``dd if=/dev/sdX of=/dev/sdY'' in a cron job should be the fastest solution, since it's a block-level copy of sdX to sdY. Benchmark that against a tar'ed or rsync'd copy.
  • Svish
    Svish over 14 years
    Can you take a block level backup of just certain folders?
  • daff
    daff over 14 years
    Block-level backups by design only work for entire filesystems, not single directories. This is especially true for simple solutions like "dd if=/dev/foo of=/dev/bar" but AFAIK also for the more advanced Snapshot-based products from NetApp, EMC and the like.
  • Tonny
    Tonny over 11 years
    For the millionth time: RAID IS NOT BACKUP
  • Tonny
    Tonny over 11 years
I did read and comprehend your answer. I am fully familiar with the ins and outs of BTRFS. (Hell, I've written parts of BTRFS.) You can use BTRFS with a combination of snapshots and RAID to achieve something that would act as a backup, but it is finicky, uses experimental features and IN GENERAL it is not a solution for a generic backup problem. I stand by my statement: RAID by itself is not backup. RAID+snapshot can be backup, but it is very hard to do it right. A snapshot plus a block-based image copy of the snapshot is a much simpler approach to the poster's question.
  • Yuan
    Yuan about 11 years
If the computer is struck by lightning or fire, will your RAID 1 backup survive?
  • dasdasd
    dasdasd about 11 years
@Tonny: Since btrfs was only one of 2 provided options, I still do not understand the yelling and the downvote. Swapping out images from a softraid is done in a handful of lines of script.
  • Nicolas Schmidt
    Nicolas Schmidt about 5 years
    Why would you choose rsync over tar if both have incremental backups?
  • Wtower
    Wtower about 5 years
    I hope the recent edit covers your question.
  • asa
    asa about 3 years
Why would you "definitely" use rsync if you could use tar.gz and combine it with find to create incremental compressed backups?
  • TheTechRobo Stands for Ukraine
    TheTechRobo Stands for Ukraine almost 3 years
@asa More work, I assume.