Using RAID1 vs. rsync: pros and cons


Solution 1

You are comparing two different things.

rsync is a file-copying tool. It can be used for backup purposes.

RAID arrays are used to get higher availability and to prevent system downtime caused by hard-disk failures. This is different from a backup made with any tool.

A backup keeps your data in a different place (preferably on a different machine or at a different location) so you can get it back when needed. RAID is for high availability: it keeps your disks in sync so that a single disk failure does not lose data.

To make it clearer: if you accidentally delete a file and have no recent backup, you will not be able to get it back (short of using specialized undelete tools), even with an active RAID array.
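To make the difference concrete, here is a toy model in Python. Plain dicts stand in for disks; this is only an illustration of the semantics under that assumption, not how md or rsync work internally. A mirror survives a dead drive but faithfully replicates a delete, while a point-in-time copy still has the file.

```python
"""Toy model of a RAID-1 mirror vs. a periodic backup copy.

Plain dicts stand in for disks. This only illustrates the semantics;
it is not how md or rsync actually work internally.
"""
from typing import Dict, List, Optional

Disk = Dict[str, str]


class Mirror:
    """Every write and every delete goes to all surviving disks at once."""

    def __init__(self) -> None:
        self.disks: List[Optional[Disk]] = [{}, {}]

    def write(self, name: str, data: str) -> None:
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def delete(self, name: str) -> None:
        for disk in self.disks:
            if disk is not None:
                disk.pop(name, None)

    def fail_disk(self, index: int) -> None:
        # A dead drive: everything on it is gone.
        self.disks[index] = None

    def read(self, name: str) -> Optional[str]:
        # Serve the read from the first surviving disk.
        for disk in self.disks:
            if disk is not None:
                return disk.get(name)
        return None


def demo() -> Dict[str, Optional[str]]:
    # Scenario 1: one drive dies; the mirror keeps serving the file.
    degraded = Mirror()
    degraded.write("report.txt", "v1")
    degraded.fail_disk(0)

    # Scenario 2: an accidental delete is applied to both drives.
    mirrored = Mirror()
    mirrored.write("report.txt", "v1")
    nightly_backup: Disk = dict(mirrored.disks[0] or {})  # point-in-time copy
    mirrored.delete("report.txt")

    return {
        "after_disk_failure": degraded.read("report.txt"),
        "after_accidental_delete": mirrored.read("report.txt"),
        "from_backup": nightly_backup.get("report.txt"),
    }


print(demo())
# {'after_disk_failure': 'v1', 'after_accidental_delete': None, 'from_backup': 'v1'}
```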

Solution 2

RAID-1 pros:

  • possibly faster reads when multiple accesses happen simultaneously
  • availability in case one drive fails (i.e. no downtime)

RAID-1 cons:

  • filesystem-level corruption or an accidental rm -rf is mirrored to both drives immediately, so it can ruin the whole array instantly
  • more complex: needs mdadm, possibly LVM on top, etc.

rsync pros:

  • simple solution: once a day, rsync the data in the background
  • more flexibility: when running out of space, just stop rsyncing part of the data
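A minimal sketch of the "once a day" rsync run, in Python. It swaps in one technique the answer does not mention: dated snapshots with rsync's `--link-dest`, so unchanged files are hard-linked to the previous day and each snapshot only costs the space of what changed. It also optionally wraps the run in `ionice -c 3` and `nice`, as suggested in the comments, to soften the slowdown. The paths `/srv/data` and `/mnt/backup` are hypothetical, and the code assumes `dest_root` contains only `YYYY-MM-DD` snapshot directories.

```python
import datetime
import subprocess
from pathlib import Path
from typing import List, Optional


def latest_snapshot(dest_root: str, exclude: str) -> Optional[str]:
    """Absolute path of the newest dated snapshot directory, if any."""
    root = Path(dest_root)
    if not root.is_dir():
        return None
    # ISO dates sort lexicographically in chronological order.
    dated = sorted(p for p in root.iterdir() if p.is_dir() and p.name != exclude)
    return str(dated[-1].resolve()) if dated else None


def snapshot_command(src: str, dest_root: str, today: datetime.date,
                     previous: Optional[str] = None,
                     low_priority: bool = True) -> List[str]:
    """Build an rsync command copying src into dest_root/YYYY-MM-DD/."""
    cmd: List[str] = []
    if low_priority:
        # Idle I/O class and lowest CPU priority to soften the NAS slowdown.
        cmd += ["ionice", "-c", "3", "nice", "-n", "19"]
    cmd += ["rsync", "-a"]
    if previous:
        # Unchanged files become hard links into the previous snapshot.
        # An absolute path avoids rsync resolving it relative to the destination.
        cmd.append(f"--link-dest={previous}")
    # Trailing slash on src: copy the directory's contents, not the directory.
    cmd += [src.rstrip("/") + "/", f"{dest_root.rstrip('/')}/{today.isoformat()}/"]
    return cmd


def run_backup(src: str, dest_root: str) -> None:
    """One daily run: snapshot src next to the most recent earlier snapshot."""
    today = datetime.date.today()
    previous = latest_snapshot(dest_root, exclude=today.isoformat())
    subprocess.run(snapshot_command(src, dest_root, today, previous), check=True)
```

Something like `run_backup("/srv/data", "/mnt/backup")` from a daily cron job would cover the answer's scenario. Note that the idle I/O class only has an effect with I/O schedulers that honor I/O priorities (e.g. BFQ).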

rsync cons:

  • degraded NAS performance while the backup runs
  • need to check the consistency of the copied data at least from time to time
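For the "check consistency from time to time" point, one option is to compare content hashes of the source tree and the backup tree. The Python below is a minimal sketch of that idea (SHA-256 per regular file); rsync itself can report similar differences with `--dry-run --checksum --itemize-changes`. Expect files modified since the last run to show up as differing; that is normal, not corruption.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Hash in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(source: Path, backup: Path) -> Dict[str, List[str]]:
    """List files missing from, extra in, or different in the backup."""
    src, bak = tree_digests(source), tree_digests(backup)
    return {
        "missing_in_backup": sorted(src.keys() - bak.keys()),
        "only_in_backup": sorted(bak.keys() - src.keys()),
        "content_differs": sorted(k for k in src.keys() & bak.keys() if src[k] != bak[k]),
    }
```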

There could possibly be a way to use inotify to avoid the degraded NAS performance during backup, since each change would be copied right away, from the page cache, instead of being read back from the first HDD.
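One way to try the inotify idea is sketched below in Python, under several assumptions: it drives the `inotifywait` tool from inotify-tools rather than the inotify API directly, it only propagates written or moved-in files (deletions are deliberately ignored so the copy still acts as a backup), it needs rsync 3.1.0 or newer for `--ignore-missing-args`, and it does not handle file names containing newlines. Whether the data really comes from cache depends on memory pressure. Large trees may also need a higher `fs.inotify.max_user_watches`.

```python
import os
import select
import subprocess
import time
from typing import Iterable, List, Optional, Set

# Only files that were written and closed, or moved into the tree.
WATCH_EVENTS = ["close_write", "moved_to"]


def inotifywait_command(src: str) -> List[str]:
    # -m: keep running, -r: recursive, -q: no banner; print full paths.
    cmd = ["inotifywait", "-m", "-r", "-q", "--format", "%w%f"]
    for event in WATCH_EVENTS:
        cmd += ["-e", event]
    return cmd + [src]


def relative_paths(changed: Iterable[str], src: str) -> List[str]:
    """Deduplicate changed paths and make them relative to src."""
    root = os.path.abspath(src)
    out: Set[str] = set()
    for path in changed:
        rel = os.path.relpath(os.path.abspath(path), root)
        if rel != ".." and not rel.startswith(".." + os.sep):
            out.add(rel)
    return sorted(out)


def sync_command(src: str, dest: str) -> List[str]:
    # --files-from=- reads the list (relative to src) from stdin;
    # --ignore-missing-args skips files deleted before the batch ran.
    return ["rsync", "-a", "--ignore-missing-args", "--files-from=-",
            src.rstrip("/") + "/", dest.rstrip("/") + "/"]


def watch_and_sync(src: str, dest: str, batch_seconds: float = 5.0) -> None:
    """Copy changed files in small batches, soon after they were written."""
    watcher = subprocess.Popen(inotifywait_command(src), stdout=subprocess.PIPE)
    fd = watcher.stdout.fileno()
    buf = b""
    pending: Set[str] = set()
    first_seen: Optional[float] = None
    while True:
        ready, _, _ = select.select([fd], [], [], batch_seconds)
        if ready:
            chunk = os.read(fd, 65536)
            if not chunk:  # inotifywait exited
                break
            buf += chunk
            *lines, buf = buf.split(b"\n")  # keep any partial last line
            pending.update(os.fsdecode(line) for line in lines if line)
            if pending and first_seen is None:
                first_seen = time.monotonic()
        waited = first_seen is not None and time.monotonic() - first_seen >= batch_seconds
        # Flush after a quiet period, or when the oldest change has waited long enough.
        if pending and (not ready or waited):
            listing = "\n".join(relative_paths(pending, src)) + "\n"
            subprocess.run(sync_command(src, dest), input=os.fsencode(listing), check=False)
            pending.clear()
            first_seen = None
```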



Author: Stan

Updated on September 18, 2022

Comments

  • Stan
    Stan almost 2 years

    I am building myself a Linux NAS/home server. I am considering either using software RAID-1 (mdadm) to replicate data or just rsyncing it periodically. What are the advantages/disadvantages of each approach? I am adding my assumptions as an answer, but I'd like to make this list of pros/cons more comprehensive.

    Edit: I know they are different technologies for different purposes. And I know people have to decide which is more important: reliable backups, availability, or some other property of the solution. But some people will be looking at both rsync and RAID and deciding between them, and I wanted a list to point them to. I guess I misunderstand what downvoting questions is for.

    • Admin
      Admin almost 12 years
      Off topic, but FYI: I back up my home server with rsync+ssh to RAID1 remote disks (yes, I waste a lot of space).
  • brain99
    brain99 almost 12 years
    Performance impact of rsync can probably be prevented/reduced using ionice.
  • MDMarra
    MDMarra almost 12 years
    "instant "backup" of data on both drives" - no no no no no. RAID is not a backup in any sense. You can restore a deleted file from a backup, you can't from RAID.
  • Stan
    Stan almost 12 years
    @MDMarra that part was regarding rsync, not RAID
  • MDMarra
    MDMarra almost 12 years
    No it isn't. Read your own answer. "RAID-1 pros: * possibly faster reads when multiple accesses happen simultaneously * instant "backup" of data on both drives"
  • Stan
    Stan almost 12 years
    I know that, but if you re-read my question, I am looking for a list of pros and cons of both solutions. I know they are two different horses, and my own answer already contained this (well... I wrote "corruption on filesystem level", but the general idea is the same in both cases).
  • Khaled
    Khaled almost 12 years
    @Stan: No, the idea is not the same. They are different things for different purposes.
  • Stan
    Stan almost 12 years
    Ah, that's why the backup was in quotes. I'll just rewrite that statement
  • MDMarra
    MDMarra almost 12 years
    @Stan You need to decide which is more important. The availability of data that a RAID 1 provides or the capability to restore accidentally deleted data that rsync provides. No one can answer this for you. You can't do an apples-to-apples "pros and cons" list, since you're not comparing apples to apples.
  • Khaled
    Khaled almost 12 years
    @Stan: I am using both: RAID-1 and a backup tool that uses rsync under the hood. RAID-1 guarantees that a single HD failure will not bring my server down, and the backup protects me against accidental file deletion and alteration.
  • Stan
    Stan almost 12 years
    @Khaled I agree that is the best (even if most expensive) solution. Your answer is actually correct and I like it, though I'd probably like it better if it were written more like "if you need X, don't use RAID, just rsync, because: [list]"
  • Alex Berry
    Alex Berry almost 12 years
    RAID 1 using mdadm does not allow parallel file reads from each disk, so it will not increase read performance; the second disk is just a mirror, and the mdX device is recognised only as a standard block device, with no special parallel-read configuration in standard implementations.
  • Stan
    Stan almost 12 years
    @AlexBerry Then I'd like to know what's wrong with this benchmark: freebsdwiki.net/index.php/… Notice that the RAID1 configuration shows almost linear read improvements with each added disk
  • Alex Berry
    Alex Berry almost 12 years
    Apologies; as recently as last year it wasn't doing so for me, so they must have made some improvements.
  • Alex Berry
    Alex Berry almost 12 years
    linux.co.uk/2010/08/double-your-raid1 is from 2010; if there are improvements, they must have been made in the last year or so.
  • Stan
    Stan almost 12 years
    @AlexBerry Not sure what RAID10 performance (or 1+0 for that matter) has to do with pure RAID1, but note that even the oldest revision of the FreeBSD wiki page with that graph has the same results, and that's from 2007. ref: freebsdwiki.net/…
  • Alex Berry
    Alex Berry almost 12 years
    Quoted from that site "The problem with RAID1 (i.e. a pure two disk mirror) is that in addition to the resilience of having two identical disks, there is a tendency to think ‘great, two disks in parallel, twice the performance!’. Unfortunately, no so! Whereas RAID5 stripes data between disks to obtain a performance boost, effectively reading data in parallel from multiple drives, RAID1 does not and simply reads from one of it’s two drives."
  • Stan
    Stan almost 12 years
    @AlexBerry Notice that the author of that article used a single dd for testing. As noted previously, the performance benefit is for parallel accesses (i.e. two dd processes on RAID1 would get roughly double the aggregate throughput of a single drive)
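The disagreement above comes down to single-stream vs. concurrent reads. A minimal Python sketch of the "two dds" test follows: it starts N threads that each read their own region of a file or block device with `os.pread` and reports aggregate MB/s. It makes no claim about what numbers a given array will produce. It assumes a device path such as `/dev/md0` (hypothetical here, and reading it needs root), and that the page cache is dropped between runs, e.g. with `echo 3 > /proc/sys/vm/drop_caches` as root; otherwise the second run measures RAM, not disks.

```python
import os
import threading
import time
from typing import List, Tuple


def _reader(path: str, offset: int, length: int, block: int,
            out: List[int], idx: int) -> None:
    """Sequentially read `length` bytes starting at `offset`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        done = 0
        while done < length:
            data = os.pread(fd, min(block, length - done), offset + done)
            if not data:  # end of file or device
                break
            done += len(data)
        out[idx] = done
    finally:
        os.close(fd)


def parallel_read(path: str, readers: int, bytes_per_reader: int,
                  block: int = 1 << 20) -> Tuple[int, float]:
    """Return (total bytes read, aggregate MB/s) for `readers` concurrent streams."""
    results = [0] * readers
    threads = [
        threading.Thread(target=_reader,
                         args=(path, i * bytes_per_reader, bytes_per_reader, block, results, i))
        for i in range(readers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    total = sum(results)
    return total, total / elapsed / 1e6
```

Comparing `parallel_read(dev, 1, size)` against `parallel_read(dev, 2, size)` on the md device, and on a single member disk, would show whether concurrent streams get spread across the mirrors.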