Redundant NFS on Amazon EC2

linux ubuntu amazon-ec2 redundancy fault-tolerance

5,212

On AWS, using GlusterFS with an Elastic Load Balancer and auto scaling EC2 instances should achieve what you want. I can't comment about any other IaaS.

Amazon does provide some of what you need to achieve your objective - and allows you to implement the rest.

Amazon's EC2 servers are essentially VPSes - you can set up Heartbeat/Corosync/Pacemaker, etc. on them (although, last time I checked, you cannot use broadcast on their network - you can use unicast though, via udpu).
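
As a rough illustration of the unicast setup, here is a minimal sketch that renders a Corosync 2.x-style `corosync.conf` with `transport: udpu` and an explicit node list. The function name, the cluster name and the private IP addresses are placeholders of mine, not anything Amazon or Corosync prescribes, and older Corosync 1.x releases list members differently, so check the man page for your version.

```python
def render_corosync_conf(cluster_name, node_addrs):
    """Return corosync.conf text that uses unicast UDP (udpu) between the given nodes.

    node_addrs is a list of private IP addresses (placeholders in this sketch).
    """
    lines = [
        "totem {",
        "    version: 2",
        f"    cluster_name: {cluster_name}",
        "    transport: udpu",  # unicast instead of broadcast/multicast
        "}",
        "",
        "nodelist {",
    ]
    # With udpu every member must be listed explicitly.
    for node_id, addr in enumerate(node_addrs, start=1):
        lines += [
            "    node {",
            f"        ring0_addr: {addr}",
            f"        nodeid: {node_id}",
            "    }",
        ]
    lines += ["}", "", "quorum {", "    provider: corosync_votequorum"]
    if len(node_addrs) == 2:
        # Two-node clusters need this so one surviving node keeps quorum.
        lines.append("    two_node: 1")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Calling `render_corosync_conf("nfs-ha", ["10.0.0.11", "10.0.0.12"])` returns text you could write to `/etc/corosync/corosync.conf` on both nodes.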

You mention two ideas which Amazon addresses (somewhat) separately: fault tolerance and redundancy.

There is no built-in mechanism for redundancy on EC2, although depending on what you are looking for, there are some ways to achieve it.

Theoretically, S3 is built with multiple layers of redundancy and is "designed to provide 99.999999999% durability of objects over a given year". Their SLA is for 99.9% availability per year. If you want to go that route for static files, you can mount an S3 bucket using s3fuse as a local file system. This is fairly slow, however, and not really advisable for most purposes (code, databases, server software, etc.).
EBS snapshots will provide you with a compressed, differential point-in-time image of your EBS volume. These are great as a backup - and you can launch new instances from a snapshot - but they are not true redundancy.
For any solution of actual redundancy, you must set it up yourself. One approach designed for this problem is GlusterFS. You can set up your bricks as distributed, replicated, or both, and data will be spread across the system - it is resilient to the removal of individual nodes, and they have a pre-built AMI that you can launch multiple instances from to build a cluster.
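
To put the two S3 figures above in perspective, here is a small back-of-the-envelope calculation. It only turns the quoted percentages into expected values; the function names and the object count in the usage note are illustrative assumptions of mine, and the result is an average, not a guarantee.

```python
from fractions import Fraction

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def expected_annual_object_losses(object_count, durability):
    """Average number of objects lost per year at the given annual durability.

    durability is passed as a decimal string (e.g. "0.99999999999") so the
    arithmetic stays exact instead of picking up floating-point noise.
    """
    return object_count * (1 - Fraction(durability))


def allowed_downtime_hours(availability):
    """Hours per year a service may be unavailable at the given availability."""
    return HOURS_PER_YEAR * (1 - Fraction(availability))
```

With 10,000,000 stored objects and 11 nines of durability, `expected_annual_object_losses(10_000_000, "0.99999999999")` is 1/10000, i.e. on average one object lost every 10,000 years. By contrast, `allowed_downtime_hours("0.999")` works out to 8.76 hours per year, which is why durability and availability should not be confused.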

Fault tolerance, on the other hand, is better provided for by the Amazon platform:

The EC2 network offers multiple regions and availability zones - which (theoretically) provide isolated and/or geographically separated data centres to avoid single points of failure.
Amazon offers monitoring (CloudWatch) of a variety of instance metrics (CPU, network, disk I/O, etc.), as well as custom metrics. These can be used as a trigger for launching new instances from a pre-built AMI, a process called 'Auto Scaling'.
EC2 has Elastic IP addresses - these are public IP addresses that can be reserved and quickly remapped to another instance on demand, allowing you to avoid the delays of DNS propagation when an instance goes down.
Finally, Amazon has Elastic Load Balancers - these are supposed to be designed to avoid a single point of failure, and to scale with incoming traffic (they do not suffer from the same bandwidth limitations that a single instance set up as a load balancer would be subject to). ELBs are able to monitor the 'health' of the back end instances, and work with auto scaling to maintain an appropriate number of instances.
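
The health monitoring mentioned above is configured with a 'healthy threshold' and an 'unhealthy threshold': an instance changes state only after that many consecutive probe results disagree with its current state. The class below is a toy model of that idea, not Amazon's implementation; the class name and default values are assumptions chosen for illustration.

```python
class HealthTracker:
    """Toy model of threshold-based health checks for one back end instance."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=2, initially_healthy=True):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = initially_healthy
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Record one probe result and return the resulting health state."""
        if check_passed == self.healthy:
            # The result agrees with the current state, so any streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.healthy_threshold if check_passed else self.unhealthy_threshold
            if self._streak >= threshold:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy
```

A single failed probe therefore does not take an instance out of rotation, and a single success does not put it back, which avoids flapping when a server is briefly slow.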

In addition to the above, you can pass custom parameters to your newly launched instances, or retrieve information about your currently running instances fairly easily - which may allow you to script some of the setup (and, of course, AWS does have an API that will let you script all the actions they offer - including remapping an Elastic IP address, launching new instances, detaching/attaching EBS volumes, etc.).
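
One way to read those custom parameters from inside an instance is the instance metadata service at `169.254.169.254`. The sketch below assumes the token-based flow (IMDSv2) and uses only the standard library; the function names are mine, and it only works when run on an EC2 instance.

```python
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds=60):
    """Request for a short-lived session token (IMDSv2)."""
    return urllib.request.Request(
        f"{METADATA_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path, token):
    """Request for a metadata path such as 'meta-data/instance-id' or 'user-data'."""
    return urllib.request.Request(
        f"{METADATA_BASE}/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch(path, timeout=2):
    """Fetch one metadata value as text. Only reachable from inside an instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()
```

For example, a boot script could call `fetch("user-data")` to pick up the parameters passed at launch, or `fetch("meta-data/instance-id")` to find out which instance it is running on.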

You described 'files are kept on a separate, redundant EBS...[which is then] mounted'. Firstly, on EC2, an EBS volume can only be attached to one instance at a time (so to copy data to it, the EBS volume would need to be attached). It is up to you to maintain redundancy (you can set up RAID arrays of EBS devices, or do pretty much anything else). The problem, though, is that sometimes EBS volumes are not detached when an instance actually crashes - you can force detach them (which has a better, but not perfect, success rate), and you can snapshot an EBS volume, even while it is in use (you could then create a new EBS volume from the snapshot and attach it to an instance launched from your AMI). It is better (lower time to recover, more flexible, etc.), though, to maintain replicas of your data across multiple instances, as opposed to across multiple EBS volumes on the same instance.
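
The failover you describe (force detach the volume, attach it to a standby, move the Elastic IP) could be scripted roughly as follows. This is a sketch under assumptions: `ec2` is expected to behave like a boto3 EC2 client (`boto3.client("ec2")`), all IDs and the device name are placeholders, and `DryRunEC2` is a hypothetical stub of mine that just records calls so the sequence can be tried without touching AWS.

```python
def fail_over(ec2, volume_id, standby_instance_id, allocation_id, device="/dev/sdf"):
    """Move an EBS volume and an Elastic IP from a failed instance to a standby."""
    # Force the detach, since a crashed instance may never release the volume.
    # (A real client raises an error here if the volume is already detached.)
    ec2.detach_volume(VolumeId=volume_id, Force=True)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach to the standby; the file system still has to be mounted on that host.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=standby_instance_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

    # Remap the Elastic IP so clients keep using the same address.
    ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=standby_instance_id,
        AllowReassociation=True,
    )


class DryRunEC2:
    """Hypothetical stand-in for an EC2 client: records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any API method not defined here is recorded with its keyword arguments.
        def record(**kwargs):
            self.calls.append((name, kwargs))
        return record

    def get_waiter(self, waiter_name):
        stub = self

        class _Waiter:
            def wait(self, **kwargs):
                stub.calls.append((f"wait:{waiter_name}", kwargs))

        return _Waiter()
```

Running `fail_over(DryRunEC2(), "vol-1", "i-standby", "eipalloc-1")` exercises the sequence offline. Note that this still leaves a window of downtime and a single copy of the data, which is why replicating across instances is the better option.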

Trent Scott

Updated on September 18, 2022

Comments

Trent Scott over 1 year

I'm interested in building two fault tolerant/redundant NFS servers with failover at Amazon EC2. I'm familiar with tools/technologies like DRBD, Heartbeat, etc. Does Amazon provide any specific way of achieving this through their platform?

A suitable example might be that files are kept on a separate, redundant EBS -- if a failure occurs, a new instance is automatically launched from a pre-built AMI, the EBS volume is mounted, and the IP address is transitioned seamlessly.

Is this possible? Are there better platforms than Amazon? Can you give me a broad idea of the underlying architecture we're talking about to pull this off?
Bryan Mills over 12 years

Not too familiar with S3 so excuse my ignorance but I'm curious: If s3 is so popular with fast website caching then why would s3fuse be slow?
cyberx86 over 12 years

a) It's usually better to use CloudFront as a CDN. b) There are a number of problems with S3 as a file system: i) you have to rewrite the full file - you can't append or change one byte, etc. - which in itself rules out most uses; ii) it is 'eventually consistent' - you can write something and, even if the write is confirmed, reading it back right away may not give the expected result; iii) lower throughput than EBS (typically about 50% max); iv) higher latency than EBS; v) less flexible (permissions, quotas, etc.). These aren't issues when serving a remote user, but are for a local file system.
ceejayoz over 10 years

I believe S3 guarantees 11 nines of durability, not availability. Your files may at times be unavailable but they're exceedingly unlikely to be permanently lost.
cyberx86 over 10 years

@ceejayoz: You are absolutely right. It is 99.9% availability and 11 nines of durability. Thanks. Fixed.
ceejayoz over 10 years

"You don't even need to pay for the NFS server instances" is a pretty silly thing to say. You pay for the service, which pays for the instances, right?