How to load balance SFTP instances on AWS

My comment could probably use some clarification. I spouted off with the eloquence of an inebriated yak:

> I have never wanted to set myself on fire so much as I do now.

Why? Why would I say such a thing? Mostly because I'm an awful person. However, aside from that, I can explain my outburst by going over the original post piecemeal:

> I like to know is it possible to load balance sftp servers in AWS.

Yes. Impossible is nothing. But know that unless you get a special SFTP package, the load balancing will be entirely up to you to build. The service being SFTP and being hosted in AWS is inconsequential.

> I have 2 servers, and each of my servers are using s3fs-fuse to mount the same S3 bucket onto a mount point. Both of my ec2 instances are able to read/write to their mount points, and from S3, I can see the files from both servers.

You're off to a good start with a shared file system, the performance and reliability of the setup notwithstanding.

> As for my next step, I like to know how can I load balance my sftp servers, so that when a user connects to a specific IP address, it will redirect them to one of my sftp servers.

The question is now: why do you want to load balance? There is a fantastic amount of throughput and processing power available in the Amazon instance catalog, and needing to load balance SFTP would mean you're approaching porn levels of network activity. Keep it simple, repeatable, and resilient wherever possible. Get an i2.xlarge with an SFTP daemon running on it and you should be fine no matter what. Build it with Puppet/Chef/$trendy-config-management-tool and you're in business. Moving on, however...

> I took a look at elastic load balancers, but they seem to only permit specific ports. I have also investigated HAProxy, but I am unsure how secure that solution will be.

HAProxy is exactly the kind of tool you need. Your uncertainty about security is easily dispelled with just a few hours of reading. My desire to self-immolate is rising from this point on. If you're unsure about something, go become sure about it. HAProxy is the choice for many financial institutions, hospitals, and governments.
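
For what it's worth, HAProxy would be handling SFTP as plain TCP, since the SSH session is encrypted end to end and the proxy never sees inside it. The natural health signal at that layer is whether a backend greets new connections like an SSH server. Here is a minimal Python sketch of that kind of check. It is not HAProxy's own implementation, the function names are made up for illustration, and it assumes the server sends its identification string first, which the SSH spec does not strictly guarantee:

```python
import socket


def looks_like_ssh_banner(data: bytes) -> bool:
    """Return True if the bytes start with an SSH identification string.

    SSH servers announce themselves with "SSH-2.0-..." (or the legacy
    compatibility form "SSH-1.99-...") before any encryption starts.
    """
    return data.startswith((b"SSH-2.0-", b"SSH-1.99-"))


def backend_is_up(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Open a TCP connection and check that the peer greets us like SSH.

    Assumption: the banner arrives in the first read. A server that sends
    other lines first, or a badly fragmented read, would be reported as down.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return looks_like_ssh_banner(conn.recv(64))
    except OSError:
        # Refused, unreachable, or timed out: treat the backend as down.
        return False


def healthy_backends(backends, timeout: float = 3.0):
    """Filter (host, port) pairs down to the ones that answer as SSH."""
    return [(h, p) for (h, p) in backends if backend_is_up(h, p, timeout)]


# Example call with hypothetical private addresses for the two EC2 instances:
#   healthy_backends([("10.0.1.10", 22), ("10.0.1.11", 22)])
```

Whatever does the balancing, it would only hand new connections to the backends that pass a check along these lines. None of this says anything about compliance, which is a separate exercise.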

> I have to take HIPAA compliance into consideration.

Totally understood, but compliance is not primarily the role of tools. You'll need to understand the concepts behind the HIPAA compliance requirements, and see how HAProxy can fulfill them. HAProxy is neither HIPAA compliant nor HIPAA non-compliant. No matter which tool you use, you'll need to independently verify the underlying assumptions and requirements of your compliance and regulatory needs. In fact, if anything, S3 and the use of Amazon instances should be inspected more carefully than the use of HAProxy.

> The load balancer must be a static ip address as our vendors does not support DNS hostnames

This. This did it. Your vendor is bad and should feel bad. Now I want to jump into lava. Their not supporting something as basic as DNS resolution is a separate problem entirely, and requiring a static IP is like saying "A car must have an engine for me to use it." Well, of course. Of course a load balancer is going to be able to use a static IP address. There are many more considerations you need to be thinking about beyond simple static IP addresses.

TL;DR

Yes, you can load balance SFTP with HAProxy. HIPAA compliance is up to you to discern, and tool choice will not check boxes. You have some Googling to do and documentation to read.

I have some flames to put out.

Author: popopanda

Updated on September 18, 2022

Comments

  • popopanda over 1 year

    I like to know is it possible to load balance sftp servers in AWS. I have 2 servers, and each of my servers are using s3fs-fuse to mount the same S3 bucket onto a mount point. Both of my ec2 instances are able to read/write to their mount points, and from S3, I can see the files from both servers.

    What I am looking for is having SFTP to transfer files and using Amazon S3 to store my files. Files would be uploaded and download daily.

    https://github.com/s3fs-fuse/s3fs-fuse

    As for my next step, I like to know how can I load balance my sftp servers, so that when a user connects to a specific IP address, it will redirect them to one of my sftp servers. I took a look at elastic load balancers, but they seem to only permit specific ports. I have also investigated HAProxy, but I am unsure how secure that solution will be. I have to take HIPAA compliance into consideration. The load balancer must be a static IP address as our vendors does not support DNS hostnames.

    • Admin over 8 years
      I have never wanted to set myself on fire so much as I do now.
    • Admin over 8 years
      TBH, using s3fs-fuse for PHI seems quite foolish.
    • Admin over 8 years
      For the record, ELBs do nowadays support all ports (1-65535): aws.amazon.com/blogs/aws/…. But, ELBs also require clients to use the AWS-generated DNS name (which also points to two public IP addresses, which can change).
    • Admin over 8 years
      Also, did you consider asking your vendor to support encrypted S3 uploads? Not hard at all…
    • Admin about 7 years
      I'm not sure what folks like @w00t are so surprised about -- there are plenty of reasons for wanting to load balance an SFTP server, mainly HA and avoiding a SPoF. For some businesses (health care!) SFTP is the de facto way of moving info, and if your SFTP goes down you're screwed. If you've gone to immutable infrastructure (and you should), then you'll need a way to keep the service running while you swap in new machines.
    • Admin about 7 years
      @rusty in this design there is still a SPoF in HAProxy. You need to fail over an IP to fix that, and then you can just as well run a hot standby.
    • Admin about 7 years
      @w00t Not commenting on a design, just surprised at the reactions I see from the concept of the question. Re: HAProxy the OP says they were investigating it, not settled on it.
  • popopanda over 8 years
    Thanks for the honest feedback and suggestions. I will look through them, but this will give me a start.
  • ceejayoz over 8 years
    This is my favorite answer on SF in quite some time. <3
  • Jukka over 8 years
    NetScaler also has an SFTP mode for virtual servers; you might want to consider that as well. An EC2-instance-backed NetScaler appliance (with different bandwidth options) is available through the AWS Marketplace. Costs money, though.
  • Andrew Domaszek over 8 years
    After working in healthcare IT for near on a decade, I can say that 3rd-party vendors being unable to support DNS lookups, using domains that are not inter-system accessible, and other levels of odd that fly in the face of accepted standards are not particularly surprising; they occur much more often than anyone would like. For example, several years ago, a vendor told me they couldn't support SSH public key authentication because it didn't use passwords that could be rotated every 30 days. HIPAA seems to cause a certain amount of hyper-paranoia and confusion. I'm surprised that AWS is willing to be a BA.