How many reverse proxies (nginx, haproxy) is too many?

Solution 1

From a purely performance perspective, let benchmarking make these decisions for you rather than assuming -- using a tool like httperf is invaluable when making architecture changes.

From an architectural philosophy perspective, I'm a little curious why you have both nginx and Apache on the application servers. Nginx blazes at static content and efficiently handles most backend frameworks/technologies (Rails, PHP via FastCGI, etc.), so I would drop the final Apache layer. Once again, this comes from a limited understanding of the technologies you're using, so you may have a need for it that I'm not anticipating (but if that's the case, you could always drop nginx on the app servers and just use Apache -- it's not THAT bad at static content when configured properly).
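If you do drop the Apache layer, the app-server nginx ends up doing both jobs itself. A minimal sketch, assuming php-fpm is listening on 127.0.0.1:9000 and a /var/www/app/public docroot (both hypothetical):

```nginx
server {
    listen 80;
    root /var/www/app/public;   # hypothetical docroot

    # Static content served straight off the disk
    location /static/ {
        expires 7d;
    }

    # Dynamic requests handed to the FastCGI process manager
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;
    }
}
```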

Currently, I use nginx -> haproxy on load balancing servers and nginx on the app servers with much success. As Willy Tarreau stated, nginx and haproxy are a very fast combination, so I wouldn't worry about the speed of having both on the front-end, but keep in mind that adding additional layers increases complexity as well as the number of points of failure.
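The front half of that chain is just nginx terminating the client connection and handing everything to a local haproxy. A sketch of the nginx side, assuming haproxy listens on 127.0.0.1:8080 (the port is an assumption):

```nginx
server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;   # local haproxy, port assumed
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```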

Solution 2

Your setup is increasingly common, so you don't have to worry. Both nginx and haproxy are extremely fast at processing and forwarding HTTP requests, and they combine very well together. No need to choose -- install them both and be happy. That way you will deliver static files very quickly and also ensure smooth scaling of your dynamic servers.

Don't worry about the number of proxies. The question is usually "can I use a proxy at all" -- sometimes it's not practical. If you can have one, you can have two or three. Many complex architectures involve up to 5-6 levels of proxies and still scale very well. You should just be careful about one thing: do not run more such proxies on a single machine than that machine has CPU cores, or the proxies will have to share CPU time under high load, which will increase response times. But for this to happen with nginx and haproxy on one machine, you'd need loads of tens of thousands of requests per second, which is not everyone's problem of the day.
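The core-counting rule translates directly into each proxy's worker settings. As a sketch, on a hypothetical 4-core box shared by nginx and haproxy you might pin nginx to two of the cores:

```nginx
# /etc/nginx/nginx.conf -- use two of the four cores (counts are assumptions)
worker_processes 2;
worker_cpu_affinity 0001 0010;
```

haproxy has equivalent knobs (nbthread and cpu-map in recent versions) to keep it on the remaining cores.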

Also, avoid mixing single-threaded proxies with massively multi-threaded/multi-process software such as apache, java, etc on the same system.

Once you take these rules into account, simply draw the architecture that suits your needs, put names on the boxes, combine them in a sane way and install them.

Solution 3

Remember that complexity can be just as much of an impediment to scaling as code/design (if not more). As you scatter your implementation details across more and more services and config files, you create something that is harder to scale out, has more of a learning curve for anyone new to the team, requires more software/packages to manage, complicates troubleshooting with more potential failure points, etc. Setting up a 4-proxy-layer stack for a site that would have been fine with just Apache or just nginx is basically the sysadmin version of "premature optimization".

Solution 4

Why not use Varnish? That way you combine caching, proxying and load balancing into one application, which is a lot neater from an architecture point of view. Its scale-out performance is phenomenal, and the load balancer can make more intelligent decisions based on the actual health of the nodes.

The configuration file will allow you to examine the headers and make decisions about where to serve static and dynamic content from.
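For example, a minimal VCL sketch along those lines (Varnish 4+ syntax; the backend address and URL pattern are assumptions):

```vcl
vcl 4.0;

backend default {
    .host = "127.0.0.1";   # app server, address assumed
    .port = "8000";
}

sub vcl_recv {
    if (req.url ~ "^/static/") {
        return (hash);     # let Varnish cache static content
    }
    return (pass);         # dynamic requests always hit the backend
}
```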

If you're really expecting to serve a LOT of static content, perhaps shunting most of it onto a CDN would be a cost-effective solution?

Solution 5

Update for 2020: this is the standard recommended setup you should encounter for PHP applications.

LOAD BALANCER              SERVER A
   haproxy   -+--------->   nginx   -+---> /app/ ------> PHP application
              |                      |
              |                      +---> /static/ ---> local files
              |              
              |
              |            SERVER B
              +--------->   nginx   -+---> /app/ ------> PHP application
                                     |
                                     +---> /static/ ---> local files

This serves the application and the static files from multiple hosts; each host is identical. nginx can run PHP applications (over FastCGI) and serve local files.
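One app server from the diagram could look roughly like this -- a sketch, with the paths and php-fpm socket location as assumptions:

```nginx
server {
    listen 80;

    # /static/ comes straight from local files
    location /static/ {
        root /var/www;                                         # hypothetical path
    }

    # /app/ goes to the PHP application over FastCGI
    location /app/ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME /var/www/app/index.php;  # assumed front controller
        fastcgi_pass unix:/run/php/php-fpm.sock;               # socket path assumed
    }
}
```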

If it's an old PHP application it might be designed with Apache in mind in place of nginx. It's fine. Apache too can run PHP applications (cgi, fastcgi or mod_php) and serve local files.

It's possible to migrate from Apache to nginx but that's not necessarily worthwhile for legacy systems. Apache is a pain to build and configure but if it's already running it's not an issue. Apache is not performant but PHP is much slower and the bottleneck will always be the app.

In front, this needs a layer of load balancers to distribute traffic between the servers. Usually one of HAProxy, AWS ALB, F5 or CloudFlare.
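With HAProxy as that layer, the balancing part is a short config fragment. A sketch, with the server addresses and health-check path as assumptions:

```haproxy
frontend www
    bind *:80
    mode http
    default_backend app_servers

backend app_servers
    mode http
    balance roundrobin
    option httpchk GET /health      # health-check path is an assumption
    server a 10.0.0.1:80 check      # SERVER A, address assumed
    server b 10.0.0.2:80 check      # SERVER B, address assumed
```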

It's possible to add any number of layers in between, but it gives no benefit. The original question is outdated in its concerns: all of this software has supported SSL for almost a decade, so the setup can run SSL end-to-end, and local files are served efficiently thanks to sendfile(), added to the Linux kernel a long time ago.



Updated on September 17, 2022

Comments

  • Tom O'Connor
    Tom O'Connor over 1 year

    I'm setting up a HA (high availability) cluster using nginx, haproxy & apache.

    I've been reading great things about nginx and haproxy. People tend to choose one or the other but I like both. Haproxy is more flexible for load balancing than nginx's simple round robin (even with the upstream-fair patch). But I'd like to keep nginx for redirecting non-https to https among other things right at the point of entry to the cluster.

On the other hand, nginx is a lot faster at serving static content and would reduce the load on the powerful Apache, which loves to eat a lot of RAM!

    Here is my planned setup:

    Load balancer: nginx listens on port 80/443 and proxy_forwards to haproxy on 8080 on the same server to load balance between the multiple nodes.

Nodes: nginx on the node listens for requests coming from haproxy on 8080. If the content is static, it serves it; but if it's a backend script (in my case PHP), it proxy-forwards to apache2 on the same node server, listening on a different port number.

Technically this setup works, but my concern is whether having requests go through several proxies is going to slow them down. Most of the requests will be PHP requests, as the backends are services (which means going from nginx -> haproxy -> nginx -> apache).

    Thoughts? Cheers

  • Tom Anderson
    Tom Anderson almost 13 years
Varnish doesn't do HTTPS, so you would still need something in front of it to terminate that. Nginx would be a good choice. stunnel or stud could also do it; I don't know how their performance compares to Nginx's, and they don't send an X-Forwarded-For header.
  • Tom Anderson
    Tom Anderson almost 13 years
    I'd be very interested to read a comparison of HAProxy and Varnish for load-balancing. I'd like to be able to make an informed choice between {Nginx,stud}/Varnish/HAProxy and {Nginx,stud}/Varnish.
  • Tom O'Connor
    Tom O'Connor almost 13 years
    @Tom Anderson, If you can wait, I could probably write one. Might take me a couple of weeks.
  • Tom Anderson
    Tom Anderson almost 13 years
A kind offer! I should add that I am not facing this choice right now, but it's something I am conscious I don't have the information to make sensible choices about in the future. This seems to be a really tough area to benchmark.
  • Henrik
    Henrik almost 12 years
    No comparison yet? It's been some weeks :)