AWS Elastic Load Balancer basic issues
Solution 1
Micro instances are not designed for sustained load. They allow CPU bursting, but after a short period (think 15-30 seconds) of heavy load they will be severely capped.
Try it with at least a small instance if you want any sort of useful benchmark.
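To illustrate why short benchmarks flatter micro instances, here is a toy model of burst-then-throttle behaviour. The window length and request rates are illustrative assumptions, not AWS's published credit algorithm:

```python
# Toy model of a micro-style instance under sustained load: full speed
# for a short burst window, then severely capped.
# burst_window/full_rps/capped_rps are made-up illustrative numbers.
def throughput_over_time(seconds, burst_window=20, full_rps=400, capped_rps=60):
    """Requests served in each second of a sustained-load benchmark."""
    return [full_rps if t < burst_window else capped_rps
            for t in range(seconds)]

served = throughput_over_time(60)
average_rps = sum(served) / len(served)
# A 10-second benchmark sees only the burst phase; a 60-second run
# averages far less because most of it happens under the cap.
```

Under this model a quick blitz.io run and a multi-minute run give very different numbers, which is one reason longer tests are suggested below.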
Solution 2
Check the load on the individual servers. ELB does not balance traffic equally across all instances when it comes from a single IP (as in the ab test case): it simply switches from one instance to another. The total throughput therefore cannot be double that of a single instance, but on average it is still better than directing all the traffic to one instance (due to reduced load and faster responses).
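A toy simulation of that behaviour makes the effect visible. The switching rule below (stay on one backend, switch only rarely) is an illustrative assumption based on the description above, not AWS's documented algorithm:

```python
import random

# Sketch: single-IP traffic pins to one backend at a time and switches
# only occasionally; many-IP traffic spreads roughly evenly.
def route_requests(n_requests, backends, single_ip=True, switch_every=1000):
    counts = {b: 0 for b in backends}
    current = backends[0]
    for i in range(n_requests):
        if single_ip:
            if i % switch_every == 0:  # rare switch between backends
                current = random.choice(backends)
            counts[current] += 1
        else:
            counts[random.choice(backends)] += 1  # spread across backends
    return counts
```

With a large `switch_every`, a single-IP benchmark lands almost entirely on one backend, which is consistent with seeing single-instance numbers from a multi-instance array.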
Solution 3
Make sure you haven't accidentally selected sticky load balancing. This would cause the same user to be directed to the same instance.
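One way to check is to list the load balancer's policies, e.g. with the AWS CLI's `aws elb describe-load-balancer-policies --load-balancer-name my-lb`. The sketch below scans a sample of that JSON for the two classic-ELB cookie-stickiness policy types; the policy name and sample data are made up for illustration:

```python
import json

# Made-up sample in the shape returned by describe-load-balancer-policies.
sample = json.loads("""
{
  "PolicyDescriptions": [
    {
      "PolicyName": "my-sticky-policy",
      "PolicyTypeName": "LBCookieStickinessPolicyType",
      "PolicyAttributeDescriptions": [
        {"AttributeName": "CookieExpirationPeriod", "AttributeValue": "60"}
      ]
    }
  ]
}
""")

# The two stickiness policy types defined for classic ELB.
STICKY_TYPES = {"LBCookieStickinessPolicyType", "AppCookieStickinessPolicyType"}

def sticky_policies(description):
    """Return the names of any cookie-stickiness policies."""
    return [p["PolicyName"] for p in description.get("PolicyDescriptions", [])
            if p.get("PolicyTypeName") in STICKY_TYPES]

print(sticky_policies(sample))  # -> ['my-sticky-policy']
```

If this returns a non-empty list, sticky sessions are configured and a single-user benchmark will hammer one instance.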
Micro instances weren't designed to sustain heavy load; they are for CPU bursting. I can assure you, though, that micro instances work fine with Elastic Load Balancers.
Don't forget there may be other ways to increase the traffic your website can cope with, e.g. a caching reverse proxy such as Varnish.
Chris J
iOS Engineer (Obj-C & Swift) - Mobile Apps, Startup Veteran, and Cyber Security.
Updated on September 18, 2022

Comments
-
Chris J over 1 year
I have an array of EC2 t1.micro instances behind a load balancer, and each node can manage ~100 concurrent users before it starts to get wonky.
I would think that with 2 such instances my network could manage 200 concurrent users... apparently not. When I really slam the server (blitz.io) with a full 275 concurrent users, it behaves the same as if there were just one node: it goes from a 400 ms response time to 1.6 seconds (which would be expected for a single t1.micro, but not for 6).
So the question is: am I simply not doing something right, or is ELB effectively worthless? Does anyone have some wisdom on this?
AB logs:

Load balancer (3x m1.medium)
Document Path:          /ping/index.html
Document Length:        185 bytes
Concurrency Level:      100
Time taken for tests:   11.668 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Non-2xx responses:      50001
Total transferred:      19850397 bytes
HTML transferred:       9250185 bytes
Requests per second:    4285.10 [#/sec] (mean)
Time per request:       23.337 [ms] (mean)
Time per request:       0.233 [ms] (mean, across all concurrent requests)
Transfer rate:          1661.35 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    2   4.3      2      63
Processing:     2   21  15.1     19     302
Waiting:        2   21  15.0     19     261
Total:          3   23  15.7     21     304

Single instance (1x m1.medium, direct connection)
Document Path:          /ping/index.html
Document Length:        185 bytes
Concurrency Level:      100
Time taken for tests:   9.597 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Non-2xx responses:      50001
Total transferred:      19850397 bytes
HTML transferred:       9250185 bytes
Requests per second:    5210.19 [#/sec] (mean)
Time per request:       19.193 [ms] (mean)
Time per request:       0.192 [ms] (mean, across all concurrent requests)
Transfer rate:          2020.01 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    9 128.9      3    3010
Processing:     1   10   8.7      9     141
Waiting:        1    9   8.7      8     140
Total:          2   19 129.0     12    3020
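As a quick sanity check on the ab summaries above, ab's derived figures follow directly from the completed requests, wall-clock time, and concurrency:

```python
# How ab computes its summary lines from the raw totals.
def ab_summary(requests, seconds, concurrency):
    rps = requests / seconds               # "Requests per second (mean)"
    ms_per_req = concurrency * 1000 / rps  # "Time per request (mean)"
    ms_across = 1000 / rps                 # "(mean, across all concurrent requests)"
    return rps, ms_per_req, ms_across

elb = ab_summary(50000, 11.668, 100)    # ~4285 req/s, ~23.3 ms per request
single = ab_summary(50000, 9.597, 100)  # ~5210 req/s, ~19.2 ms per request
```

Both runs reproduce the reported numbers, so the logs are internally consistent; the balanced setup really is slower here than the direct connection.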
-
khoxsey over 11 years: "Worthless" is a bit strong, don't you think? Plenty of firms run sites with hundreds or thousands of instances fronted by ELB.
-
Chris J over 11 years: I know! That's why I'm saying I'm clearly doing something wrong here. I can't figure out why the numbers aren't supporting the concept. I have 6 instances behind a load balancer, but the numbers show the same data as 1 instance alone.
-
khoxsey over 11 years: Yes, you are not reading the documentation. Micros are not intended for significant load; Amazon is quite clear about this. You are consuming your "fair use" budget and getting clamped. Your 6 micros are costing you 12c/hr; replace them with a c1.medium at 16.5/hr and retest.
-
Chris J over 11 years: I have read the documentation, and I don't believe I'm getting hit with CPU throttling.
-
khoxsey over 11 years: Understood. An easy way to verify that you are having ELB problems is to replace the ELB with a suitable instance (let's say an m1.small for now) and start up Apache in reverse-proxy mode. If you can get the desired throughput from your micro array, then it is clearly ELB, and I would be very interested to see the ab output posted here; it would be incredibly helpful.
-
Chris J over 11 years: I did a standard 100-concurrent ab test, and it sort of looks like the single instance is doing a bit better, no?
-
ceejayoz over 11 years: What does a longer test than 10 seconds show (i.e. several minutes)? ELB takes a while to ramp up depending on traffic history.
-
khoxsey over 11 years: +1 for identifying the oxymoron: load-balanced micros are the jumbo shrimp of the cloud world. For cheap capability, switch to load-balanced c1.mediums (5 ECU each).
-
Chris J over 11 years: My understanding is that an array of 6 t1.micro instances SHOULD handle 6 times the load a single instance can, but this isn't happening in practice. Isn't that what's supposed to happen?
-
ceejayoz over 11 years: No, they will not necessarily handle six times the load. Try it with a set of small/medium instances and see.
-
Chris J over 11 years: Yeah, that's basically what I'm seeing too. Not sure why. I know that Netflix uses AWS, so it can't be a joke service, but it doesn't seem to be working here.
-
Logic Wreck over 11 years: Actually, I worked up from around 20 requests/second, ran many tests until I got to 250 requests/second, and hit this high latency. So in general you're very wrong here. Also, this was done with m1.large instances, not t1.micro in my case. Please read carefully next time.
-
Logic Wreck over 11 years: As already said, at 250 requests/second it's more likely the ELB than the instance, so upgrading the instance from micro to small will probably not help. It can be tried, but I have serious doubts.
-
ceejayoz over 11 years: While you're instructing me to read carefully, please read my posts on your answer and tell me where I said you were using a micro instance?
-
bwight over 11 years: The problem is that load balancers need time to scale up; not every load balancer is equal. If you go from 0 to 250 requests/second instantly, the load balancer will not be configured to handle that load. It usually takes 5-10 minutes for the load balancer to complete the upgrade. However, as ceejayoz said, micro instances are horrible; you're not going to get very far stress-testing them. I can easily handle 3000+ requests/second with one load balancer without any special configuration.
-
ceejayoz over 11 years: I believe blitz.io is supposed to use many IPs.
-
WooDzu about 10 years: This answer should be the winner, I think. If you run an ab test from a single IP address/range, ELB will stick the connections to a single EC2 instance regardless of whether you're using sticky sessions or not. I would like to know how to work around this.