How to increase network bandwidth of AWS EC2 instance?

19,110

If you were limited by bandwidth, that graph would become flat when you hit the limit. Further, as others pointed out that is only 1 MB/s out and 3 MB/s in, and I can do more than that on a t2.micro to the external internet.

What is the system doing with each request? Here are a list of things I would look at, in order:

  • Threading: are there bottlenecks in your application where only one thread can access a resource? That would keep CPU use low but cause exactly the pattern you saw.
  • Bad concurrency patterns in your application or server. Load test and look for it getting slower and slower as connections increase, while doing nothing.
  • Individual CPU: is one CPU loaded to 100% while others are mostly idle? (with 30+ cores, a saturated CPU would only give you 3% CPU use). One saturated CPU + others idle usually means a concurrency issue, probably in connection handling.
  • What is memory use like? Are you using swap at all? (that is a VERY bad sign if so, and would cause the issue). If memory use is excessive, often session storage in memory or excessively sized handler thread pools are at fault.
  • Disk I/O or external network requests: are you reading or writing with each request? vmstat will tell if you are spending a long time waiting for I/O to be serviced. If that is the case, I'd look at logging before anything.
    • The c4.8xlarge instances use EBS only, if the storage is magnetic and you write to access logs, you get a few hundred writes per second. General purpose SSDs give you 3 IO/s per GB base, but can burst to 3000 until they run out of IO credits.
    • The OS will try to combine writes, but with thousands of concurrent

It's not impossible but very unlikely that you might be bottlenecked at the network layer with connection creation or packets-per-second, if your requests are very small.

Share:
19,110
Srini K
Author by

Srini K

Updated on June 11, 2022

Comments

  • Srini K
    Srini K almost 2 years

    We hosted a site in AWS EC2 of type c4.8xlarge. It is a fairly large system with lot of memory and compute resources. Thousands of users tried to access the system during a 2 hour timeframe this weekend. While it did not crash, it slowed down quite a bit and failed to perform at the expected level. Analyzing the stats showed that limited network bandwidth is the main cause of the slowdown. The CPU usage stayed below 6%, but NetworkIn and NetworkOut seem to have peaked at 60MB and 200MB respectively during that timeframe. While I'm not an networking expect, some reading online seemed to indicate that all the traffic going through one NIC could be the main cause of limited network bandwidth. Is this true? Would hosting the site on a different type of EC2 instance help increase the network bandwidth? Here is how the networkIn and networkOut metrics looked like under heavy load.

    networkIn and networkOut metrics chart

  • Srini K
    Srini K about 9 years
    Thanks. Any other thoughts based on my comments above?
  • stepanian
    stepanian over 8 years
    How would having several instances help if you still have to go through a load balancer with similar or even lower bandwidth limitations? (Assuming you would still use an ec2 instance as your load balancer with something like haproxy installed).
  • BobMcGee
    BobMcGee about 8 years
    While it's not hip, scale-up is a viable solution. This entire site and all of Stack Exchange runs on just 25 servers. They've stated that they can actually run with just a single web server, and their servers have specs very similar to the c4.8xlarge (but with better storage). I seriously doubt that they have hit a vertical scaling limit here, it's probably a configuration or code problem, not a hardware limitation.