Linux bonding: 802.3ad (LACP) vs. balance-alb mode

Solution 1

In balance-alb, both transmitted and received frames are load balanced by rewriting MAC addresses. This can cause problems at the application level, because not all applications cope well with this mode.

To handle your original issue, here is what I used to do:

  • Leave the switch ports at their default configuration.
  • Perform the PXE/Kickstart installation.
  • Either at the Kickstart post-installation stage or from your management infrastructure (Puppet/Chef), change the configuration of the switch ports to LACP. This assumes there is a server on your network that is trusted by all the devices.
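
For the last step, the switch ports should only be changed once the host side is ready. A minimal sketch of such a guard, which a post-install script or a Puppet/Chef run could apply first: it checks that the bond already reports 802.3ad. The sample text imitates the layout of /proc/net/bonding/bond0 as I remember it, and the function names and sample values are illustrative assumptions, not part of any tool.

```python
import re

# Sample imitating /proc/net/bonding/bond0 on a host already set to 802.3ad.
# The exact layout varies by kernel version; treat this as an assumption.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: up
"""


def bonding_mode(status_text):
    """Return the text after 'Bonding Mode:' or None if the line is missing."""
    match = re.search(r"^Bonding Mode:\s*(.+)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def slave_interfaces(status_text):
    """Return the names listed on 'Slave Interface:' lines, in order."""
    return re.findall(r"^Slave Interface:\s*(\S+)", status_text, re.MULTILINE)


def host_ready_for_lacp(status_text, min_slaves=2):
    """True when the bond reports 802.3ad and has at least min_slaves members."""
    mode = bonding_mode(status_text) or ""
    return "802.3ad" in mode and len(slave_interfaces(status_text)) >= min_slaves


if __name__ == "__main__":
    # On a real host you would read /proc/net/bonding/bond0 instead of SAMPLE.
    print(host_ready_for_lacp(SAMPLE))
```

If the check fails, the safer choice is to leave the switch ports alone, since flipping them early cuts the host off.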

Chakri -

Solution 2

I'm not terribly familiar with Juniper switches, but you shouldn't have to configure LACP on them; that is the point of LACP. If this isn't the case, something is wrong with your switch configuration.

LACP only specifies a protocol for dynamically aggregating ports. It does not specify a port scheduling policy (which port traffic is sent and received on). That policy is set separately. I don't remember the process in Linux, but I know Linux supports a couple of different policies, probably similar to balance-alb.
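
On the Linux side, as far as I know, the policy is the bonding driver's xmit_hash_policy option, set alongside mode=802.3ad. Here is a minimal sketch that assembles such an option string. The helper name and the defaults are my assumptions, and where the string ends up (for example BONDING_OPTS in an ifcfg file) depends on the distribution.

```python
# Transmit hash policies accepted by the Linux bonding driver in 802.3ad mode.
# Newer kernels add more; this list is deliberately short.
HASH_POLICIES = ("layer2", "layer2+3", "layer3+4")


def bonding_opts(xmit_hash_policy="layer2", miimon=100):
    """Render a bonding option string for an 802.3ad (LACP) bond."""
    if xmit_hash_policy not in HASH_POLICIES:
        raise ValueError("unknown xmit_hash_policy: %r" % xmit_hash_policy)
    if miimon < 0:
        raise ValueError("miimon must be non-negative")
    return "mode=802.3ad miimon=%d xmit_hash_policy=%s" % (miimon, xmit_hash_policy)


if __name__ == "__main__":
    print(bonding_opts(xmit_hash_policy="layer3+4"))
    # prints: mode=802.3ad miimon=100 xmit_hash_policy=layer3+4
```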

Balance-alb has specific disadvantages, mainly that it semi-intelligently selects an outgoing port for each new connection, and the connection is then stuck on that one port for its lifetime. (It's actually done by MAC, not by port: if a port fails, the MAC gets assigned to another port, which allows the connection to continue.)

This doesn't exactly "aggregate" the ports, however, as a connection cannot use more than one port. So if you've got two 1GbE ports, a single connection is still limited to 1 Gbit/s. LACP usually resolves this, though it depends on your scheduling policy and the number of active ports supported at each end.

Solution 3

If you've configured LACP on the switch ports where your boxes connect, the only "correct" setting on the host side is LACP as well. The EX will balance on the Ethernet source and destination MACs for Ethernet traffic, and will also consider IP source, destination and port when the frames carry IP packets. Please consider reading Juniper KB22943 for the details of the hashing algorithms. If your switch supports cross-stack LACP (which is the case for the 4XXX EXs) and you have a stack, go with LACP. It can also be easier to debug if you have a more complex L2 topology with per-VLAN loops etc.
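
To make hash-based balancing concrete, here is a toy sketch. It is not Juniper's algorithm (KB22943 describes that) and not the Linux one either; the choice of fields and the CRC32 hash are assumptions picked purely for illustration. What it shows is the general property: every frame of one flow lands on the same member link, while different flows can spread across links.

```python
import zlib


def pick_link(num_links, src_mac, dst_mac, ip_flow=None):
    """Choose a member link index for a frame.

    ip_flow is an optional (src_ip, dst_ip, src_port, dst_port) tuple;
    without it only the two MAC addresses feed the hash.
    """
    if num_links < 1:
        raise ValueError("need at least one link")
    fields = [src_mac.lower(), dst_mac.lower()]
    if ip_flow is not None:
        fields.extend(str(part) for part in ip_flow)
    digest = zlib.crc32("|".join(fields).encode("ascii"))
    return digest % num_links


if __name__ == "__main__":
    # Hypothetical addresses, used only to exercise the function.
    server, gateway = "00:11:22:33:44:55", "66:77:88:99:aa:bb"
    for port in (40000, 40001, 40002, 40003):
        flow = ("10.0.0.5", "10.0.0.9", port, 443)
        print(port, "->", pick_link(2, server, gateway, flow))
```

With MAC-only hashing, all traffic between two hosts shares one link; adding the IP and port fields is what lets several connections between the same pair of hosts use different links.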

Solution 4

LACP is great when it works and provides pretty much double the performance of a single NIC. If you have only a small number of machines with bonded NICs, go for it.

But one of its drawbacks shows up if you are on a bit of a budget and therefore using lower-end switches: they tend to lack sufficient LACP groups and have no MLAG or SMLT features. Most switches from HP and similar vendors seem to offer, at best, half as many bonding groups as there are ports. Some offer even fewer. A 2k Supermicro switch we were using at one point had only 8 LACP groups despite having 52 ports. I'm guessing this number is relatively arbitrary: no one thought you'd need more than half the number of switch ports. It's probably just a hard-coded number in the firmware, and a higher one would probably take up a little more memory.
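
The arithmetic behind that complaint, as a quick sketch. The function is hypothetical and assumes one LACP group per bonded server, all on a single switch.

```python
def max_bonded_servers(ports, lacp_groups, links_per_bond=2):
    """Servers that can each get their own LACP group on one switch."""
    if links_per_bond < 1:
        raise ValueError("links_per_bond must be at least 1")
    by_ports = ports // links_per_bond  # limit imposed by physical ports
    return min(by_ports, lacp_groups)   # limit imposed by the group count


if __name__ == "__main__":
    # The 52-port switch with 8 LACP groups mentioned above.
    print(max_bonded_servers(ports=52, lacp_groups=8))   # prints 8
    # The same port count with half as many groups as ports.
    print(max_bonded_servers(ports=52, lacp_groups=26))  # prints 26
```

With 8 groups and two links per bond, only 16 of the 52 ports can carry bonded servers; the other 36 are left for unbonded use.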

This really is a huge limitation if you use SR-IOV, bonding and virtual machines.

If you're a provider who wants to host maybe hundreds or thousands of machines in a rack, you don't necessarily want to spend tens of thousands of dollars on a high-end switch that's important but unnecessarily expensive, just to provide redundancy and performance for a single rack of machines. I can see why companies like Facebook want to create their own switches.

So in this type of scenario, I'd go with a different mode, perhaps balance-alb.

Author: sagi

Updated on September 18, 2022

Comments

  • sagi
    sagi over 1 year

    Here's the situation. I would like to connect my Linux servers to a single network using dual links for fault tolerance and load balancing. The servers have 2 or more 1-gig NICs, and I plan to connect each NIC to a different switch in a single stack (i.e. a single virtual switch). All switches are Juniper EX4200 or EX4500.

    I know I can use any of the Linux bonding modes, and I wonder which one is best. Historically I used the active-backup mode because some servers were connected to non-stacked switches, but now we have a new and consistent network and I would like to use a bonding mode that offers load balancing in addition to fault tolerance.

    I thought the best mode to use was 802.3ad (LACP) because that's the standard used on all network equipment, but as it turns out, the moment I configure a set of ports as an LACP channel on the switch side, the connection breaks until I also configure the server side properly. This makes our system administration tasks much harder: before installing a new server we must remove the LACP configuration on the switch (because things like PXE boot and network installation do not work on LACP ports), and after the installation we must change the switch settings again, but only after the server has been configured to use LACP, or the connection will die.

    Other bonding modes such as balance-alb do not require any special configuration on the switch side, while on paper providing the same advantages.

    Is there any reason to choose 802.3ad instead of balance-alb?

    • pfo
      pfo about 12 years
      Why is passive LACP a problem for deploying the machine? The installation image that you've spun up via PXE won't use LACP, the installed system can use LACP if needed, as the bond can be created after the installed system was booted.
    • sagi
      sagi about 12 years
      If the switch side is configured to use LACP then PXE doesn't work. The LACP configuration for the port must be removed first.
    • pfo
      pfo about 12 years
      Check this link about how to get LACP and PXE boot working together on an EX series switch: broken.net/openindiana/…
    • sagi
      sagi about 12 years
      Already tried that. The force-up option kills redundancy: it makes the switch always send traffic to the forced interface, even if it is physically down.
    • cronfy
      cronfy over 10 years
      sagi, some time has passed since you asked the question. Have you had any luck with balance-alb?
  • sagi
    sagi about 12 years
    Unfortunately, it seems that automatic LACP is not supported on Juniper switches: juniper.net/techpubs/en_US/junos10.0/information-products/…: "The JUNOS implementation of LACP provides link monitoring but not automatic addition and deletion of links."
  • Philip
    Philip about 12 years
    Wow, that's really horribly messed up.
  • cronfy
    cronfy over 10 years
    Chakri, can you please be more specific about applications that may fail with balance-alb?