What are the differences between channel bonding modes in Linux?
Solution 1
The biggest factor in fail-over is the speed with which a link failure is detected. Unplug the cable from the host and they'll all work pretty well. Leave a live link on an otherwise dead switch and most of the modes (except for those that support beacons/keepalives) are going to send part of your traffic nowhere.
Generally speaking, network traffic is interrupt driven, so the various hashing algorithms aren't going to make a meaningful difference to CPU load.
Any mode that isn't active/standby or broadcast-all will share traffic to varying degrees. Some modes can balance on a per packet basis, others work on a per-flow basis. The former will more evenly spread load while the latter is far more useful (read: functional/stable) in actual networks.
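To make the per-packet versus per-flow distinction concrete, here is a minimal sketch. It is not the kernel's code: the flow labels are made up, and `zlib.crc32` is only a stand-in for whatever hash a real bonding mode applies.

```python
import zlib
from collections import Counter


def per_packet_links(packets, link_count):
    """Round-robin: packet i leaves on link i mod link_count."""
    return [i % link_count for i, _ in enumerate(packets)]


def per_flow_links(packets, link_count):
    """Hash the flow label so every packet of a flow uses the same link.

    crc32 is a stand-in hash for illustration only.
    """
    return [zlib.crc32(flow.encode()) % link_count for flow in packets]


# Six packets: four belong to one large flow, two to small flows.
packets = ["A->B", "A->B", "A->B", "A->B", "A->C", "A->D"]

print("per-packet:", Counter(per_packet_links(packets, 2)))
print("per-flow:  ", Counter(per_flow_links(packets, 2)))
```

Per-packet distribution is always even, but the packets of one flow travel over different links and can arrive out of order. Per-flow distribution keeps each flow in order, at the cost of an uneven spread when one flow dominates.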
Yes - there are limitations to each mode, but we need to know a lot more about your application to speak to them.
Only LACP/802.3ad (mode 4) explicitly requires support on the switch. That said, just because you send to the switch with a particular pattern doesn't mean the switch will send -back- to you in the same manner.
The only mode I tend to trust in production is 802.3ad which, with an appropriately configured switch, will ensure that only the correct links end up in the channel, as well as providing some measure of symmetry in traffic sharing and a predictable response when a link is down. This mode also avoids some common-but-nasty problems (e.g. unicast flooding). Active/standby is also quite common. The other modes may be required for certain circumstances but, IMO, tend to be more painful.
Other flow/MAC/IP based balancing modes or active/standby can be fine, too, and may be required when dealing with unmanaged switches.
Solution 2
Most of these points are quite thoroughly described in /usr/src/linux/Documentation/networking/bonding.txt, the documentation file from the Linux source package of your favorite distro. Speed of failover is controlled by the "miimon" parameter (the link monitoring interval, in milliseconds) for most modes, but it shouldn't be set too low; normal values are under one second anyway.
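To check what a running bond is actually using, the driver exposes its parameters under sysfs. The following is a small sketch, assuming a bond named `bond0` (a hypothetical name) and the usual `/sys/class/net/<bond>/bonding/` layout; the helper name `read_bond_settings` is made up for illustration.

```python
from pathlib import Path


def read_bond_settings(bond="bond0", sysfs_root="/sys/class/net"):
    """Return the bonding parameters that exist for the given bond.

    Parameters missing on this system are skipped rather than
    treated as errors.
    """
    base = Path(sysfs_root) / bond / "bonding"
    settings = {}
    for name in ("mode", "miimon", "updelay", "downdelay"):
        path = base / name
        if path.exists():
            settings[name] = path.read_text().strip()
    return settings
```

Calling `read_bond_settings()` on a host with an active `bond0` should return the mode string together with the miimon, updelay and downdelay values in milliseconds; on a host without that bond it returns an empty dict.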
Here are the best parts, completed by me:
balance-rr or 0
Round-robin policy: Transmit packets in sequential
order from the first available slave through the
last. This mode provides load balancing and fault
tolerance.
active-backup or 1
Active-backup policy: Only one slave in the bond is
active. A different slave becomes active if, and only
if, the active slave fails. The bond's MAC address is
externally visible on only one port (network adapter)
to avoid confusing the switch.
This mode provides fault tolerance. The "primary"
option affects the behavior of this mode.
balance-xor or 2
XOR policy: Transmit based on the selected transmit
hash policy. The default policy is a simple [(source
MAC address XOR'd with destination MAC address) modulo
slave count]. Alternate transmit policies may be
selected via the xmit_hash_policy option.
This mode provides load balancing and fault tolerance.
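The default XOR policy quoted above can be sketched in a few lines. This is a simplified illustration of the formula as written in the documentation (source MAC XOR destination MAC, modulo slave count); the kernel's real implementation may differ in detail, and the function names here are made up.

```python
def mac_to_int(mac: str) -> int:
    """Parse 'aa:bb:cc:dd:ee:ff' into a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def xor_slave_index(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a slave index as (src MAC XOR dst MAC) modulo slave count."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


# The same pair of hosts always maps to the same slave.
print(xor_slave_index("00:00:00:00:00:01", "00:00:00:00:00:02", 2))
print(xor_slave_index("00:00:00:00:00:01", "00:00:00:00:00:03", 2))
```

Because the result depends only on the two MAC addresses, all traffic to a given peer (for example, a single default gateway) leaves on one slave. That is why other xmit_hash_policy values exist.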
broadcast or 3
Broadcast policy: transmits everything on all slave
interfaces. This mode provides fault tolerance.
802.3ad or 4
IEEE 802.3ad Dynamic link aggregation. Creates
aggregation groups that share the same speed and
duplex settings. Utilizes all slaves in the active
aggregator according to the 802.3ad specification.
Slave selection for outgoing traffic is done according
to the transmit hash policy, which may be changed from
the default simple XOR policy via the xmit_hash_policy
option. Note that not all transmit policies may be 802.3ad
compliant, particularly in regards to the packet mis-ordering
requirements of section 43.2.4 of the 802.3ad standard.
Differing peer implementations will have varying tolerances for
noncompliance.
Note: Most switches will require some type of configuration
to enable 802.3ad mode.
balance-tlb or 5
Adaptive transmit load balancing: channel bonding that
does not require any special switch support. The
outgoing traffic is distributed according to the
current load (computed relative to the speed) on each
slave. Incoming traffic is received by the current
slave. If the receiving slave fails, another slave
takes over the MAC address of the failed receiving
slave.
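The "load computed relative to the speed" idea can be illustrated as picking the slave with the lowest utilisation ratio. This is only a sketch of the principle, not the driver's algorithm; the data layout and the name `pick_tlb_slave` are assumptions.

```python
def pick_tlb_slave(slaves):
    """Return the name of the least-utilised slave.

    slaves maps a name to (current_load_mbps, link_speed_mbps);
    utilisation is load divided by speed.
    """
    return min(slaves, key=lambda name: slaves[name][0] / slaves[name][1])


# A 1 Gbit/s link carrying 400 Mbit/s is 40% utilised; a 100 Mbit/s
# link carrying 30 Mbit/s is 30% utilised, so it is chosen.
print(pick_tlb_slave({"eth0": (400, 1000), "eth1": (30, 100)}))
```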
balance-alb or 6
Adaptive load balancing: includes balance-tlb plus
receive load balancing (rlb) for IPV4 traffic, and
does not require any special switch support.
When a link is reconnected or a new slave joins the
bond the receive traffic is redistributed among all
active slaves in the bond by initiating ARP Replies
with the selected MAC address to each of the
clients. The updelay parameter must
be set to a value equal or greater than the switch's
forwarding delay so that the ARP Replies sent to the
peers will not be blocked by the switch.
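As a worked example of the updelay rule, the sketch below picks the smallest updelay that is at least the switch's forwarding delay. It assumes that updelay should be a multiple of miimon, which is how the kernel documentation describes the parameter (non-multiples are rounded down). The function name and the example values are illustrative.

```python
def safe_updelay_ms(forwarding_delay_ms: int, miimon_ms: int) -> int:
    """Round the forwarding delay up to the next multiple of miimon."""
    if miimon_ms <= 0:
        raise ValueError("miimon must be positive for updelay to apply")
    intervals = -(-forwarding_delay_ms // miimon_ms)  # ceiling division
    return intervals * miimon_ms


# Example: a 15.05 s forwarding delay with miimon=100.
print(safe_updelay_ms(15050, 100))
```

Rounding up rather than down matters here: a value that is rounded down could end before the switch port starts forwarding, which is exactly the case the text warns about.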
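active-backup, balance-tlb and balance-alb don't need switch support. balance-rr will pass traffic without it, but the kernel documentation says it generally requires the switch ports to be grouped together (a static trunk/EtherChannel).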
balance-rr augments performance at the price of out-of-order packet delivery (and the TCP retransmissions that can follow); it performs poorly with some protocols (CIFS) and with more than 2 interfaces.
balance-alb and balance-tlb may not work properly with all switches; there are often some ARP problems (some machines may fail to connect to each other, for instance). You may need to tweak various settings (miimon, updelay) to get stable networking.
balance-xor may or may not require switch configuration. You need to set up an interface group (not LACP) on HP and Cisco switches, but apparently it's not necessary on D-Link, Netgear and Fujitsu switches.
802.3ad absolutely requires an LACP group on the switch side. It's the best supported option overall for augmenting performance.
Note: in every mode that balances per flow, one network connection always goes through one and only one physical link. So when aggregating GigE interfaces, a file transfer from machine A to machine B can't top 1 gigabit/s, even if each machine has 4 aggregated GigE interfaces. The exception is balance-rr, which stripes the packets of a single connection across links at the cost of the reordering mentioned above.
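The bandwidth ceiling can be worked through with a little arithmetic. This sketch assumes a per-flow mode, where each connection is pinned to one link; the flow names and the function `throughput_ceilings_gbps` are made up for illustration.

```python
def throughput_ceilings_gbps(flow_to_link, link_speed_gbps=1.0):
    """Return the best-case ceilings for flows pinned to links.

    flow_to_link maps a flow name to the index of the link it uses.
    A single flow can never exceed one link's speed; the aggregate
    can never exceed the speed of the links actually in use.
    """
    links_in_use = set(flow_to_link.values())
    return {
        "per_flow": link_speed_gbps,
        "aggregate": len(links_in_use) * link_speed_gbps,
    }


# One big file transfer on a 4 x GigE bond still tops out at 1 Gbit/s.
print(throughput_ceilings_gbps({"A->B file copy": 0}))
# Four flows that happen to land on three distinct links.
print(throughput_ceilings_gbps({"f1": 0, "f2": 1, "f3": 1, "f4": 3}))
```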
Solution 3
The kernel docs answer some of those questions:
hookenz
Updated on September 18, 2022
Comments
-
hookenz almost 2 years
Under Linux you can combine multiple network interfaces into a "bonded" network interface to provide failover.
But there are several modes, some of which do not require switch support. I'm not constrained in my switch in that I can use any of the modes.
However, in reading about the different modes, it's not immediately clear what the pros and cons of each one are.
- Do some modes provide a faster failover?
- What about CPU load impact for each mode?
- Which modes can combine the bandwidth rather than just provide redundancy?
- Are there limitations to that?
- Does balance-rr require switch support?
- Reliability? What are your experiences running long term?
-
the-wabbit over 11 years
You have read the Kernel bonding howto, haven't you? It should answer your questions.
-
hookenz over 11 years
Yes, it tells you what they do to a degree. But it doesn't tell you how well they perform in a production environment. Some say "no switch support required", while others have no comment and leave you guessing.
-
hookenz over 11 years
Not sure why the downvote. This is a perfectly valid question and on topic, isn't it?
-
hookenz over 11 years
Whoever did the down vote and/or the close vote, please provide a reason so that I can at least get a chance to improve the question.
-
Zoredache over 11 years
I suspect the down-vote might be because the way your question is asked makes it seem like you didn't do much research or reading of the documentation before you asked it. See the down-vote tool-tip.
-
the-wabbit over 11 years
@Matt in this case, you probably have not read it well enough. Take a look at sections 11, 12 and 13 - you will find a comprehensive discussion for a number of scenarios examining failover characteristics and performance.
-
pgoetz over 7 years
@the-wabbit: Matt posting the question alerted me to the existence of the Kernel bonding howto, thanks to your reference; so I'm upvoting as a useful question (to me).
-
Zoredache over 11 years
Please spend some time improving this answer beyond just providing a link. The link is useful, but answers here should be more than just a link to the documentation.
-
hookenz over 11 years
Thanks for that link. Section 12.1.1 MT Bonding Mode Selection for Single Switch Topology is what I was after.
-
FINESEC over 11 years
Yeah, that's why I only pasted the link without any comment ;-)
-
the-wabbit over 11 years
Note that 802.3ad does not mandate the use of LACP. LACP is just a control protocol for dynamic link aggregation configuration; you can perfectly well have a static LA setup without LACP.
-
wazoox over 11 years
That's true, however most switches don't allow static link aggregation configuration and call aggregates "LACP mode".
-
the-wabbit over 11 years
802.3ad defines both - static LA and LACP. If a switch claims to comply with 802.3ad, it needs to implement both. Every switch model I have had my hands on that implemented LACP implemented static LA as well.