CentOS 6 interfaces bonding, round-robin instead of active-backup, duplicates frames
After some deeper investigation, I found the cause of both the round-robin and the DUP problems. They are actually related.
- round robin (0) instead of active-backup (1)
On CentOS 5+, and seemingly especially on 6.6, it is recommended to set the BONDING_OPTS
parameter directly in ifcfg-bond0
(and not in the bonding module options, which makes sense)
DEVICE=bond0
...
BONDING_OPTS="mode=1 miimon=100"
(mode may be specified as '1' or as 'active-backup')
After adding the line, everything worked as expected.
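To confirm that the setting took effect, the active mode can be read back from /proc/net/bonding/bond0. The following is a minimal sketch, not a definitive tool: the helper name `bond_mode` is hypothetical, and it assumes the status file contains a line starting with "Bonding Mode:", as the bonding driver normally prints.

```shell
#!/bin/sh
# bond_mode [FILE]: print the text that follows "Bonding Mode:" in a
# bonding status file. Hypothetical helper; FILE defaults to
# /proc/net/bonding/bond0 (the bond name used in this post).
bond_mode() {
    sed -n 's/^Bonding Mode:[[:space:]]*//p' "${1:-/proc/net/bonding/bond0}"
}
```

Calling `bond_mode` after a network restart should report a value containing "(active-backup)" once BONDING_OPTS is in place, and "(round-robin)" while the driver is still running in its default mode.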
- duplicated ping frames
In round-robin mode, both interfaces are used, and when the interfaces are connected to two different switches, the early ping replies may be duplicated.
It is not uncommon to observe a short burst of duplicated traffic when the bonding device is first used, or after it has been idle for some period of time. This is most easily observed by issuing a "ping" to some other host on the network, and noticing that the output from ping flags duplicates (typically one per slave).
For example, on a bond in active-backup mode with five slaves all connected to one switch, the output may appear as follows:
# ping -n 10.0.4.2
PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data.
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
This is not due to an error in the bonding driver, rather, it is a side effect of how many switches update their MAC forwarding tables.
After switching to active-backup, no more DUPs were observed.
This is explained in detail in the invaluable kernel bonding documentation:
https://www.kernel.org/doc/Documentation/networking/bonding.txt
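To check whether duplicates still occur after a mode change, the "(DUP!)" markers in the ping output can be counted. This is a small sketch under stated assumptions: the helper name `count_dups` is hypothetical, and it relies on ping flagging duplicate replies with the literal string "(DUP!)", as in the output quoted above.

```shell
#!/bin/sh
# count_dups: read ping output on stdin and print the number of reply
# lines flagged "(DUP!)". Hypothetical helper. -F makes grep treat the
# pattern as a fixed string; -c prints the count of matching lines.
count_dups() {
    grep -c -F '(DUP!)'
}
```

For example, `ping -c 5 10.0.4.2 | count_dups` prints 0 when no duplicate replies were flagged. Note that grep exits with a non-zero status when the count is 0, so the printed number is the thing to check, not the exit code.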
Comments
-
Déjà vu almost 2 years
Two interfaces, eth0 and eth1, are part of a network bonding bond0 on CentOS 6.
All worked well under CentOS 5, but after the upgrade to CentOS 6.6, keeping the same configuration, the network works fine but
- despite setting /etc/modprobe.d/bonding.conf with options mode=1 or even mode=active-backup, the status from /proc/net/bonding/bond0 always shows load balancing (round-robin), not active-backup as it should.
- doing a ping to a LAN address (that belongs to the bond0 network) for the first time after a reboot, the first frame is DUP! (duplicated); the DUP doesn't happen anymore on further pings. Likely due to round-robin instead of active-backup.
/etc/modprobe.d/bonding.conf:
alias bond0 bonding
options bond0 mode=1 miimon=100
ifcfg-bond0:
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
NETWORK=10.1.1.0
NETMASK=255.255.255.0
IPADDR=10.1.1.11
USERCTL=no
NM_CONTROLLED=no
ifcfg-eth0:
DEVICE=eth0
BOOTPROTO=none
HWADDR=00:22:35:12:26:18
UUID=12fa32c2-e421-47f6-8d25-11414a664318
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
USERCTL=no
ifcfg-eth1:
DEVICE=eth1
BOOTPROTO=none
HWADDR=00:22:35:12:26:19
UUID=12fa32c2-e421-47f6-8d25-11414a664319
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
USERCTL=no
All updates have been applied. NetworkManager is disabled.
The main problem seems now to be the mode, round-robin instead of active-backup.
-
Bratchley over 9 years
It might also be worth it to post the stack trace somehow so we can see what threads are involved in the panic.
-
Déjà vu over 9 years
@Bratchley actually a number of updates + BIOS seem to have fixed the kernel panic. However, whatever the options in bonding.conf, i.e. mode=1 or mode=active-backup, the status from /proc/net/bonding/bond0 always shows load balancing (round-robin). I'll edit the question.
-
Bratchley over 9 years
It might be worth it to post that as a new question since it doesn't relate to the original issue with kernel panics or the DUP message. That would get more eyes on the problem since people see new questions before they see updated questions.
-
Déjà vu over 9 years
@Bratchley Added an answer that explains what happened. Thanks.