CentOS 6 interfaces bonding, round-robin instead of active-backup, duplicates frames


After some deeper investigation, I found the cause of both the round-robin and DUP problems. They are actually related.

  • round robin (0) instead of active-backup (1)

On CentOS 5+, and seemingly especially on 6.6, it is recommended to set the BONDING_OPTS parameter directly in ifcfg-bond0 rather than in the bonding module options (which makes sense).

DEVICE=bond0
...
BONDING_OPTS="mode=1 miimon=100"

(mode may be specified as '1' or as 'active-backup')
After adding the line, everything worked as expected.
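To confirm which mode the kernel actually applied, the "Bonding Mode:" line of /proc/net/bonding/bond0 can be read back. This is only a minimal sketch: the function name bond_mode is a hypothetical helper, and the path defaults to the bond0 status file mentioned in the question.

```shell
# Print the bonding mode reported by the kernel for a bond device.
# bond_mode is a hypothetical helper name; its optional argument is the
# status file to read and defaults to the one for bond0.
bond_mode() {
    sed -n 's/^Bonding Mode: //p' "${1:-/proc/net/bonding/bond0}"
}
```

After restarting the network, calling bond_mode should print "fault-tolerance (active-backup)" once BONDING_OPTS is in effect, and "load balancing (round-robin)" if the option was not picked up.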

  • duplicated ping frames

In round-robin mode, both interfaces are used, and when the interfaces are connected to two different switches, the early ping replies may be duplicated. The kernel bonding documentation describes the effect:

It is not uncommon to observe a short burst of duplicated traffic when the bonding device is first used, or after it has been idle for some period of time. This is most easily observed by issuing a "ping" to some other host on the network, and noticing that the output from ping flags duplicates (typically one per slave).

For example, on a bond in active-backup mode with five slaves all connected to one switch, the output may appear as follows:

    # ping -n 10.0.4.2
    PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data.
    64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms
    64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)

This is not due to an error in the bonding driver, rather, it is a side effect of how many switches update their MAC forwarding tables.

After switching to active-backup, no more DUPs were observed.
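One way to check this is to count the replies that ping flags as duplicates. The sketch below assumes ping marks duplicates with "DUP!" as in the output quoted above; count_dups is a hypothetical helper name.

```shell
# Count the replies that ping flagged as duplicates.
# count_dups is a hypothetical helper; it reads ping output on stdin.
# grep -c still prints 0 when nothing matches but exits non-zero,
# so "|| true" keeps the function usable under "set -e".
count_dups() {
    grep -c 'DUP!' || true
}
```

For example, `ping -c 5 10.0.4.2 | count_dups` run right after a reboot should print 0 once the bond is really in active-backup mode (the address is the one from the quoted example; substitute a host on the bond0 network).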

This is explained in detail in the kernel bonding documentation:

https://www.kernel.org/doc/Documentation/networking/bonding.txt


Author: Déjà vu

Updated on September 18, 2022

Comments

  • Déjà vu
    Déjà vu almost 2 years

    Two interfaces, eth0 and eth1 are part of a network bonding bond0 on CentOS 6.

    All worked well under CentOS 5, but after the upgrade to CentOS 6.6 with the same configuration, the network works but shows two problems:

    • despite setting /etc/modprobe.d/bonding.conf with options mode=1 or even mode=active-backup, the status from /proc/net/bonding/bond0 always shows load balancing (round-robin), not active-backup as it should.

    • when pinging a LAN address (one on the bond0 network) for the first time after a reboot, the first frame is flagged DUP! (duplicated); the DUP does not recur on further pings. This is likely due to round-robin being used instead of active-backup.

    /etc/modprobe.d/bonding.conf:

    alias bond0 bonding
    options bond0 mode=1 miimon=100
    

    ifcfg-bond0:

    DEVICE=bond0
    BOOTPROTO=none
    ONBOOT=yes
    NETWORK=10.1.1.0
    NETMASK=255.255.255.0
    IPADDR=10.1.1.11
    USERCTL=no
    NM_CONTROLLED=no
    

    ifcfg-eth0:

    DEVICE=eth0
    BOOTPROTO=none
    HWADDR=00:22:35:12:26:18
    UUID=12fa32c2-e421-47f6-8d25-11414a664318
    TYPE=Ethernet
    ONBOOT=yes
    NM_CONTROLLED=no
    MASTER=bond0
    SLAVE=yes
    USERCTL=no
    

    ifcfg-eth1:

    DEVICE=eth1
    BOOTPROTO=none
    HWADDR=00:22:35:12:26:19
    UUID=12fa32c2-e421-47f6-8d25-11414a664319
    TYPE=Ethernet
    ONBOOT=yes
    NM_CONTROLLED=no
    MASTER=bond0
    SLAVE=yes
    USERCTL=no
    

    All updates have been applied. NetworkManager is disabled.

    The main problem seems now to be the mode, round-robin instead of active-backup.

    • Bratchley
      Bratchley over 9 years
      It might also be worth it to post the stack trace somehow so we can see what threads are involved in the panic.
    • Déjà vu
      Déjà vu over 9 years
      @Bratchley Actually, a number of updates plus a BIOS update seem to have fixed the kernel panic. However, whatever the options in bonding.conf, i.e. mode=1 or mode=active-backup, the status from /proc/net/bonding/bond0 always shows load balancing (round-robin). I'll edit the question.
    • Bratchley
      Bratchley over 9 years
      It might be worth it to post that as a new question since it doesn't relate to the original issue with kernel panics or the DUP message. That would get more eyes on the problem since people see new questions before they see updated questions.
    • Déjà vu
      Déjà vu over 9 years
      @Bratchley Added an answer that explains what happened. Thanks.