Bonding + Bridge : traffic passing through the wrong interface


After more searching, it appears the root of my problem is what is discussed here :

Serverfault : arp responses always go out a single NIC

As that question is much more precise and has several answers, that's where you should look if you have the same problem as me.
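
For readers who land here first: the behaviour discussed in that question is often called ARP flux. By default, Linux may answer an ARP request for any local address on whichever interface receives it, which matters when two interfaces (here bond0 and br0) sit on the same subnet. The sketch below only reads the relevant kernel settings. It assumes ARP flux really is the cause; the function name `show_arp_sysctls` and the `PROC_CONF` override are made up for illustration.

```shell
# Print the ARP-related settings for the given interfaces (read-only).
# PROC_CONF is a hypothetical override so the function can be tried on a test tree;
# by default it reads the real procfs location.
show_arp_sysctls() {
    root=${PROC_CONF:-/proc/sys/net/ipv4/conf}
    for ifc in "$@"; do
        for key in arp_ignore arp_announce arp_filter; do
            f="$root/$ifc/$key"
            # Skip silently when the interface or the key does not exist.
            if [ -r "$f" ]; then
                printf '%s.%s = %s\n' "$ifc" "$key" "$(cat "$f")"
            fi
        done
    done
    return 0
}

# Usage on the server described in the question:
#   show_arp_sysctls all bond0 br0
#
# Settings commonly suggested against ARP flux (as root; check the linked
# question before applying them, this is not a confirmed fix for this setup):
#   sysctl -w net.ipv4.conf.all.arp_ignore=1
#   sysctl -w net.ipv4.conf.all.arp_announce=2
```

A value of 0 for `arp_ignore` is the kernel default and is consistent with the symptoms described in the question below.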


Author: iodbh

Updated on September 18, 2022

Comments

  • iodbh
    iodbh over 1 year

    I'm using Ubuntu server 12.04, on a server with 6 NICs, grouped in 2 bundles :

    • eth0 and eth2 are bundled using bonding mode 1, under interface name bond0 which has IP [network].8
    • eth1, eth3, eth4 and eth5 are bundled using bonding mode 4 (802.3ad), under interface name bond1

    bond1 will be used to connect VMs to our network : it is bridged through br0, which has IP [network].5.
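
    The layout described above can be compared with what the bonding driver actually reports. This is a minimal sketch, assuming the driver exposes its state under /proc/net/bonding as usual; the function name `bond_summary` and the `BONDING_DIR` override are hypothetical.

```shell
# Show the mode and the slave interfaces the kernel reports for one bond.
# BONDING_DIR is a hypothetical override for testing; the default is the real path.
bond_summary() {
    f=${BONDING_DIR:-/proc/net/bonding}/$1
    if [ ! -r "$f" ]; then
        echo "no bonding info for $1" >&2
        return 1
    fi
    # Keep only the lines naming the mode and each slave.
    grep -E '^(Bonding Mode|Slave Interface):' "$f"
}

# Usage:
#   bond_summary bond0
#   bond_summary bond1
```

    If the reported mode or slave list differs from the intended bundles, the problem is in the bonding setup rather than in the bridge.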

    Now, when I ping [network].5 from our network, everything seems to work, but our VMs have no network access.

    After poking around for a while, I noticed that br0's IP ([network].5) is associated with bond0's MAC address, i.e.:

    arping <[network].5>
    

    returns

    Unicast reply from <[network].5> [<bond0's MAC address>]  0.710ms
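
    To tell which local interfaces carry the MAC address seen in such a reply, the addresses under /sys/class/net can be compared with it. This is a rough sketch; the function name `ifaces_for_mac` and the `SYSFS_NET` override are hypothetical. Several matches are normal, since the slaves of a bond usually share the bond's MAC and a bridge typically takes the MAC of one of its ports.

```shell
# List every interface whose MAC address equals the one given as $1.
# SYSFS_NET is a hypothetical override for testing; the default is the real sysfs path.
ifaces_for_mac() {
    # sysfs reports MAC addresses in lower case, so normalise the argument.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    root=${SYSFS_NET:-/sys/class/net}
    for f in "$root"/*/address; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "$mac" ]; then
            basename "$(dirname "$f")"
        fi
    done
    return 0
}

# Usage, with the MAC address printed by arping:
#   ifaces_for_mac 00:11:22:33:44:55
```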
    

    Also, while I'm pinging [network].5 :

    tcpdump -i br0 icmp
    

    shows no ICMP traffic,

    tcpdump -i bond1
    

    shows no traffic either, but

    tcpdump -i bond0
    

    shows the ICMP packets I'm sending using ping.

    It's pretty obvious that packets are sent down the wrong tube. My question here is : why is it so and how can I fix that ?

    Here is the content of my /etc/network/interfaces file :

    # bond0 part :
    
    auto eth0
        iface eth0 inet manual
        bond-master bond0
    
    auto eth2
        iface eth2 inet manual
        bond-master bond0
    
    auto bond0
        iface bond0 inet static
        address [network].8
        gateway [network].254
        netmask 255.255.254.0
        # bonding mode 1 :
        bond-mode active-backup
        bond-slaves none
    
    # bond1 and br0 part :
    
    auto eth1
        iface eth1 inet manual
        bond-master bond1
    
    auto eth3
        iface eth3 inet manual
        bond-master bond1
    
    auto eth4
        iface eth4 inet manual
        bond-master bond1
    
    auto eth5
        iface eth5 inet manual
        bond-master bond1
    
    auto bond1
        iface bond1 inet manual
        # bonding mode 4 :
        bond-mode 802.3ad
        bond-slaves none
        bond-miimon 100
        bond-downdelay 200
        bond-updelay 200
        bond-xmit-hash-policy layer2
        bond-lacp-rate fast
    
    auto br0
        iface br0 inet static
        address [network].5
        netmask 255.255.254.0
        gateway [network].254
        bridge_ports bond1
        bridge_maxwait 5
        bridge_stp off
        bridge_fd 0
    

    Please note that :

    • 802.3ad link aggregation has been configured on the switch side
    • We have verified multiple times that the right ports are connected
    • The same issue occurs on 2 servers with the exact same hardware+software configuration

    [EDIT] After several reboots, the opposite happens : bond0's IP is associated with bond1's MAC address. This seems to happen randomly. When it does, VMs behind the bridge have access to our network and the Internet.