Bonding of two NICs does not work/go up

[...] Is bond1.167 ready and a bonding interface?

The answer is no: it's a VLAN sub-interface, which hints at the issue being the configuration.
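
One way to check is with ip -d link, whose vlan line confirms it's a plain 802.1Q sub-interface rather than a bond (output abridged here, your flags may differ):

    # ip -d link show bond1.167
    6: bond1.167@bond1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...
        vlan protocol 802.1Q id 167 <REORDER_HDR>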

Your configuration should define the bond1 interface first, then the bond1.167 interface which depends on it. You attempted to short-circuit the configuration, which led to bonding settings being applied on a VLAN interface, as the logs you gave tell.

UPDATE: there's another problem: if the definitions of enp4s0 and eno1 reference bond1 before its own definition (i.e. with bond-master bond1), the configuration scripts from the ifenslave package don't appear to cope correctly. After a few tests, I can see one can pick one of these work-arounds:

  1. remove auto from those interfaces. Doing so will bring the actual interfaces up (when bond1 is configured), but ifupdown will still consider them not logically brought up (see the ifquery sketch after this list).
  2. leave them with auto but with no reference to bond1 in the interfaces (by removing bond-master bond1). Doing so will break the bonding configuration later if one brings a physical interface down with ifdown and then up again.
  3. move their definitions after the bond1 definition, keeping bond-master bond1 (and still keeping bond-slaves eno1 enp4s0 in the bond1 definition). I couldn't see any drawback left with this method, except the configuration order, so I felt compelled to choose it in the end.
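
To see which interfaces ifupdown considers logically up (relevant to work-around 1), one can query its state file with ifquery. A minimal sketch of what one would expect with that work-around: the slaves are missing from this list even when ip link shows them up:

    # ifquery --state
    lo=lo
    bond1=bond1
    bond1.167=bond1.167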

Perhaps ifupdown2 copes with this better than the original ifupdown; its package description says:

It is capable of detecting network interface dependencies

Or even use network-manager, even if that's perhaps a bit overkill for a server.
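
If you want to try the former, ifupdown2 is packaged in Debian 10 and replaces ifupdown while keeping the /etc/network/interfaces format, so the configuration below should carry over mostly unchanged:

    # apt install ifupdown2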

Try this configuration instead (change the address):

auto lo
iface lo inet loopback

iface enp3s0 inet manual

auto bond1
iface bond1 inet manual
    bond-slaves eno1 enp4s0
    bond-miimon 100
    bond-mode 802.3ad
    bond-lacp-rate 1

auto enp4s0
iface enp4s0 inet manual
    bond-master bond1

auto eno1    
iface eno1 inet manual
    bond-master bond1

auto bond1.167
iface bond1.167 inet static
    address 192.0.2.2
    netmask 255.255.255.248        
    vlan-raw-device bond1
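
To apply it without a reboot, something along these lines should do (a sketch: add --force to ifdown if it complains an interface is not configured, and be careful if you reach the host through one of these interfaces):

    # ifdown bond1.167 bond1 eno1 enp4s0
    # ifup -a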

This configuration then gives the following here (tested in an LXC Debian 10 container, so with no real LACP partner: that's why the partner fields below stay zeroed):

# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 00:1b:21:3a:6f:fb
Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 1
    Actor Key: 15
    Partner Key: 1
    Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eno1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:3a:6f:fb
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: monitoring
Partner Churn State: monitoring
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 00:1b:21:3a:6f:fb
    port key: 15
    port priority: 255
    port number: 1
    port state: 79
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: enp4s0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:3a:6f:f9
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: monitoring
Partner Churn State: monitoring
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 00:1b:21:3a:6f:fb
    port key: 15
    port priority: 255
    port number: 2
    port state: 71
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

# cat /proc/net/vlan/bond1.167 
bond1.167  VID: 167  REORDER_HDR: 1  dev->priv_flags: 1021
         total frames received            0
          total bytes received            0
      Broadcast/Multicast Rcvd            0

      total frames transmitted           12
       total bytes transmitted          976
Device: bond1
INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0
 EGRESS priority mappings: 

Comments

  • vega
    vega almost 2 years

    I have a Linux server with two NICs that are connected to a switch (plus one NIC for management), and I want to combine them and use LACP there, but for some reason unknown to me the bonding just will not work/go up.

    It also ignores the LACP configuration and goes into round-robin mode.

    Huawei switch configuration:

    interface Eth-Trunk10
    description #### Server ####
    port link-type trunk
    port trunk allow-pass vlan 167
    stp disable
    mode lacp
    load-balance src-dst-mac
    

    /etc/network/interfaces:

    auto lo
    iface lo inet loopback
    
    iface enp3s0 inet manual
    
    auto enp4s0
    iface enp4s0 inet manual
        bond-master bond1
    
    auto eno1
    iface eno1 inet manual
        bond-master bond1
    
    auto bond1
    iface bond1 inet manual
        bond-slaves eno1 enp4s0
        bond-miimon 100
        bond-mode 802.3ad
        bond-lacp-rate 1
    
    auto bond1.167
    iface bond1.167 inet static
        address x.x.x.x
        netmask 255.255.255.248
        vlan-raw-device bond1
    
    auto vmbr0
    iface vmbr0 inet static
        address  a.a.a.b
        netmask  255.255.255.248
        gateway a.a.a.a
        bridge-ports enp3s0
        bridge-stp off
        bridge-fd 0
    

    /proc/net/bonding/bond1:

    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
    
    Bonding Mode: load balancing (round-robin)
    MII Status: down
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0
    
    Slave Interface: enp4s0
    MII Status: down
    Speed: Unknown
    Duplex: Unknown
    Link Failure Count: 0
    Permanent HW addr: 00:1b:21:3a:6f:f9
    Slave queue ID: 0
    
    Slave Interface: eno1
    MII Status: down
    Speed: Unknown
    Duplex: Unknown
    Link Failure Count: 0
    Permanent HW addr: 00:1b:21:3a:6f:fb
    Slave queue ID: 0
    

    Network status:

    ● networking.service - Raise network interfaces
       Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
       Active: active (exited) since Tue 2019-09-24 19:33:18 CEST; 13s ago
         Docs: man:interfaces(5)
      Process: 16974 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)
     Main PID: 16974 (code=exited, status=0/SUCCESS)
    
    Sep 24 19:33:18 rakete systemd[1]: Starting Raise network interfaces...
    Sep 24 19:33:18 rakete ifup[16974]: /etc/network/if-pre-up.d/ifenslave: 47: echo: echo: I/O error
    Sep 24 19:33:18 rakete ifup[16974]: /etc/network/if-pre-up.d/ifenslave: 47: echo: echo: I/O error
    Sep 24 19:33:18 rakete ifup[16974]: Waiting for vmbr0 to get ready (MAXWAIT is 2 seconds).
    Sep 24 19:33:18 rakete systemd[1]: Started Raise network interfaces.
    

    lsmod | grep bond:

    bonding               159744  0
    

    I could not find anything helpful on these error messages. Maybe someone here has some experience with the bonding feature in Linux?

    Update, ip link output:

    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master mgmt state UP mode DEFAULT group default qlen 1000
    link/ether b4:2e:99:3d:68:64 brd ff:ff:ff:ff:ff:ff
    3: enp4s0: <BROADCAST,MULTICAST,SLAVE> mtu 1500 qdisc pfifo_fast master bond1 state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:1b:21:3a:6f:f9 brd ff:ff:ff:ff:ff:ff
    4: eno1: <BROADCAST,MULTICAST,SLAVE> mtu 1500 qdisc pfifo_fast master bond1 state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:1b:21:3a:6f:f9 brd ff:ff:ff:ff:ff:ff
    5: bond1: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:1b:21:3a:6f:f9 brd ff:ff:ff:ff:ff:ff
    6: bond1.167@bond1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether 00:1b:21:3a:6f:f9 brd ff:ff:ff:ff:ff:ff
    7: mgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether b4:2e:99:3d:68:64 brd ff:ff:ff:ff:ff:ff
    
  • vega
    vega almost 5 years
    Thank you very much. I am just about to try this out, as soon as I have remote access to the box again.
  • vega
    vega almost 5 years
    You were right, the cascading/shortcut was wrong and bond1 is now up and shown as an active LACP connection. One question though: bond1.167 still does not appear as an interface when using the "ip a" or "ping -I bond1.167 8.8.8.8" commands. It says this interface does not exist. Do I have to create another interface/bridge, for example "auto vmbr0", and then use "bridge-ports bond1.167" and also put the IP address there?
  • vega
    vega almost 5 years
    I added the ip link output.
  • A.B
    A.B almost 5 years
    From what you added: both the enp4s0 and eno1 interfaces are down, even if I don't know why. They are either down from the network scripts, in which case running ifup enp4s0; ifup eno1 will bring them up, or down at the low-level layer (while the configuration scripts "think" they ran), in which case ip link set enp4s0 up; ip link set eno1 up will work. Does it still fail after a reboot?
  • vega
    vega almost 5 years
    Alright, this did something (not the reboot, but the ip link set ... commands). I will update the initial post with the newer information.
  • vega
    vega almost 5 years
    What I do not understand: eno1 and enp4s0 just do not come up by themselves after a reboot. I always have to run: ip link set enp4s0 up; ip link set eno1 up.
  • A.B
    A.B almost 5 years
    I could reproduce the issue, and probably guessed what causes it. I updated my answer (and the configuration to use instead) accordingly.
  • vega
    vega almost 5 years
    Thank you very much, sir. I will test this later on too, but I have a hunch you are quite right about that.