Network bonding mode 802.3ad on Ubuntu 12.04 and a Cisco Switch


Solution 1

You'll never get more than one NIC's worth of performance between two servers. Switches do not spread the frames from a single source across multiple links in a Link Aggregation Group (LAG). What they actually do is hash the source MAC or IP (or both) and use that hash to assign the conversation to one link.

So your server can transmit across as many NICs as you want, but those frames will all be sent to the destination server over one link.
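
To see why, here is a minimal sketch of a layer-2 style hash, assuming the simple last-byte XOR that the Linux bonding driver's "layer2" xmit_hash_policy uses (real switches have their own vendor-specific variants, but the idea is the same):

#!/bin/bash
# The example MAC bytes and link count below are made up for illustration.
SRC_MAC_LAST=0x69   # last byte of the sending server's MAC address
DST_MAC_LAST=0x41   # last byte of the receiving server's MAC address
NUM_LINKS=3         # number of ports in the LAG

echo "selected link index: $(( (SRC_MAC_LAST ^ DST_MAC_LAST) % NUM_LINKS ))"
# The result never changes for this pair of hosts, so every frame between
# them rides the same physical port no matter how many links are bonded.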

Solution 2

To test LAGs, use multiple threads so they use multiple links. With netperf, try:

netperf -H ipaddress &
netperf -H ipaddress &
netperf -H ipaddress &
netperf -H ipaddress &
netperf -H ipaddress &

You should see some of the traffic hitting the other slaves in the bond.
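
To watch the distribution live, something along these lines works (the interface names are the ones from the config further down; substitute your own slaves):

# Poll the per-slave counters once a second; traffic counting up on more
# than one slave confirms the hash is actually spreading the flows.
watch -n 1 "grep -E 'p4p1|p4p2|p6p1|p6p2' /proc/net/dev"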

I have four 10GbE ports in a LACP bond and I am getting 32Gb to 36Gb each way between the two servers.

The other way is to set up aliases on the bond with multiple IP addresses and then launch multiple netperf instances against the different addresses.
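
For example (the alias addresses below are hypothetical, chosen to match the commented-out 10.3.100.60/16 in the config further down):

# On the receiving server: add alias addresses to the bond
ip addr add 10.3.100.61/16 dev bond0
ip addr add 10.3.100.62/16 dev bond0

# On the sending server: one netperf per destination address, so the
# layer3+4 hash has several distinct flows to spread across the slaves
netperf -H 10.3.100.60 &
netperf -H 10.3.100.61 &
netperf -H 10.3.100.62 &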

Your server with the Intel Xeon X5690 processors has more than enough power to drive close to 10Gb per core.

I have driven 80Gb of uni-directional traffic across 8x10GbE ports. The key is to use layer3+4 (l3+l4) hashing on both the switch and the NICs, and to use multiple threads.
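
On the NIC side that is the layer3+4 policy shown in the config below. If you want to check or flip it on a running bond, the bonding driver exposes the knob through sysfs (a quick sketch; depending on the kernel you may need to bring the bond down before changing it, and the permanent setting still belongs in /etc/network/interfaces):

# Show the transmit hash policy the bond is currently using
cat /sys/class/net/bond0/bonding/xmit_hash_policy

# Switch to layer3+4 hashing
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy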

Here is an example of my 4x10GbE configuration... My interface config file:

#Ports that will be used for VXLAN Traffic in on Bond0
auto p4p1
auto p4p2
auto p6p1
auto p6p2

iface p4p1 inet manual
bond-master bond0

iface p4p2 inet manual
bond-master bond0

iface p6p1 inet manual
bond-master bond0

iface p6p2 inet manual
bond-master bond0

#Configure Bond0. Setup script will provide VXLAN VLAN configuration on bond0
auto bond0
iface bond0 inet manual
#address 10.3.100.60
#netmask 255.255.0.0
# mode 4 = 802.3ad (LACP)
bond-mode 4
bond-slaves none
# lacp-rate 0 = slow (LACPDUs every 30 seconds)
bond-lacp-rate 0
# ad-select 1 = pick the aggregator by bandwidth
bond-ad-select 1
# check link state every 100 ms
bond-miimon 100
# xmit_hash_policy 1 = layer3+4
bond-xmit_hash_policy 1

Here is the resulting bond status:

root@host2:~# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): bandwidth
Active Aggregator Info:
    Aggregator ID: 2
    Number of ports: 4
    Actor Key: 33
    Partner Key: 32768
    Partner Mac Address: 54:7f:ee:e3:01:41

Slave Interface: p6p1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:e2:ba:47:2b:e4
Aggregator ID: 2
Slave queue ID: 0

Slave Interface: p4p2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:e2:ba:47:2b:69
Aggregator ID: 2
Slave queue ID: 0

Slave Interface: p4p1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:e2:ba:47:2b:68
Aggregator ID: 2
Slave queue ID: 0

Slave Interface: p6p2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:e2:ba:47:2b:e5
Aggregator ID: 2
Slave queue ID: 0

Here is the result of running multiple instances of netperf:

root@host6:~# vnstat -i bond0.192 -l
Monitoring bond0.192...    (press CTRL-C to stop)

   rx:    36.83 Gbit/s 353202 p/s          tx:   162.40 Mbit/s 314535 p/s

bond0.192  /  traffic statistics

                           rx         |       tx
--------------------------------------+------------------
  bytes                   499.57 GiB  |        2.15 GiB
--------------------------------------+------------------
          max           36.90 Gbit/s  |   170.52 Mbit/s
      average           20.05 Gbit/s  |    86.38 Mbit/s
          min               0 kbit/s  |        0 kbit/s
--------------------------------------+------------------
  packets                   39060415  |        34965195
--------------------------------------+------------------
          max             369770 p/s  |      330146 p/s
      average             186891 p/s  |      167297 p/s
          min                  0 p/s  |           0 p/s
--------------------------------------+------------------
  time                  3.48 minutes

Hope this helps...

Solution 3

Sorry for posting this as an Answer. I am unable to add a comment on @longneck's answer, possibly due to lack of reputation...?

It IS possible to get more than one NIC's worth of performance between two servers, because switches are capable of distributing traffic based not only on MAC/IP addresses but also on L4 port numbers. Cisco devices are well capable of doing this, but you may have to configure the switch to hash on the L4 ports rather than just the L2 and L3 addresses, which may be the default.
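
On the Cisco side that means the port-channel load-balancing method. The exact keywords differ by platform and software version, and some fixed-configuration switches only offer MAC/IP based hashing, so treat the following as a sketch and check what your switch actually supports:

! Show the current EtherChannel hashing method
show etherchannel load-balance

! Hash on L4 ports where the platform offers the keyword
configure terminal
 port-channel load-balance src-dst-port
end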

The real reason you probably won't get more than one NIC's worth of performance between the two servers is that 1Gbps bidirectional is A LOT of traffic for any modern CPU to handle. I do not know how grunty your servers are, but if the servers are doing meaningful things with each packet they receive, then I would be surprised if they can handle 1Gbps full duplex.

Sorry, I didn't mean to step on @longneck's answer above; I just wanted to clarify a few additional points.


Comments

  • drivard
    drivard almost 2 years

    I am trying to team 3 network cards together on 2 servers to achieve a maximum throughput of 3Gbps for replicating data between the servers. The setup is simple: I have 2 servers, each with 3 Gigabit network cards connected to the same Cisco switch, on ports 1-2-3 for server-1 and ports 4-5-6 for server-2. My interfaces configuration looks like this:

    auto lo
    iface lo inet loopback
    
    # The primary network interface
    auto eth0
    iface eth0 inet manual
            bond-master bond0
    
    auto eth1
    iface eth1 inet manual
            bond-master bond0
    
    auto eth2
    iface eth2 inet manual
            bond-master bond0
    
    auto bond0
    iface bond0 inet static
            address 192.168.1.11
            netmask 255.255.255.0
            gateway 192.168.1.1
    
            bond-miimon 100
            bond-mode 802.3ad
            #bond-downdelay 200
            #bond-updelay 200
            bond-lacp-rate 1
            # tried bond with slaves and no slaves interfaces
            bond-slaves eth0 eth1 eth2 
            # bond-slaves none
    

    I tried multiple configurations on these cards, but I always end up using only one network card at a time.

    I tested the performance with iperf and netcat

    # server-1
    iperf -s
    
    # server-2 
    iperf -c 192.168.1.10
    
    # Wait for traffic
    nc.traditional -l -p 5000 | pv > /dev/null 
    
    # Push traffic
    dd if=/dev/zero | pv | nc.traditional 192.168.1.11 5000
    

    We also tried many configurations on the Cisco switch, without a port-channel and with a port-channel, and always only one network card is used at a time. If we test each card individually, they work at 1Gbps.

    I can also say that in /proc/net/bonding/bond0 the mode shows 802.3ad and the LACP rate shows FAST. I have no link failure count and the 3 interfaces show up. I also verified each eth interface with ethtool and they look fine to me.

    I was following this guide to set it up: https://help.ubuntu.com/community/UbuntuBonding. I enabled the bonding module in the kernel with modprobe bonding, and when I use lsmod to verify that the bonding module is loaded, it is in the list.

    What are we missing to get this working?

  • Thomas G
    Thomas G over 11 years
    You can verify it is working as @longneck says by running multiple data pushes to separate hosts.
  • drivard
    drivard over 11 years
    The servers have Intel(R) Xeon(R) X5690 CPUs @ 3.47GHz with 156GB RAM. The problem is that when I do maintenance on these DB servers, transferring the data from the master to the slave takes forever because it weighs 1.4TB, and I would like to make the transfer faster. Thanks, we will investigate the L4 hashing on the Cisco in our lab.
  • wookie919
    wookie919 over 11 years
    If, when you start the maintenance, only one stream is opened for the entire transfer, then 1Gbps will be the maximum. Hashing on L4 ports is of course only meaningful when there are multiple L4 ports to work with. Also, are the storage devices capable of reading/writing at 1Gbps? Mechanical HDDs will definitely not cut it regardless of the RAID configuration you use...
  • drivard
    drivard over 11 years
    The HDDs are HP 600GB 10K rpm drives; I think that in theory they should be able to, but we finally decided that for now we will live with the 3-hour data transfer and investigate a better way to archive the data so the transfer size becomes smaller. Even if we invested in 10Gbps network cards, they would be faster than the HDDs anyway. Regards.
  • Michael Hampton
    Michael Hampton over 10 years
    Welcome to Server Fault. We're really not fans of signatures here, so I've removed it. Feel free to add your name and any other information you want to make public to your user profile.