Is it possible to have multiple default gateways for outbound connections?


Solution 1

Solved it myself. There seems to be very little information about the networking stuff that you can do with Linux, so I have decided to document and explain my solution in detail. This is my final setup:

  • 3 NICs: eth0 (wire), wlan0 (built-in wifi, weak), wlan1 (usb wifi adapter, stronger signal than wlan0)
  • All of them on a single subnet, each of them with their own IP address.
  • eth0 should be used for both incoming and outgoing traffic by default.
  • If eth0 fails then wlan1 should be used.
  • If wlan1 fails then wlan0 should be used.

First step: Create a new route table for every interface in /etc/iproute2/rt_tables. Let's call them rt1, rt2 and rt3:

#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
#1  inr.ruhep
1 rt1
2 rt2
3 rt3
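
If you would rather script this step than edit the file by hand, something like the following could work. It is only a sketch: add_rt_table is a helper name I made up, and the third argument exists so you can try it on a copy of the file first.

```shell
# Hypothetical helper: append "ID NAME" to an rt_tables file
# unless a non-comment line already defines that table name.
add_rt_table() {
    id=$1 name=$2 file=${3:-/etc/iproute2/rt_tables}
    # awk exits 0 when the name is already present, 1 otherwise.
    if awk -v n="$name" '!/^#/ && $2 == n { found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$id" "$name" >> "$file"
}
```

Run as root, add_rt_table 1 rt1 followed by the same for rt2 and rt3 should leave the file as shown above, and running it a second time should change nothing.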

Second step: Network configuration in /etc/network/interfaces. This is the main part and I'll try to explain as much as I can:

auto eth0 wlan0
allow-hotplug wlan1

iface lo inet loopback

iface eth0 inet static
address 192.168.178.99
netmask 255.255.255.0
dns-nameserver 8.8.8.8 8.8.4.4
    post-up ip route add 192.168.178.0/24 dev eth0 src 192.168.178.99 table rt1
    post-up ip route add default via 192.168.178.1 dev eth0 table rt1
    post-up ip rule add from 192.168.178.99/32 table rt1
    post-up ip rule add to 192.168.178.99/32 table rt1
    post-up ip route add default via 192.168.178.1 metric 100 dev eth0
    post-down ip rule del from 0/0 to 0/0 table rt1
    post-down ip rule del from 0/0 to 0/0 table rt1

iface wlan0 inet static
wpa-conf /etc/wpa_supplicant.conf
wireless-essid xyz
address 192.168.178.97
netmask 255.255.255.0
dns-nameserver 8.8.8.8 8.8.4.4
    post-up ip route add 192.168.178.0/24 dev wlan0 src 192.168.178.97 table rt2
    post-up ip route add default via 192.168.178.1 dev wlan0 table rt2
    post-up ip rule add from 192.168.178.97/32 table rt2
    post-up ip rule add to 192.168.178.97/32 table rt2
    post-up ip route add default via 192.168.178.1 metric 102 dev wlan0
    post-down ip rule del from 0/0 to 0/0 table rt2
    post-down ip rule del from 0/0 to 0/0 table rt2

iface wlan1 inet static
wpa-conf /etc/wpa_supplicant.conf
wireless-essid xyz
address 192.168.178.98
netmask 255.255.255.0
dns-nameserver 8.8.8.8 8.8.4.4
    post-up ip route add 192.168.178.0/24 dev wlan1 src 192.168.178.98 table rt3
    post-up ip route add default via 192.168.178.1 dev wlan1 table rt3
    post-up ip rule add from 192.168.178.98/32 table rt3
    post-up ip rule add to 192.168.178.98/32 table rt3
    post-up ip route add default via 192.168.178.1 metric 101 dev wlan1
    post-down ip rule del from 0/0 to 0/0 table rt3
    post-down ip rule del from 0/0 to 0/0 table rt3

If you type ip rule show you should see the following:

0:  from all lookup local 
32756:  from all to 192.168.178.98 lookup rt3 
32757:  from 192.168.178.98 lookup rt3 
32758:  from all to 192.168.178.99 lookup rt1 
32759:  from 192.168.178.99 lookup rt1 
32762:  from all to 192.168.178.97 lookup rt2 
32763:  from 192.168.178.97 lookup rt2 
32766:  from all lookup main 
32767:  from all lookup default 

This tells us that traffic to or from the IP address "192.168.178.99" will use the rt1 route table. So far so good. But traffic that is locally generated (for example you want to ping or ssh from the machine to somewhere else) needs special treatment (see the big quote in the question).

The first four post-up lines in /etc/network/interfaces are straightforward and explanations can be found on the internet, the fifth and last post-up line is the one that makes magic happen:

post-up ip route add default via 192.168.178.1 metric 100 dev eth0

Note how we haven't specified a route-table for this post-up line. If you don't specify a route table, the information will be saved in the main route table that we saw in ip rule show. This post-up line puts a default route in the "main" route table that is used for locally generated traffic that is not a response to incoming traffic. (For example an MTA on your server trying to send an e-mail.)

The three interfaces all put a default route in the main route table, albeit with different metrics. Let's take a look at the main route table with ip route show:

default via 192.168.178.1 dev eth0  metric 100 
default via 192.168.178.1 dev wlan1  metric 101 
default via 192.168.178.1 dev wlan0  metric 102 
192.168.178.0/24 dev wlan0  proto kernel  scope link  src 192.168.178.97 
192.168.178.0/24 dev eth0  proto kernel  scope link  src 192.168.178.99 
192.168.178.0/24 dev wlan1  proto kernel  scope link  src 192.168.178.98

We can see that the main route table has three default routes, albeit with different metrics. The highest priority is eth0, then wlan1 and then wlan0 because lower metric numbers indicate a higher priority. Since eth0 has the lowest metric this is the default route that is going to be used for as long as eth0 is up. If eth0 goes down, outgoing traffic will switch to wlan1.
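
To make the selection rule concrete, here is a small sketch that reads ip route show output and prints the device of the default route with the lowest metric. The function name preferred_default_dev is my own invention, and it only mimics the metric comparison; the kernel's real answer is what ip route get 8.8.8.8 reports.

```shell
# Read `ip route show` output on stdin and print the device of the
# default route with the lowest metric (a missing metric counts as 0).
preferred_default_dev() {
    awk '
        $1 == "default" {
            dev = ""; metric = 0
            for (i = 2; i < NF; i++) {
                if ($i == "dev")    dev = $(i + 1)
                if ($i == "metric") metric = $(i + 1)
            }
            if (best == "" || metric + 0 < best_metric) {
                best = dev
                best_metric = metric + 0
            }
        }
        END { if (best != "") print best }
    '
}
```

Used as ip route show | preferred_default_dev, this should print eth0 for the table shown above and wlan1 once the eth0 route is gone.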

With this setup we can type ping 8.8.8.8 in one terminal and ifdown eth0 in another. ping should still work: ifdown eth0 removes the default route related to eth0, so outgoing traffic switches to wlan1.

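The post-down lines make sure that the rules pointing to the interface's route table get deleted from the routing policy database (ip rule show) when the interface goes down, in order to keep everything tidy. The line appears twice because every interface adds two rules and each ip rule del call removes only one of them.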
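
To check that the cleanup really happened, you can count how many rules still look up a given table. This is a sketch; count_rules_for_table is a name I made up, and it only parses the text that ip rule show prints.

```shell
# Read `ip rule show` output on stdin and print how many rules
# look up the routing table named in the first argument.
count_rules_for_table() {
    awk -v t="$1" '
        { for (i = 1; i < NF; i++) if ($i == "lookup" && $(i + 1) == t) n++ }
        END { print n + 0 }
    '
}
```

With all interfaces up, ip rule show | count_rules_for_table rt1 should print 2; after ifdown eth0 it should print 0.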

The problem that is left is that when you pull the plug from eth0 the default route for eth0 is still there and outgoing traffic fails. We need something to monitor our interfaces and to execute ifdown eth0 if there's a problem with the interface (i.e. NIC failure or someone pulling the plug).

Last step: enter ifplugd. That's a daemon that watches interfaces and executes ifup/ifdown if you pull the plug or if there's a problem with the wifi connection. This is my /etc/default/ifplugd:

INTERFACES="eth0 wlan0 wlan1"
HOTPLUG_INTERFACES=""
ARGS="-q -f -u0 -d10 -w -I"
SUSPEND_ACTION="stop"

You can now pull the plug on eth0: outgoing traffic will switch to wlan1, and if you put the plug back in, outgoing traffic will switch back to eth0. Your server will stay online as long as any of the three interfaces works. For connecting to your server you can use the IP address of eth0 and, if that fails, the IP address of wlan1 or wlan0.
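
If you want to check the link state by hand while testing, independent of ifplugd, the kernel exposes it in /sys/class/net/<interface>/carrier (1 means the link is up). A minimal sketch follows; has_carrier is a made-up helper name, and the optional second argument is only there so it can be pointed at a test directory.

```shell
# Succeed if the interface reports link (carrier = 1) in sysfs.
# Reading "carrier" fails on an interface that is administratively
# down or missing; that case is treated as "no link".
has_carrier() {
    iface=$1 root=${2:-/sys/class/net}
    [ "$(cat "$root/$iface/carrier" 2>/dev/null)" = "1" ]
}
```

For example, has_carrier eth0 && echo "eth0 has link" can be run before and after pulling the plug.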

Solution 2

Linux provides a better solution than your scripted workaround: active-backup bonding.

This way your machine will have only one IP address (and one MAC address) and will automatically and transparently switch interfaces if one interface becomes unavailable. No disruption of any TCP connection (neither to your internal LAN nor to the internet).

I'm using this setup myself to automatically fail over from eth0 to wlan0 on my Debian laptop when I disconnect my laptop from the docking station.

My /etc/network/interfaces:

# The primary network interface
allow-hotplug eth0
iface eth0 inet manual
        bond-master bond0
        bond-primary eth0

# The secondary network interface
allow-hotplug wlan0
iface wlan0 inet manual
        pre-up sleep 5
        wpa-conf /etc/wpa_supplicant.conf
        bond-master bond0
        bond-primary eth0

# The bonding interface
allow-hotplug bond0
iface bond0 inet dhcp
        bond-slaves eth0 wlan0
        bond-primary eth0
        bond-mode active-backup
        bond-miimon 10
        bond-downdelay 10
        bond-updelay 4000
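
To see which interface the bond is using at the moment, the bonding driver reports it in /proc/net/bonding/bond0 on a line starting with "Currently Active Slave:". A small sketch, where active_slave is a helper name I chose and the optional argument lets you point it at a saved copy of that file:

```shell
# Print the currently active slave of a bond, read from the
# bonding driver's status file (default: bond0).
active_slave() {
    sed -n 's/^Currently Active Slave: //p' "${1:-/proc/net/bonding/bond0}"
}
```

Running active_slave before and after unplugging eth0 should show the switch from eth0 to wlan0.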

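You can easily extend this setup to include multiple wlan devices. Setting the primary_reselect option to better should help here: the primary interface then only takes over again after it comes back up if its speed and duplex are better than those of the currently active interface.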

For more information see https://wiki.linuxfoundation.org/networking/bonding and https://wiki.debian.org/Bonding

And (of course) the linux kernel documentation at https://www.kernel.org/doc/Documentation/networking/bonding.txt

rosix
Author by

rosix

Updated on September 18, 2022

Comments

  • rosix
    rosix over 1 year

    I would like to have multiple NICs (eth0 and wlan0) in the same subnet and to serve as a backup for the applications on the host if one of the NICs fail. For this reason I have created an additional routing table. This is how /etc/network/interfaces looks:

    iface eth0 inet static
    address 192.168.178.2
    netmask 255.255.255.0
    dns-nameserver 8.8.8.8 8.8.4.4
        post-up ip route add 192.168.178.0/24 dev eth0 src 192.168.178.2
        post-up ip route add default via 192.168.178.1 dev eth0
        post-up ip rule add from 192.168.178.2/32
        post-up ip rule add to 192.168.178.2/32
    
    iface wlan0 inet static
    wpa-conf /etc/wpa_supplicant.conf
    wireless-essid xyz
    address 192.168.178.3
    netmask 255.255.255.0
    dns-nameserver 8.8.8.8 8.8.4.4
        post-up ip route add 192.168.178.0/24 dev wlan0 src 192.168.178.3 table rt2
        post-up ip route add default via 192.168.178.1 dev wlan0 table rt2
        post-up ip rule add from 192.168.178.3/32 table rt2
        post-up ip rule add to 192.168.178.3/32 table rt2
    

    That works for connecting to the host: I can still SSH into it if one of the interfaces fails. However, the applications on the host cannot initialize a connection to the outside world if eth0 is down. That is my problem.

    I have researched that topic and found the following interesting information:

    When a program initiates an outbound connection it is normal for it to use the wildcard source address (0.0.0.0), indicating no preference as to which interface is used provided that the relevant destination address is reachable. This is not replaced by a specific source address until after the routing decision has been made. Traffic associated with such connections will not therefore match either of the above policy rules, and will not be directed to either of the newly-added routing tables. Assuming an otherwise normal configuration, it will instead fall through to the main routing table. http://www.microhowto.info/howto/ensure_symmetric_routing_on_a_server_with_multiple_default_gateways.html

    What I want is for the main route table to have more than one default gateway (one on eth0 and one on wlan0) and to go to the default gateway via eth0 by default and via wlan0 if eth0 is down.

    Is that possible? What do I need to do to achieve such a functionality?

    • dirkt
      dirkt about 7 years
      Very briefly: Several default routes will pick one interface at random, which leads to trouble because the assigned IP is different. What you want is multihoming or bundling, which is difficult to do, see e.g. here
    • Ingo
      Ingo almost 6 years
      You can use dynamic failover with bonding. There is no need to fiddle with default routes.
  • dirkt
    dirkt about 7 years
    Try making a connection that takes longer (e.g. scp a large file), watch what network interface it uses, disable that interface and see what happens.
  • rosix
    rosix about 7 years
    The scp session will break because the IP address changes. You could try using withsctp to keep the connection alive in such a case, or use rsync instead of scp to continue the transfer from the point where it stopped.
  • dirkt
    dirkt about 7 years
    The point being: If it breaks, what's the advantage of your complicated setup over having just a single default route, say on the fastest network interface currently up? withsctp will also work for just one default route.
  • rosix
    rosix about 7 years
    "what's the advantage of your complicated setup over having just a single default route, say on the fastest network interface currently up?" >>That's exactly what my setup is doing. Only the fastest default route (eth0) is used by default. You're welcome.
  • AI0867
    AI0867 about 3 years
    Correct me if I'm wrong, but does this only work if (as is the case in the question) both interfaces are for the same actual network? The scripted workaround is more general and works with any set of networks, such as for example, 2 independent ISPs + 1 GSM backup + 1 satellite backup.