Bridging LXC containers to host eth0 so they can have a public IP


A better way to make your change permanent is to put the settings in a file under /etc/sysctl.d/ instead of writing to /proc directly. That is the standard way to configure kernel parameters, and it ensures they are set correctly again at the next boot:

# cat >> /etc/sysctl.d/99-bridge-nf-dont-pass.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-filter-vlan-tagged = 0
EOF
# service procps start
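
To check that the settings took effect, one option is to read the live values back. This is a minimal sketch rather than part of the original answer: the function name show_bridge_nf and the BRIDGE_PROC_DIR override are made up for this example (the override exists only so the function can be pointed at a test directory).

```shell
#!/bin/sh
# Print the live value of each bridge-nf entry, or "unavailable" when the
# entry does not exist (for example when the bridge code is not loaded).
# BRIDGE_PROC_DIR is a hypothetical override; it defaults to the real path.
show_bridge_nf() {
    dir="${BRIDGE_PROC_DIR:-/proc/sys/net/bridge}"
    for key in bridge-nf-call-arptables bridge-nf-call-iptables \
               bridge-nf-call-ip6tables bridge-nf-filter-vlan-tagged; do
        if [ -r "$dir/$key" ]; then
            printf '%s = %s\n' "$key" "$(cat "$dir/$key")"
        else
            printf '%s = unavailable\n' "$key"
        fi
    done
}

show_bridge_nf
```

Every line should end in "= 0" once the file above has been applied. A line ending in "= unavailable" means the entry is not present on this system at all.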

As for the answer to the question in your update...

bridge-netfilter (or bridge-nf) is the kernel code that lets bridged IPv4/IPv6/ARP packets (even those inside 802.1Q VLAN or PPPoE headers) be passed to arptables/iptables for further processing. That is what makes a stateful transparent firewall, or more advanced functionality like transparent IP NAT, possible on a bridge. However, even if the more advanced features of arptables/iptables are not needed, passing packets to those programs is still turned on by default in the kernel module and must be turned off explicitly using sysctl.

What are they here for? These kernel configuration options control whether bridged packets are passed (1) or not passed (0) to arptables/iptables, as described in the bridge-nf FAQ:

As of kernel version 2.6.1, there are three sysctl entries for bridge-nf behavioral control (they can be found under /proc/sys/net/bridge/):
bridge-nf-call-arptables - pass (1) or don't pass (0) bridged ARP traffic to arptables' FORWARD chain.
bridge-nf-call-iptables - pass (1) or don't pass (0) bridged IPv4 traffic to iptables' chains.
bridge-nf-call-ip6tables - pass (1) or don't pass (0) bridged IPv6 traffic to ip6tables' chains.
bridge-nf-filter-vlan-tagged - pass (1) or don't pass (0) bridged vlan-tagged ARP/IP traffic to arptables/iptables.
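
The entries under /proc/sys/net/bridge/ and the net.bridge.* names used in the sysctl file are the same settings under two spellings: sysctl drops the /proc/sys/ prefix and writes the path separators as dots. A small sketch of that mapping follows; the helper name proc_to_sysctl_key is made up for this example, and it assumes no path component contains a dot, which holds for the bridge-nf entries.

```shell
#!/bin/sh
# Convert a /proc/sys path into the matching sysctl key name.
# Assumes no path component contains a dot (true for the bridge-nf entries).
proc_to_sysctl_key() {
    # Strip the /proc/sys/ prefix, then turn path separators into dots.
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

for name in bridge-nf-call-arptables bridge-nf-call-iptables \
            bridge-nf-call-ip6tables bridge-nf-filter-vlan-tagged; do
    proc_to_sysctl_key "/proc/sys/net/bridge/$name"
done
```

Each printed key can be passed to the sysctl command to read the current value, or used on the left-hand side of a line in a file under /etc/sysctl.d/.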

Is it safe to disable all bridge-nf-*? Yes. It is not only safe to do so, but there is also a recommendation that distributions turn them off by default, to help people avoid the kind of confusion you encountered:

In practice, this can lead to serious confusion where someone creates a bridge and finds that some traffic isn't being forwarded across the bridge. Because it's so unexpected that IP firewall rules apply to frames on a bridge, it can take quite some time to figure out what's going on.

and to increase security:

I still think the risk with bridging is higher, especially in the presence of virtualisation. Consider the scenario where you have two VMs on the one host, each with a dedicated bridge with the intention that neither should know anything about the other's traffic.

With conntrack running as part of bridging, the traffic can now cross over which is a serious security hole.

UPDATE: May 2015

If you are running a kernel older than 3.18, you may be subject to the old behavior of bridge filtering being enabled by default. If you are running a newer kernel, you can still be bitten by this if the bridge filtering module (br_netfilter) has been loaded and you haven't disabled the bridge filtering. See:

https://bugzilla.redhat.com/show_bug.cgi?id=634736#c44

After all these years of asking for the default of bridge filtering to be "disabled" and the change being refused by the kernel maintainers, now the filtering has been moved into a separate module that isn't loaded (by default) when the bridge module is loaded, effectively making the default "disabled". Yay!

I think this is in the kernel as of 3.17 (It definitely is in kernel 3.18.7-200.fc21, and appears to be in git prior to the tag "v3.17-rc4")
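
A rough way to tell which of the two situations applies is to compare the running kernel version against 3.18 and look for the separate filtering module. This is only a sketch under stated assumptions: the 3.18 threshold is taken from the text above, br_netfilter is assumed to be the name of the separate module, and the helper names version_ge and filtering_module_loaded are made up for this example.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= B.
# Only the first three numeric components are compared.
version_ge() {
    # Keep only the leading numeric part, e.g. "3.18.7-200.fc21" -> "3.18.7".
    a=$(printf '%s\n' "$1" | sed 's/[^0-9.].*$//')
    b=$(printf '%s\n' "$2" | sed 's/[^0-9.].*$//')
    old_ifs=$IFS; IFS=.
    set -- $a; a1=${1:-0}; a2=${2:-0}; a3=${3:-0}
    set -- $b; b1=${1:-0}; b2=${2:-0}; b3=${3:-0}
    IFS=$old_ifs
    [ "$a1" -gt "$b1" ] && return 0
    [ "$a1" -lt "$b1" ] && return 1
    [ "$a2" -gt "$b2" ] && return 0
    [ "$a2" -lt "$b2" ] && return 1
    [ "$a3" -ge "$b3" ]
}

# Loaded modules show up as directories under /sys/module.
# Assumption: the separate filtering module is named br_netfilter.
filtering_module_loaded() {
    [ -d /sys/module/br_netfilter ]
}

kernel=$(uname -r)
if version_ge "$kernel" 3.18; then
    if filtering_module_loaded; then
        echo "kernel $kernel: br_netfilter is loaded, bridge filtering may be active"
    else
        echo "kernel $kernel: br_netfilter does not appear to be loaded"
    fi
else
    echo "kernel $kernel: older than 3.18, bridge filtering may be on by default"
fi
```

When the module is reported as loaded, the bridge-nf-* values still decide whether packets are actually passed to iptables, so setting them to 0 as shown at the top of this answer remains relevant.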


Author: Vianney Stroebel

Updated on September 18, 2022

Comments


    UPDATE:

    I found the solution there: http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge#No_traffic_gets_trough_.28except_ARP_and_STP.29

     # cd /proc/sys/net/bridge
     # ls
     bridge-nf-call-arptables  bridge-nf-call-iptables
     bridge-nf-call-ip6tables  bridge-nf-filter-vlan-tagged
     # for f in bridge-nf-*; do echo 0 > "$f"; done
    

    But I'd like to have expert opinions on this: is it safe to disable all bridge-nf-*? What are they here for?

    END OF UPDATE

    I need to bridge LXC containers to the physical interface (eth0) of my host. I have read numerous tutorials, documents and blog posts on the subject.

    I need the containers to have their own public IP (which I've previously done with KVM/libvirt).

    After two days of searching and trying, I still can't make it work with LXC containers.

    The host runs a freshly installed Ubuntu Server Quantal (12.10) with only libvirt (which I'm not using here) and lxc installed.

    I created the containers with:

    lxc-create -t ubuntu -n mycontainer

    So they also run Ubuntu 12.10.

    Content of /var/lib/lxc/mycontainer/config is:

    
    lxc.utsname = mycontainer
    lxc.mount = /var/lib/lxc/test/fstab
    lxc.rootfs = /var/lib/lxc/test/rootfs
    
    
    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = br0
    lxc.network.name = eth0
    lxc.network.veth.pair = vethmycontainer
    lxc.network.ipv4 = 179.43.46.233
    lxc.network.hwaddr= 02:00:00:86:5b:11
    
    lxc.devttydir = lxc
    lxc.tty = 4
    lxc.pts = 1024
    lxc.arch = amd64
    lxc.cap.drop = sys_module mac_admin mac_override
    lxc.pivotdir = lxc_putold
    
    # uncomment the next line to run the container unconfined:
    #lxc.aa_profile = unconfined
    
    lxc.cgroup.devices.deny = a
    # Allow any mknod (but not using the node)
    lxc.cgroup.devices.allow = c *:* m
    lxc.cgroup.devices.allow = b *:* m
    # /dev/null and zero
    lxc.cgroup.devices.allow = c 1:3 rwm
    lxc.cgroup.devices.allow = c 1:5 rwm
    # consoles
    lxc.cgroup.devices.allow = c 5:1 rwm
    lxc.cgroup.devices.allow = c 5:0 rwm
    #lxc.cgroup.devices.allow = c 4:0 rwm
    #lxc.cgroup.devices.allow = c 4:1 rwm
    # /dev/{,u}random
    lxc.cgroup.devices.allow = c 1:9 rwm
    lxc.cgroup.devices.allow = c 1:8 rwm
    lxc.cgroup.devices.allow = c 136:* rwm
    lxc.cgroup.devices.allow = c 5:2 rwm
    # rtc
    lxc.cgroup.devices.allow = c 254:0 rwm
    #fuse
    lxc.cgroup.devices.allow = c 10:229 rwm
    #tun
    lxc.cgroup.devices.allow = c 10:200 rwm
    #full
    lxc.cgroup.devices.allow = c 1:7 rwm
    #hpet
    lxc.cgroup.devices.allow = c 10:228 rwm
    #kvm
    lxc.cgroup.devices.allow = c 10:232 rwm
    

    Then I changed my host /etc/network/interfaces to:

    
    auto lo
    iface lo inet loopback
    
    auto br0
    iface br0 inet static
            bridge_ports eth0
            bridge_fd 0
            address 92.281.86.226
            netmask 255.255.255.0
            network 92.281.86.0
            broadcast 92.281.86.255
            gateway 92.281.86.254
            dns-nameservers 213.186.33.99
            dns-search ovh.net
    

    When I try command line configuration ("brctl addif", "ifconfig eth0", etc.) my remote host becomes inaccessible and I have to hard reboot it.

    I changed the content of /var/lib/lxc/mycontainer/rootfs/etc/network/interfaces to:

    
    auto lo
    iface lo inet loopback
    
    auto eth0
    iface eth0 inet static
            address 179.43.46.233
            netmask 255.255.255.255
            broadcast 178.33.40.233
            gateway 92.281.86.254
    

    It takes several minutes for mycontainer to start (lxc-start -n mycontainer).

    I tried replacing

            gateway 92.281.86.254
    by:
            post-up route add 92.281.86.254 dev eth0
            post-up route add default gw 92.281.86.254
            post-down route del 92.281.86.254 dev eth0
            post-down route del default gw 92.281.86.254
    

    My container then starts instantly.

    But whatever configuration I set in /var/lib/lxc/mycontainer/rootfs/etc/network/interfaces, I cannot ping from mycontainer to any IP (including the host's):

    
    ubuntu@mycontainer:~$ ping 92.281.86.226 
    PING 92.281.86.226 (92.281.86.226) 56(84) bytes of data.
    ^C
    --- 92.281.86.226 ping statistics ---
    6 packets transmitted, 0 received, 100% packet loss, time 5031ms
    

    And my host cannot ping the container:

    
    root@host:~# ping 179.43.46.233
    PING 179.43.46.233 (179.43.46.233) 56(84) bytes of data.
    ^C
    --- 179.43.46.233 ping statistics ---
    5 packets transmitted, 0 received, 100% packet loss, time 4000ms
    

    My container's ifconfig:

    
    ubuntu@mycontainer:~$ ifconfig
    eth0      Link encap:Ethernet  HWaddr 02:00:00:86:5b:11  
              inet addr:179.43.46.233  Bcast:255.255.255.255  Mask:0.0.0.0
              inet6 addr: fe80::ff:fe79:5a31/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:64 errors:0 dropped:6 overruns:0 frame:0
              TX packets:54 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:4070 (4.0 KB)  TX bytes:4168 (4.1 KB)
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:32 errors:0 dropped:0 overruns:0 frame:0
              TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:2496 (2.4 KB)  TX bytes:2496 (2.4 KB)
    

    My host's ifconfig:

    
    root@host:~# ifconfig
    br0       Link encap:Ethernet  HWaddr 4c:72:b9:43:65:2b  
              inet addr:92.281.86.226  Bcast:91.121.67.255  Mask:255.255.255.0
              inet6 addr: fe80::4e72:b9ff:fe43:652b/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:1453 errors:0 dropped:18 overruns:0 frame:0
              TX packets:1630 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:145125 (145.1 KB)  TX bytes:299943 (299.9 KB)
    
    eth0      Link encap:Ethernet  HWaddr 4c:72:b9:43:65:2b  
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:3178 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1637 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:298263 (298.2 KB)  TX bytes:309167 (309.1 KB)
              Interrupt:20 Memory:fe500000-fe520000 
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:6 errors:0 dropped:0 overruns:0 frame:0
              TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:300 (300.0 B)  TX bytes:300 (300.0 B)
    
    vethtest  Link encap:Ethernet  HWaddr fe:0d:7f:3e:70:88  
              inet6 addr: fe80::fc0d:7fff:fe3e:7088/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:54 errors:0 dropped:0 overruns:0 frame:0
              TX packets:67 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:4168 (4.1 KB)  TX bytes:4250 (4.2 KB)
    
    virbr0    Link encap:Ethernet  HWaddr de:49:c5:66:cf:84  
              inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
              UP BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
    
    

    I have disabled lxcbr0 (USE_LXC_BRIDGE="false" in /etc/default/lxc).

    
    root@host:~# brctl show
    bridge name     bridge id               STP enabled     interfaces                                                                                                 
    br0             8000.4c72b943652b       no              eth0                                                                                                       
                                                            vethtest        
    

    I have configured the IP 179.43.46.233 to point to 02:00:00:86:5b:11 in my hosting provider's (OVH) config panel.
    (The IPs in this post are not the real ones.)

    Thanks for reading this long question! :-)

    Vianney