Keepalived not sending multicast advertisements


Boy, do I feel stupid. I had saved my config as keepalived.cfg in /etc/keepalived/ (I think I picked this up from haproxy.cfg), but keepalived looks for /etc/keepalived/keepalived.conf. I was starting keepalived without the -f flag, so it was starting with no config.

Had I used the -d option (dump the config to syslog), I would have seen that it was using the default config and not picking up my settings.
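
The file-name mismatch described above can be caught before the daemon is started. Here is a minimal sketch, assuming a POSIX shell; `check_keepalived_conf` is a hypothetical helper name, not something keepalived ships. It checks that `keepalived.conf` exists in the given directory and, if it does not, lists similarly named files.

```shell
# Hypothetical helper (not part of keepalived): verify that the config file
# keepalived reads by default is actually present in the given directory.
check_keepalived_conf() {
    dir="${1:-/etc/keepalived}"
    if [ -f "$dir/keepalived.conf" ]; then
        echo "found: $dir/keepalived.conf"
        return 0
    fi
    echo "missing: $dir/keepalived.conf"
    # List near-miss names such as keepalived.cfg so the mistake is obvious.
    for f in "$dir"/keepalive*; do
        if [ -e "$f" ]; then
            echo "candidate: $f"
        fi
    done
    return 1
}
```

If the file turns out to have a different name, either rename it or pass it explicitly, for example `keepalived -f /etc/keepalived/keepalived.cfg -d`, where `-f` selects the config file and `-d` dumps the parsed configuration to the log.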


Author: The_Viper

Updated on September 18, 2022

Comments

  • The_Viper

    I have two systems, both VMs. They are configured to use bridged networking. I am trying to get keepalived to manage ownership of a VIP, 10.190.1.230. I have tried two versions of keepalived, 1.2.2 and 1.2.1, both built from source.

    ServerA - RHEL5.2 x64 - 10.190.1.228 - PRIORITY 50
    ServerB - RHEL6 x64 - 10.190.1.229 - PRIORITY 101
    VIP - 10.190.1.230
    

    My problem seems to be that keepalived on ServerB is not sending multicast advertisements. It does see the multicast adverts from ServerA:

    [root@ServerB~]# tcpdump -vv -c 3 -i eth0 vrrp
    tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
    10:18:10.760577 IP (tos 0x0, ttl 255, id 856, offset 0, flags [none], proto VRRP (112), length 40)
    10.190.1.228 > 224.0.0.18: VRRPv2, Advertisement, vrid 151, prio 50, authtype none, intvl 1s, length 20, addrs: 10.190.1.230
    10:18:11.762039 IP (tos 0x0, ttl 255, id 857, offset 0, flags [none], proto VRRP (112), length 40)
    10.190.1.228 > 224.0.0.18: VRRPv2, Advertisement, vrid 151, prio 50, authtype none, intvl 1s, length 20, addrs: 10.190.1.230
    10:18:12.762883 IP (tos 0x0, ttl 255, id 858, offset 0, flags [none], proto VRRP (112), length 40)
    10.190.1.228 > 224.0.0.18: VRRPv2, Advertisement, vrid 151, prio 50, authtype none, intvl 1s, length 20, addrs: 10.190.1.230
    3 packets captured
    3 packets received by filter
    0 packets dropped by kernel
    [root@ServerB~]# 
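
To check whether anything at all is leaving ServerB, independent of what ServerA is doing, the capture can be narrowed to adverts whose source address is ServerB. This is a minimal sketch; `vrrp_filter` is a hypothetical helper name, and the filter relies only on VRRP being IP protocol 112, as shown in the capture above.

```shell
# Hypothetical helper: build a pcap filter that matches only VRRP
# advertisements (IP protocol 112) sent by one source address.
vrrp_filter() {
    echo "ip proto 112 and src host $1"
}

# Usage (needs root, so shown as a comment rather than executed):
#   tcpdump -n -c 3 -i eth0 "$(vrrp_filter 10.190.1.229)"
```

With this filter, a capture that stays empty on ServerB itself means that the daemon is not emitting adverts at all, rather than that they are being lost on the network.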
    

    If I kill the keepalived on ServerA, and keep the tcpdump running, I see no packets. I am using the following simple keepalived configuration:

    Server A - 10.190.1.228

     vrrp_instance VI_1 {
        interface eth0
        state BACKUP 
        virtual_router_id 151
        priority 50 
        virtual_ipaddress {
                10.190.1.230
        }
    }
    

    Server B - 10.190.1.229

    vrrp_instance VI_1 {
        interface eth0
        state MASTER
        virtual_router_id 151
        priority 100 
        virtual_ipaddress {
            10.190.1.230
        }
    }
    

    ServerA cannot see VRRPv2 adverts from the higher-priority keepalived on ServerB, so it is (correctly, I guess) holding the VIP:

    [root@ServerA~]# ip add sh eth0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 08:00:27:59:58:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.190.1.228/24 brd 10.190.1.255 scope global eth0
    inet 10.190.1.230/32 scope global eth0
    inet6 fe80::a00:27ff:fe59:58c0/64 scope link 
       valid_lft forever preferred_lft forever
    [root@ServerA~]# 
    

    Network config

    Firewalls are disabled on both machines. Both interfaces have the MULTICAST flag set.

    I have used iperf to publish to the VRRP group:

    [root@ServerB~]# iperf -u -c 224.0.0.18
    ------------------------------------------------------------
    Client connecting to 224.0.0.18, UDP port 5001
    Sending 1470 byte datagrams
    Setting multicast TTL to 1
    UDP buffer size:  122 KByte (default)
    ------------------------------------------------------------
    [  3] local 10.190.1.229 port 32929 connected with 224.0.0.18 port 5001
    ^C[ ID] Interval       Transfer     Bandwidth
    [  3]  0.0- 0.6 sec  73.2 KBytes  1.05 Mbits/sec
    [  3] Sent 51 datagrams
    [root@ServerB~]# 
    

    ServerA can see this traffic:

    [root@ServerA~]# tcpdump -c 3 -i eth0 host 224.0.0.18
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    10:37:30.460427 IP 10.190.1.229.33088 > vrrp.mcast.net.commplex-link: UDP, length 1470
    10:37:30.472247 IP 10.190.1.229.33088 > vrrp.mcast.net.commplex-link: UDP, length 1470
    10:37:30.482908 IP 10.190.1.229.33088 > vrrp.mcast.net.commplex-link: UDP, length 1470
    3 packets captured
    10 packets received by filter
    0 packets dropped by kernel
    [root@ServerA~]# 
    

    The above would make me think that this is not a network issue. I do not have multicast routes in the routing table, but the above suggests I don't need any. The multicast traffic is using eth0.

    Finally, here is the log output from keepalived on ServerB:

    May 18 10:40:46 ServerB Keepalived: Starting Keepalived v1.2.1 (05/17,2011)
    May 18 10:40:46 ServerB Keepalived: Remove a zombie pid file /var/run/keepalived.pid
    May 18 10:40:46 ServerB Keepalived: Registering Kernel netlink reflector
    May 18 10:40:46 ServerB Keepalived: Registering Kernel netlink command channel
    May 18 10:40:46 ServerB Keepalived: Registering gratutious ARP shared channel
    May 18 10:40:46 ServerB Keepalived: Configuration is using : 55219 Bytes
    May 18 10:40:46 ServerB Keepalived: Using LinkWatch kernel netlink reflector...
    

    I haven't run it with the -D switch; this seems to be memory debugging and means very little to me. I've uploaded strace output to here.

    When I strace keepalived with the -n flag (don't fork) I get the following output, after the output linked above:

    sendto(3, "<30>May 18 10:58:50 Keepalived: "..., 68, MSG_NOSIGNAL, NULL, 0) = 68
    sendto(3, "<30>May 18 10:58:50 Keepalived: "..., 75, MSG_NOSIGNAL, NULL, 0) = 75
    rt_sigaction(SIGCHLD, {0x411b60, [], SA_RESTORER|SA_RESTART, 0x3db5a32a20}, {SIG_DFL, [], 0}, 8) = 0
    select(1024, [4 6], [], [], {1, 0})     = 0 (Timeout)
    select(1024, [4 6], [], [], {1, 0})     = 0 (Timeout)
    select(1024, [4 6], [], [], {1, 0})     = 0 (Timeout)
    select(1024, [4 6], [], [], {1, 0})     = 0 (Timeout)
    [ etc ..]
    

    This is in contrast to the strace output for the working keepalived on ServerA, in which I can see sendto(), sendmsg() and recvmsg() calls being made.
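
The comparison between the two traces can be made less manual by summarising a saved strace log. This is a minimal sketch, assuming the trace was written to a file (for example with `strace -o trace.log keepalived -n`); `summarise_trace` is a hypothetical helper name. It counts send calls other than syslog writes (the `"<30>..."` lines above) and `select()` timeouts.

```shell
# Hypothetical helper: summarise a saved strace log by counting
# non-syslog send calls and select() timeouts.
summarise_trace() {
    log="$1"
    # sendto()/sendmsg() lines, optionally prefixed by a PID (strace -f),
    # excluding syslog writes whose payload starts with a "<NN>" priority.
    sends=$(grep -E '^([0-9]+ +)?(sendmsg|sendto)\(' "$log" \
        | grep -vcE '"<[0-9]+>' || true)
    timeouts=$(grep -cE '^([0-9]+ +)?select\(.*\(Timeout\)' "$log" || true)
    echo "sends=$sends timeouts=$timeouts"
}
```

A trace like the ServerB excerpt above would report `sends=0` alongside a growing timeout count, which is consistent with a daemon that is idling rather than advertising.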