IP failover with 2 nodes on different subnets: cannot ping the virtual IP from the second node?


I think the problem is not in the cluster configuration but in your routing architecture.

The VIPArip resource agent manages a local Quagga instance that sends routing updates. But you also need something that uses these routing updates to change routes so that they point to the active server. I'll try to explain how it works.

[Figure: RIP HA]

Look at the picture. HA1 and HA2 are Linux-HA cluster members running Quagga. The blue router listens for RIP updates on both network links.

When the VIP goes up on HA1, Quagga sends a RIP update to the blue router, which adds the VIP prefix to its routing table with 192.168.1.2 as the next hop.

When failover occurs, the VIP goes down on HA1 and Quagga stops completely, so no more updates are sent. The blue router removes the route from its routing table after a timeout, whether or not the VIP comes up on HA2. When the VIP comes up on HA2, HA2 starts Quagga and begins sending RIP updates, and the blue router adds a route with 192.168.2.2 as the next hop.
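A minimal Python sketch of the blue router's behaviour described above. The simplifications are assumptions of this sketch, not part of the answer: every advertisement carries the same metric, the garbage-collection phase is skipped so a route simply disappears after the 180-second timeout that Quagga reports in `sh ip rip status`, and `RipListener`, `receive_update`, `lookup` and the prefix `10.0.0.100/32` are hypothetical names. Following RIP's basic update rule (RFC 2453), an equal-metric update from a different next hop does not displace a route that is still valid, which is why HA2's updates only take effect once HA1's route has timed out.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Quagga's default RIP route timeout, as reported by `sh ip rip status`.
ROUTE_TIMEOUT = 180


@dataclass
class Route:
    nexthop: str
    last_update: float  # time the last update for this prefix was heard


class RipListener:
    """Hypothetical model of the blue router's RIP-learned routes.

    Simplifications: all advertisements carry the same metric, and the
    garbage-collection phase is skipped, so a route just disappears once
    it has gone ROUTE_TIMEOUT seconds without a refresh.
    """

    def __init__(self) -> None:
        self.table: Dict[str, Route] = {}

    def _expire(self, now: float) -> None:
        stale = [p for p, r in self.table.items()
                 if now - r.last_update >= ROUTE_TIMEOUT]
        for prefix in stale:
            del self.table[prefix]

    def receive_update(self, prefix: str, nexthop: str, now: float) -> None:
        self._expire(now)
        current = self.table.get(prefix)
        # An equal-metric update from a different next hop does not replace
        # a route that is still valid; the current sender only refreshes it.
        if current is None or current.nexthop == nexthop:
            self.table[prefix] = Route(nexthop, now)

    def lookup(self, prefix: str, now: float) -> Optional[str]:
        self._expire(now)
        route = self.table.get(prefix)
        return route.nexthop if route else None


if __name__ == "__main__":
    vip = "10.0.0.100/32"  # hypothetical VIP prefix
    router = RipListener()
    router.receive_update(vip, "192.168.1.2", now=0)   # HA1 advertises the VIP
    router.receive_update(vip, "192.168.1.2", now=30)  # periodic refresh
    # Failover at t=40: HA1 goes silent and HA2 starts advertising.
    router.receive_update(vip, "192.168.2.2", now=45)  # ignored, HA1's route still valid
    print(router.lookup(vip, now=100))  # 192.168.1.2 (stale, HA1 is down)
    print(router.lookup(vip, now=210))  # None: HA1's route has timed out
    router.receive_update(vip, "192.168.2.2", now=215)  # HA2's next update
    print(router.lookup(vip, now=220))  # 192.168.2.2
```

In this model, traffic keeps going to HA1 until its route times out and only switches to HA2 with HA2's next update, so with Quagga's default timers a failover can take a few minutes to become visible to clients.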

It is possible to use VIPArip in a more complex network topology; just make sure the routing updates propagate through your network to your border routers.


Author: Greg Petersen

Updated on September 18, 2022

Comments

  • Greg Petersen
    Greg Petersen almost 2 years

    I'm going to set up a redundant failover Redmine installation:

    • another instance was installed on the second server without problems
    • MySQL (running on the same machine as Redmine) was configured for master-master replication

    Because the two servers are in different subnets (192.168.3.x and 192.168.6.x), it seems that VIPArip is the only choice.

    /etc/ha.d/ha.cf on node1:

    logfacility none
    debug 1
    debugfile /var/log/ha-debug
    logfile /var/log/ha-log
    autojoin none
    warntime 3
    deadtime 6
    initdead 60
    udpport 694
    ucast eth1 node2.ip
    keepalive 1
    node node1
    node node2
    crm respawn
    

    /etc/ha.d/ha.cf on node2:

    logfacility none
    debug 1
    debugfile /var/log/ha-debug
    logfile /var/log/ha-log
    autojoin none
    warntime 3
    deadtime 6
    initdead 60
    udpport 694
    ucast eth0 node1.ip
    keepalive 1
    node node1
    node node2
    crm respawn
    

    crm configure show:

    node $id="6c27077e-d718-4c82-b307-7dccaa027a72" node1
    node $id="740d0726-e91d-40ed-9dc0-2368214a1f56" node2
    primitive VIPArip ocf:heartbeat:VIPArip \
            params ip="192.168.6.8" nic="lo:0" \
            op start interval="0" timeout="20s" \
            op monitor interval="5s" timeout="20s" depth="0" \
            op stop interval="0" timeout="20s" \
            meta is-managed="true"
    property $id="cib-bootstrap-options" \
            stonith-enabled="false" \
            dc-version="1.0.12-unknown" \
            cluster-infrastructure="Heartbeat" \
            last-lrm-refresh="1338870303"
    

    crm_mon -1:

    ============
    Last updated: Tue Jun  5 18:36:42 2012
    Stack: Heartbeat
    Current DC: node2 (740d0726-e91d-40ed-9dc0-2368214a1f56) - partition with quorum
    Version: 1.0.12-unknown
    2 Nodes configured, unknown expected votes
    1 Resources configured.
    ============
    
    Online: [ node1 node2 ]
    
     VIPArip    (ocf::heartbeat:VIPArip):   Started node1
    

    ip addr show lo:

    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet 192.168.6.8/32 scope global lo
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    

    I can ping 192.168.6.8 from node1 (192.168.3.x):

    # ping -c 4 192.168.6.8
    PING 192.168.6.8 (192.168.6.8) 56(84) bytes of data.
    64 bytes from 192.168.6.8: icmp_seq=1 ttl=64 time=0.062 ms
    64 bytes from 192.168.6.8: icmp_seq=2 ttl=64 time=0.046 ms
    64 bytes from 192.168.6.8: icmp_seq=3 ttl=64 time=0.059 ms
    64 bytes from 192.168.6.8: icmp_seq=4 ttl=64 time=0.071 ms
    
    --- 192.168.6.8 ping statistics ---
    4 packets transmitted, 4 received, 0% packet loss, time 3000ms
    rtt min/avg/max/mdev = 0.046/0.059/0.071/0.011 ms
    

    but I cannot ping the virtual IP from node2 (192.168.6.x) or from outside. Did I miss something?

    PS: you probably want to set IP2UTIL=/sbin/ip in the /usr/lib/ocf/resource.d/heartbeat/VIPArip resource agent script if you get something like this:

    Jun 5 11:08:10 node1 lrmd: [19832]: info: RA output: (VIPArip:stop:stderr) 2012/06/05_11:08:10 ERROR: Invalid OCF_RESKEY_ip [192.168.6.8]

    http://www.clusterlabs.org/wiki/Debugging_Resource_Failures


    Reply to @DukeLion:

    Which router receives RIP updates?

    When I start the VIPArip resource, ripd is run with the configuration file below (on node1):

    /var/run/resource-agents/VIPArip-ripd.conf:

    hostname ripd
    password zebra
    debug rip events
    debug rip packet
    debug rip zebra
    log file /var/log/quagga/quagga.log
    router rip
    !nic_tag
     no passive-interface lo:0
     network lo:0
     distribute-list private out lo:0
     distribute-list private in lo:0
    !metric_tag
     redistribute connected metric 3
    !ip_tag
    access-list private permit 192.168.6.8/32
    access-list private deny any
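To see which routes this generated ripd.conf actually lets out, here is a rough Python model of the `private` access list as a first-match filter with an implicit deny. It assumes (this is an assumption of the sketch) that a Quagga `permit A.B.C.D/M` entry without `exact-match` matches any prefix contained in A.B.C.D/M; `PRIVATE` and `filter_allows` are hypothetical names. Applied to node1's connected routes, only the VIP's /32 survives `distribute-list private out lo:0`.

```python
import ipaddress
from typing import List, Tuple

# Mirrors the access-list lines in the generated ripd.conf; first match wins.
PRIVATE: List[Tuple[str, str]] = [
    ("permit", "192.168.6.8/32"),
    ("deny", "any"),
]


def filter_allows(acl: List[Tuple[str, str]], route: str) -> bool:
    """Return True if the first matching ACL entry permits `route`."""
    net = ipaddress.ip_network(route)
    for action, match in acl:
        # "any" matches everything; otherwise the route must fall inside the entry.
        if match == "any" or net.subnet_of(ipaddress.ip_network(match)):
            return action == "permit"
    return False  # implicit deny when no entry matches


if __name__ == "__main__":
    # Connected routes taken from node1's `show ip route` output.
    connected = ["127.0.0.0/8", "192.168.3.0/24", "192.168.6.8/32"]
    print([r for r in connected if filter_allows(PRIVATE, r)])  # ['192.168.6.8/32']
```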
    

    show ip route:

    Codes: K - kernel route, C - connected, S - static, R - RIP,
           O - OSPF, I - IS-IS, B - BGP, A - Babel,
           > - selected route, * - FIB route
    
    K>* 0.0.0.0/0 via 192.168.3.1, eth1
    C>* 127.0.0.0/8 is directly connected, lo
    K>* 169.254.0.0/16 is directly connected, eth1
    C>* 192.168.3.0/24 is directly connected, eth1
    C>* 192.168.6.8/32 is directly connected, lo
    

    sh ip rip status:

    Routing Protocol is "rip"
      Sending updates every 30 seconds with +/-50%, next due in 7 seconds
      Timeout after 180 seconds, garbage collect after 120 seconds
      Outgoing update filter list for all interface is not set
        lo:0 filtered by private
      Incoming update filter list for all interface is not set
        lo:0 filtered by private
      Default redistribution metric is 1
      Redistributing: connected
      Default version control: send version 2, receive any version 
        Interface        Send  Recv   Key-chain
      Routing for Networks:
        lo:0
      Routing Information Sources:
        Gateway          BadPackets BadRoutes  Distance Last Update
      Distance: (default is 120)
    
    • sartis
      sartis about 12 years
      Which router receives the RIP updates? It looks like the problem is in routing, not in the cluster configuration.
    • ravi yarlagadda
      ravi yarlagadda about 12 years
      @quanta Is the router that handles your inter-VLAN traffic listening for those RIP updates and successfully adding them to its routing table?
    • Greg Petersen
      Greg Petersen about 12 years
      @ShaneMadden: Could you please elaborate on the first part of your question? As you can see above, the routing table doesn't include any RIP routes.
    • sartis
      sartis about 12 years
      Well, this is the configuration of the RIP process that sends the updates, but you also need a router that receives them and routes traffic destined for the VIP to the active server.
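One way to check the point raised in the comments, that some router must actually learn the VIP over RIP, is to look for a selected RIP (`R>`) entry for the VIP in that router's `show ip route`. Below is a rough Python helper. It assumes Quagga-style output in which a RIP route reads like `R>* 192.168.6.8/32 [120/2] via 192.168.3.10, eth1, 00:00:12` (the next hop, metric and age in that line are made up), and `rip_next_hop` is a hypothetical name. Run against node1's own table from the question it finds nothing, which is expected: node1 originates the route rather than receiving it.

```python
from typing import Optional


def rip_next_hop(show_ip_route: str, prefix: str) -> Optional[str]:
    """Return the next hop of a selected RIP route for `prefix`, or None."""
    for line in show_ip_route.splitlines():
        fields = line.split()
        # Assumed format: "R>* PREFIX [distance/metric] via NEXTHOP, IFACE, AGE"
        if len(fields) >= 2 and fields[0].startswith("R>") and fields[1] == prefix:
            if "via" in fields:
                return fields[fields.index("via") + 1].rstrip(",")
    return None


if __name__ == "__main__":
    node1_table = """K>* 0.0.0.0/0 via 192.168.3.1, eth1
C>* 127.0.0.0/8 is directly connected, lo
C>* 192.168.3.0/24 is directly connected, eth1
C>* 192.168.6.8/32 is directly connected, lo"""
    print(rip_next_hop(node1_table, "192.168.6.8/32"))  # None: no RIP route on the sender
```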