Pinging Virtual IP for Linux HA cluster from a different subnet does not work

You're making the mistake of assuming your cluster configuration has anything to do with the issue just because it's a new area for you. All the cluster software does is manage (and monitor) resources; in this case, an IP address that it configures on one of the hosts in the cluster. You could just as easily remove the whole cluster configuration, bring the IP address up on one of the nodes by hand, and you would see exactly the same problem.
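To illustrate, here is a sketch of reproducing the setup by hand with no cluster involved. The interface name eth0 and the /23 prefix are taken from the config in the question; run as root on one node:

```shell
# Bring the virtual IP up manually on one node, outside the cluster.
ip addr add 135.121.192.104/23 dev eth0

# Send gratuitous ARP so neighbours on the segment update their caches,
# much as the IPaddr2 resource agent does when it starts.
arping -U -I eth0 -c 3 135.121.192.104

# When finished testing, remove the address again.
ip addr del 135.121.192.104/23 dev eth0
```

If pings from the other subnet still fail with the address configured this way, the cluster software is definitively not the cause.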

Clearly, if you can reach the IP from the same network but not from another, there is a routing problem. Check your router configuration.
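One quick way to narrow it down (a sketch, assuming the VIP is currently active on a node's eth0): capture ICMP on the node holding the virtual IP while pinging it from the other subnet.

```shell
# On the cluster node currently holding the virtual IP:
tcpdump -ni eth0 icmp and host 135.121.192.104

# Meanwhile, from a machine on 135.121.196.x:
#   ping 135.121.192.104
#
# If no "echo request" lines appear in the capture, the packets are
# being dropped before they reach the host, i.e. somewhere in the
# routing path between the two subnets.
```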

By the way, disabling STONITH in a cluster is a one-way ticket to data loss or corruption. I hope you have only disabled it during testing.
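For reference, re-enabling it is one property change plus a fencing primitive per node. A sketch using the external/ipmi agent; the IP address, user and password below are placeholders, not values from your environment:

```shell
# Re-enable fencing cluster-wide.
crm configure property stonith-enabled=true

# Define a fencing device for node h-008; external/ipmi is just one
# example agent, and the ipaddr/userid/passwd values are placeholders.
crm configure primitive fence-h-008 stonith:external/ipmi \
        params hostname=h-008 ipaddr=192.0.2.10 \
               userid=admin passwd=secret interface=lan
```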

Author: user52498

Updated on September 18, 2022

Comments

  • user52498
    user52498 almost 2 years

    I have set up a Linux cluster with Corosync/Pacemaker; the two cluster nodes are in the same subnet and share a virtual IP. Machines within the same subnet can ping the virtual IP "135.121.192.104" successfully.

    However, if I ping the virtual IP "135.121.192.104" from a machine in a different subnet, it does not respond. The other machines reside on the subnet "135.121.196.x".

    On my machines, I have the following subnet mask in my ifcfg-eth0 file:

    NETMASK=255.255.254.0

    and below is my output for the crm configure show:

    [root@h-008 crm]# crm configure show
    node h-008 \
            attributes standby="off"
    node h-009 \
            attributes standby="off"
    primitive GAXClusterIP ocf:heartbeat:IPaddr2 \
            params ip="135.121.192.104" cidr_netmask="23" \
            op monitor interval="30s" clusterip_hash="sourceip"
    clone GAXClusterIP2 GAXClusterIP \
            meta globally-unique="true" clone-node-max="2"
    property $id="cib-bootstrap-options" \
            dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
            cluster-infrastructure="openais" \
            expected-quorum-votes="2" \
            no-quorum-policy="ignore" \
            stonith-enabled="false"
    rsc_defaults $id="rsc-options" \
            resource-stickiness="100"
    

    and the output of the crm_mon status:

    [root@h-009 crm]# crm_mon status --one-shot
    non-option ARGV-elements: status
    ============
    Last updated: Thu Jun 23 08:12:21 2011
    Stack: openais
    Current DC: h-008 - partition with quorum
    Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
    2 Nodes configured, 2 expected votes
    1 Resources configured.
    ============
    
    Online: [ h-008 h-009 ]
    
     Clone Set: GAXClusterIP2 (unique)
         GAXClusterIP:0     (ocf::heartbeat:IPaddr2):       Started h-008
         GAXClusterIP:1     (ocf::heartbeat:IPaddr2):       Started h-009
    

    I am new to Linux HA cluster setup and have been unable to find the root cause of this issue. Is there any configuration I can check to diagnose the problem?

    Additional comments:

    Below is the output of "route -n"
    
    [root@h-008 crm]# route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    135.121.192.0   0.0.0.0         255.255.254.0   U     0      0        0 eth0
    169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
    0.0.0.0         135.121.192.1   0.0.0.0         UG    0      0        0 eth0
    

    and below is the traceroute output from the cluster machine to the machine outside the cluster:

    [root@h-008 crm]# traceroute 135.121.196.122
    traceroute to 135.121.196.122 (135.121.196.122), 30 hops max, 40 byte packets
     1  135.121.192.1 (135.121.192.1)  6.750 ms  6.967 ms  7.634 ms
     2  135.121.205.225 (135.121.205.225)  12.296 ms  14.385 ms  16.101 ms
     3  s2h-003.hpe.test.com (135.121.196.122)  0.172 ms  0.170 ms  0.170 ms
    

    and the below is the traceroute output from the machine outside the cluster, to the virtual IP 135.121.192.104:

    [root@s2h-003 ~]# traceroute 135.121.192.104
    traceroute to 135.121.192.104 (135.121.192.104), 30 hops max, 40 byte packets
     1  135.121.196.1 (135.121.196.1)  10.558 ms  10.895 ms  11.556 ms
     2  135.121.205.226 (135.121.205.226)  11.016 ms  12.797 ms  14.152 ms
     3  * * *
     4  * * *
     5  * * *
     6  * * *
     7  * * *
     8  *
    

    but when I do a traceroute to the real IP address of one of the cluster nodes, it succeeds:

    [root@s2h-003 ~]# traceroute 135.121.192.102
    traceroute to 135.121.192.102 (135.121.192.102), 30 hops max, 40 byte packets
     1  135.121.196.1 (135.121.196.1)  4.994 ms  5.315 ms  5.951 ms
     2  135.121.205.226 (135.121.205.226)  3.816 ms  6.016 ms  7.158 ms
     3  h-009.msite.pr.hpe.test.com (135.121.192.102)  0.236 ms  0.229 ms  0.216 ms
    
    • wolfgangsz
      wolfgangsz about 13 years
      Can you post the output of a traceroute?
    • Rilindo
      Rilindo about 13 years
      Can you post your network topology?