How to set up a TCP check with keepalived?

If you do not need load balancing, track scripts offer failover based on checks run against your service.

First, add a vrrp_script block before your vrrp_instance:

global_defs {
    enable_script_security
}

vrrp_script chk_sshd {
    script "/usr/bin/pgrep sshd" # or "nc -zv localhost 22"
    interval 5                   # default: 1s
}

Next, add a track_script to your vrrp_instance referencing the vrrp_script:

vrrp_instance VI_1 {
    ... other stuff ...

    track_script {
        chk_sshd
    }
}
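
Put together, a minimal sketch for bastion01 could look like the following. It reuses the interface, router ID, priority, authentication, and floating IP from the question. The fall and rise values are illustrative assumptions rather than recommendations, and the /bin/nc path is an assumption that depends on the distribution. Because this vrrp_script sets no weight, a failing check should put the instance into the FAULT state, which releases the floating IP so the peer can take over.

global_defs {
    enable_script_security
}

vrrp_script chk_sshd {
    script "/bin/nc -zv localhost 22"  # TCP connect to sshd; nc path is an assumption
    interval 5                         # run the check every 5 seconds
    fall 2                             # assumed value: failures in a row before the check counts as down
    rise 2                             # assumed value: successes in a row before it counts as up again
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
    track_script {
        chk_sshd                       # ties the VRRP state to the check above
    }
}

bastion02 would presumably carry the same vrrp_script and track_script blocks, keeping its own priority of 100.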

While not strictly required, enable_script_security and the full path to the executable provide some protection against malicious activity and squelch warnings in the logs. See the Keepalived man page for more info.
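
As a sketch of how that hardening can be taken one step further, global_defs can also name the account that runs the check scripts via script_user. The keepalived_script account below is an assumption: it has to exist on the host already, and the check command (pgrep or nc) must be runnable by it.

global_defs {
    enable_script_security
    script_user keepalived_script  # assumed account; create it first or substitute your own
}

Running the check as an unprivileged user limits what a tampered script could do, which is the same concern enable_script_security addresses.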

Author by

cat pants

Updated on September 18, 2022

Comments

  • cat pants
    cat pants almost 2 years

    Trying to set up HA bastion servers. Only failover is needed; load balancing is not. Two servers running Debian: bastion01 and bastion02, at 192.168.0.10 and 192.168.0.11. The floating IP is 192.168.0.12.

    I started out with these configs:

    bastion01:

    global_defs {
       notification_email {
        [email protected]
       }   
       notification_email_from [email protected]
       smtp_server localhost
       smtp_connect_timeout 30
    }
    
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 101 
        priority 101 
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }   
        virtual_ipaddress {
            192.168.0.12
        }   
    }
    

    bastion02:

    global_defs {
       notification_email {
         [email protected] 
       }   
       notification_email_from [email protected]
       smtp_server localhost
       smtp_connect_timeout 30
    }
    
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 101 
        priority 100 
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }   
        virtual_ipaddress {
            192.168.0.12
        }   
    }
    

    This works absolutely great. I confirmed that the floating IP fails over when either server is shut down.

    However, it doesn't handle the case when ssh is stopped, but the server itself is still running.

    For that, I'll need to add a TCP check.

    It appears that keepalived's docs provide an example:

    http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html

    However, their example involves load balancing, which just adds another layer of complexity I am not interested in.

    It looks like the block in question is:

    TCP_CHECK { connect_timeout 3 connect_port 22 }

    I tried to use my best guess as to how to configure this:

    bastion01:

    global_defs {
       notification_email {
         [email protected] 
       }   
       notification_email_from [email protected]
       smtp_server localhost
       smtp_connect_timeout 30
    }
    
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 101 
        priority 101 
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }   
        virtual_ipaddress {
            192.168.0.12
        }   
    }
    
    real_server 192.168.0.10 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }   
    } 
    
    real_server 192.168.0.11 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }
    

    bastion02:

    global_defs {
       notification_email {
         [email protected] 
       }   
       notification_email_from [email protected]
       smtp_server localhost
       smtp_connect_timeout 30
    }
    
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 101 
        priority 100 
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }   
        virtual_ipaddress {
            192.168.0.12
        }   
    }
    
    real_server 192.168.0.10 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }   
    } 
    
    real_server 192.168.0.11 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }
    

    But this didn't work: keepalived didn't understand the real_server blocks. OK, fine. Maybe I can't get away with failover only; maybe the TCP check is part of the load-balancing component of keepalived, so I must use load balancing here. That's fine, it couldn't hurt. So the configs now become (taken directly from http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html ):

    bastion01:

    global_defs {
       notification_email {
        [email protected]
       }   
       notification_email_from [email protected]
       smtp_server localhost
       smtp_connect_timeout 30
    }
    
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 101 
        priority 101 
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }   
        virtual_ipaddress {
            192.168.0.12
        }   
    
    }
    
    virtual_server 192.168.1.11 22 {
        delay_loop 6
        lb_algo rr
        lb_kind NAT 
        nat_mask 255.255.255.0
    
        protocol TCP 
    
        real_server 192.168.0.10 22 {
            weight 1
            TCP_CHECK {
                connect_timeout 3
                connect_port 22
            }
        }   
    
        real_server 192.168.0.11 22 {
            weight 1
            TCP_CHECK {
                connect_timeout 3
                connect_port 22
            }
        }   
    } 
    

    bastion02:

    global_defs {
       notification_email {
        [email protected]
       }   
       notification_email_from [email protected]
       smtp_server localhost
       smtp_connect_timeout 30
    }
    
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 101 
        priority 100 
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }   
        virtual_ipaddress {
            192.168.0.12
        }   
    
    }
    
    virtual_server 192.168.1.11 22 {
        delay_loop 6
        lb_algo rr
        lb_kind NAT 
        nat_mask 255.255.255.0
    
        protocol TCP 
    
        real_server 192.168.0.10 22 {
            weight 1
            TCP_CHECK {
                connect_timeout 3
                connect_port 22
            }
        }   
    
        real_server 192.168.0.11 22 {
            weight 1
            TCP_CHECK {
                connect_timeout 3
                connect_port 22
            }
        }   
    } 
    

    This just straight up does not work.

    When I stop ssh on bastion01 and try to ssh to the floating IP, I get connection refused; the IP doesn't fail over to bastion02.

    In the logs on bastion01:

    bastion01 Keepalived_healthcheckers[11613]: Check on service [192.168.0.10]:22 failed after 1 retry.
    bastion01 Keepalived_healthcheckers[11613]: Removing service [192.168.0.10]:22 from VS [192.168.1.11]:22
    

    How do I convince keepalived to actually failover the floating ip when the TCP health check fails?