How to set up a TCP check with keepalived?
If you do not need load balancing, track scripts offer failover based on checks run against your service.
First, add a vrrp_script block before your vrrp_instance:
global_defs {
    enable_script_security
}
vrrp_script chk_sshd {
    script "/usr/bin/pgrep sshd"  # or "nc -zv localhost 22"
    interval 5                    # default: 1s
}
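Since the question asks for a TCP check rather than a process check, here is a minimal sketch of a script that a vrrp_script block like the one above could call instead of pgrep. The file name chk_tcp.sh, the function name tcp_check, and the 3-second default are assumptions for illustration, not anything keepalived ships. It relies on bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Hypothetical helper, e.g. saved as /usr/local/bin/chk_tcp.sh (name and path are assumptions).
# Exit status 0 means the port accepted a TCP connection; non-zero means it did not.

tcp_check() {
    local host="$1" port="$2" wait_secs="${3:-3}"
    # bash's /dev/tcp pseudo-device opens a TCP connection;
    # timeout(1) bounds how long the attempt may take.
    timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# When called with at least a host and a port, run the check and
# exit with its status so the caller can act on success or failure.
if [[ $# -ge 2 ]]; then
    tcp_check "$@"
    exit $?
fi
```

With this in place, the script line might read script "/usr/local/bin/chk_tcp.sh 127.0.0.1 22", again assuming that path. A non-zero exit status is what marks the check as failed.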
Next, add a track_script to your vrrp_instance referencing the vrrp_script:
vrrp_instance VI_1 {
    ... other stuff ...
    track_script {
        chk_sshd
    }
}
While not strictly required, enable_script_security and the full path to the executable provide some assurance against malicious activity and will suppress warnings in the logs. See the Keepalived man page for more info.
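To test a setup like this, one option is to stop sshd on the current master and then check on each node whether it holds the floating IP. The sketch below assumes the interface eth0 and the address 192.168.0.12 from the question; the function name holds_vip is hypothetical. It only inspects the output of ip -o -4 addr show, so it does not change any state.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the function name is an assumption for illustration.

# Reads `ip -o -4 addr show` style output on stdin and succeeds if the
# given address appears as an "inet <addr>/<prefix>" entry.
holds_vip() {
    local vip="$1"
    # -F treats the dots literally; the trailing slash keeps
    # 192.168.0.1 from matching 192.168.0.12.
    grep -qF " inet ${vip}/"
}

# Example (run on each bastion):
#   ip -o -4 addr show dev eth0 | holds_vip 192.168.0.12 && echo "this node holds the VIP"
```

Running the example line on both nodes before and after stopping sshd should show the address moving, provided the track script is working as intended.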
cat pants
Updated on September 18, 2022
Comments
cat pants, almost 2 years ago
Trying to set up HA bastion servers. Failover only; load balancing is not needed. Two servers running Debian: bastion01 and bastion02, at 192.168.0.10 and 192.168.0.11. The floating IP is 192.168.0.12.
I started out with these configs:
bastion01:
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
bastion02:
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
This works absolutely great. I confirmed that the floating IP fails over when either server is shut down.
However, it doesn't handle the case when ssh is stopped, but the server itself is still running.
For that, I'll need to add a TCP check.
It appears that keepalived's docs provide an example:
http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html
However, their example involves load balancing, which just adds another layer of complexity I am not interested in.
It looks like the block in question is:
TCP_CHECK {
    connect_timeout 3
    connect_port 22
}
I tried to use my best guess as to how to configure this:
bastion01:
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
real_server 192.168.0.10 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}
real_server 192.168.0.11 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}
bastion02:
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
real_server 192.168.0.10 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}
real_server 192.168.0.11 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}
But this didn't work; keepalived didn't understand the real_server blocks. OK, fine: maybe I can't get away with failover only. Maybe the TCP check is part of keepalived's load-balancing component, so I must use load balancing here. That's fine, it couldn't hurt. So the configs now become (taken directly from http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html):
bastion01:
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
virtual_server 192.168.1.11 22 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    nat_mask 255.255.255.0
    protocol TCP
    real_server 192.168.0.10 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }
    real_server 192.168.0.11 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }
}
bastion02:
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
virtual_server 192.168.1.11 22 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    nat_mask 255.255.255.0
    protocol TCP
    real_server 192.168.0.10 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }
    real_server 192.168.0.11 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }
}
This just straight up does not work.
When I stop ssh on bastion01 and try to ssh to the floating IP, I get connection refused; the IP doesn't fail over to bastion02.
In the logs on bastion01:
bastion01 Keepalived_healthcheckers[11613]: Check on service [192.168.0.10]:22 failed after 1 retry.
bastion01 Keepalived_healthcheckers[11613]: Removing service [192.168.0.10]:22 from VS [192.168.1.11]:22
How do I convince keepalived to actually fail over the floating IP when the TCP health check fails?