Keepalived and TCP_CHECK problem

Today I was debugging a problem with keepalived: it was not noticing that a real server behind a virtual IP it manages had died.

The problem was really strange, because the check was very, very simple:

real_server 192.168.1.65 3306
{
    TCP_CHECK
    {
        connect_port 3306
        bindto 192.168.1.65
        connect_timeout 2
    }
}

I wrote this configuration after reading the keepalived.conf man page, which lists these 3 options for TCP_CHECK without going into any detail. So I assumed that bindto IPADDR indicates the IP address we should connect to in order to do the check. But I was wrong: with this configuration, if the real server behind the virtual IP dies, keepalived doesn’t notice anything at all. This is because the “bindto” option, I guess, selects the local IP address (local to the LVS director) to bind to when checking the external IP:port, i.e. the source address of the check and not its destination.
Changing the configuration to look like this:


real_server 192.168.1.65 3306
{
    TCP_CHECK
    {
        connect_port 3306
        connect_timeout 2
    }
}

fixed the problem. Keepalived is a great product and works quite well, but its documentation is a bit disappointing.
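
To make the difference between "bind to" and "connect to" concrete, here is a minimal Python sketch of a TCP health check. It is not keepalived's code, just an illustration under my own assumptions: the function name tcp_check and its parameters are made up for this example. The bind_addr argument plays the role I believe bindto plays (the local source address), while host and port are the destination being checked.

```python
import socket


def tcp_check(host, port, timeout=2.0, bind_addr=None):
    """Return True if a TCP connection to (host, port) succeeds in time.

    host/port  : the remote address being checked (the real server).
    bind_addr  : optional LOCAL address to use as the source of the
                 connection; it must belong to the machine running the check.
    (Hypothetical helper for illustration, not part of keepalived.)
    """
    # Port 0 lets the kernel pick any free source port on bind_addr.
    source = (bind_addr, 0) if bind_addr else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the source address
        # could not be bound: all count as a failed check here.
        return False


# Example usage (addresses taken from the configuration above):
#   tcp_check("192.168.1.65", 3306)                       # plain check
#   tcp_check("192.168.1.65", 3306, bind_addr="<director IP>")
```

In this sketch the destination is always given by host and port; the bind address only changes where the connection comes from.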

Linux 2.6.x as real server in a LVS system

DISCLAIMER: this is not a howto, it’s just a reminder for myself and a tip for someone who already knows the basics of LVS (Linux Virtual Server).

So, if you need to use a Linux box as a real server behind an LVS director and you’re running kernel 2.6.x, you probably know that if you try a

ifconfig lo:0 192.168.1.131 -arp netmask 255.255.255.255 up

then an arping from an external host will be answered ANYWAY by your host, and this is a VERY BAD THING in an LVS environment (because clients will contact that one real server directly instead of always going through the virtual server). This can look like a bug, because the -arp switch passed to ifconfig should tell the kernel not to answer ARP requests for this IP.
To solve this problem, you have to change these kernel settings with sysctl:

net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2

in sysctl.conf (to try a setting before making it permanent, you can run for example sysctl net.ipv4.conf.eth0.arp_ignore=1).

With these parameters set, if you arping the lo:0 IP address from an external host, the real server won’t answer, but it will still accept the packets sent to it by the director of the LVS system.
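
Before testing with arping, it can help to confirm that the four settings are really active. The following is a small Python sketch that reads them back from /proc/sys; the names EXPECTED, read_sysctl and mismatches are my own, and the interface name eth0 is simply the one used above, so adjust it to your system.

```python
from pathlib import Path

# The values listed above; eth0 is assumed to be the real server's interface.
EXPECTED = {
    "net.ipv4.conf.eth0.arp_ignore": "1",
    "net.ipv4.conf.eth0.arp_announce": "2",
    "net.ipv4.conf.all.arp_ignore": "1",
    "net.ipv4.conf.all.arp_announce": "2",
}


def read_sysctl(key, root="/proc/sys"):
    """Read one sysctl value by mapping dots in the key to path separators.

    Note: this simple mapping does not handle interface names that
    themselves contain a dot (for example VLAN interfaces).
    """
    return (Path(root) / key.replace(".", "/")).read_text().strip()


def mismatches(expected, root="/proc/sys"):
    """Return {key: current_value} for every setting that differs.

    A value of None means the setting could not be read at all.
    """
    bad = {}
    for key, want in expected.items():
        try:
            got = read_sysctl(key, root)
        except OSError:
            got = None
        if got != want:
            bad[key] = got
    return bad


# Example usage on the real server:
#   print(mismatches(EXPECTED) or "all ARP settings are in place")
```

The root parameter only exists so the helper can be pointed at a different directory for testing; on a real server the default /proc/sys is what you want.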

I’ve tried this on Debian.