To make troubleshooting easier, we need to understand the IP load balancing technology used in the cluster and the complete packet flow: how packets are received and forwarded at the load balancer, and how they are handled at the real servers and sent back to the clients.
For more information about the IP load balancing technologies implemented in IPVS, see
- Virtual Server via Network Address Translation
- Virtual Server via IP Tunneling
- Virtual Server via Direct Routing
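Each of the three forwarding methods corresponds to a flag in ipvsadm when adding real servers. The session below is only an illustrative sketch; the VIP 192.168.0.100 and the real-server addresses are placeholders, not values from this cluster.

```shell
# Create a virtual TCP service on the VIP with round-robin scheduling.
ipvsadm -A -t 192.168.0.100:80 -s rr

# Add real servers; the last flag selects the forwarding method:
ipvsadm -a -t 192.168.0.100:80 -r 10.0.0.1:80 -m   # -m: NAT (masquerading)
ipvsadm -a -t 192.168.0.100:80 -r 10.0.0.2:80 -i   # -i: IP tunneling
ipvsadm -a -t 192.168.0.100:80 -r 10.0.0.3:80 -g   # -g: direct routing

# Show the resulting virtual service table (numeric output).
ipvsadm -L -n
```

In a real cluster all real servers of one virtual service normally use the same forwarding method; mixing them here just shows the three flags side by side.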
We can then use packet capture tools, such as ethereal and tcpdump, to troubleshoot the cluster system. First, capture the load-balanced traffic at both the load balancer and the real servers, to verify that the basic load balancing works. Second, capture the service-monitoring packets exchanged between the load balancers and the real servers, to verify that cluster monitoring and high availability work as expected.
At the load balancer, capture all the packets related to the virtual service, and check whether request packets are received and forwarded to the real servers correctly.
At the real servers, capture all the packets related to the virtual service, and check whether request packets are received and response packets are sent back correctly.
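For example, with tcpdump the two captures might look like the sketch below. The VIP 192.168.0.100, port 80, and interface eth0 are placeholder assumptions; substitute your own values.

```shell
# On the load balancer: watch client requests arriving on the VIP
# and (for NAT) being rewritten toward the real servers.
tcpdump -n -i eth0 host 192.168.0.100 and port 80

# On a real server: check that forwarded requests arrive and that
# responses leave. With direct routing or tunneling the responses
# go straight to the client, bypassing the load balancer.
tcpdump -n -i eth0 port 80
```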
UDP service failover
- Hello, I have two directors in a high-availability configuration. Everything is OK for TCP, but not for UDP. UDP load balancing works, but failover does not happen: once a client is "redirected" to a real server, its connections always go to that server, even after the server goes down.
First, check that the monitoring program works for the UDP service, and make sure that it removes a server from the scheduling list when its UDP service is down. Second, if the client uses a fixed source port to access the UDP service, do not use the quiescent option (setting the server weight to zero) when the server goes down, because all the packets from that client will keep being sent to the dead server while its weight is zero.
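In ipvsadm terms, the difference between the two approaches looks like this (the VIP 192.168.0.100:53 and real server 10.0.0.1:53 are placeholder values):

```shell
# Quiescent approach: keep the entry but set its weight to 0.
# An existing UDP "connection" from a client with a fixed source port
# still matches this entry, so its packets keep going to the dead server.
ipvsadm -e -u 192.168.0.100:53 -r 10.0.0.1:53 -w 0

# Removal approach: delete the real server from the table, so the
# client's packets can be rescheduled to a working server.
ipvsadm -d -u 192.168.0.100:53 -r 10.0.0.1:53
```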
While LVS provides basic means for load-balancing, lvs-kiss provides some more sophisticated possibilities.
Basically lvs-kiss is just a piece of Perl that sits on top of LVS.
lvs-kiss features configurable means of measuring node "load". This can be anything, from the load average to an SNMP get (anything you can run at the command line of an LVS server that gives a numerical response).
Troubleshooting load balancing can be tricky when you encounter web caches etc. I tend to use the good old "telnet VIP PORT" to check the result of load balancing. For tests, you may even feed lvs-kiss dummy results (just use "echo NUMBER" as the method for getting the load...).
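The dummy-result trick is easy to try in isolation: any command that prints a number can serve as a load probe. A minimal sketch, assuming the probe is simply a command whose stdout is read as the load (the function names here are illustrative, not part of lvs-kiss):

```shell
# Trivial dummy probe: always reports a "load" of 5,
# in the spirit of the "echo NUMBER" test above.
get_load() { echo 5; }

# A more realistic probe: the 1-minute load average from /proc.
get_loadavg() { cut -d' ' -f1 /proc/loadavg; }

get_load
get_loadavg
```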
If you compile the kernel without the CONFIG_IP_VS_PROTO_TCP option, ipvsadm does not give any error messages, but IPVS does not grab incoming TCP connections.
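To rule this out, check the configuration of the running kernel. The exact location varies by distribution, so both of these checks are sketches:

```shell
# Most distributions install the build config next to the kernel image.
grep CONFIG_IP_VS_PROTO_TCP /boot/config-"$(uname -r)"

# Some kernels expose their config at runtime instead.
zcat /proc/config.gz 2>/dev/null | grep CONFIG_IP_VS_PROTO_TCP
```

If the option is set, the output should contain CONFIG_IP_VS_PROTO_TCP=y.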