Difference between revisions of "Troubleshooting"

From LVSKB
== Introduction ==

To make troubleshooting easier, we need to understand the IP load balancing technology used in the cluster and know the whole packet flow: how packets are received and sent out at the [[load balancer]], and how packets are handled at the real servers and sent back to the clients.
  
 
For more information about the IP load balancing technologies implemented in IPVS, see:
* [[LVS/DR | Virtual Server via Direct Routing]]

Then we can use packet capture tools, such as [[ethereal]] and [[tcpdump]], to troubleshoot the cluster. First, capture the load-balanced traffic at both the [[load balancer]] and the [[real server]]s to make sure that basic load balancing works. Second, capture the service monitoring packets exchanged between the [[load balancer]]s and the [[real server]]s to verify that cluster monitoring and high availability work as expected.
  
 
== Load Balancer ==

Capture all the packets related to the virtual service, and check whether request packets are received at the [[load balancer]] and forwarded to the real servers correctly.
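
As a rough illustration of such a capture, the following sketch builds a [[tcpdump]] command line for one virtual service. The helper name <code>build_capture_command</code>, the interface <code>eth0</code> and the address 192.0.2.10 are assumptions made for the example, not values from a real cluster. The filter matches on the VIP only, so under LVS/NAT, where forwarded packets carry the real server address, it would have to be widened.

```python
import shlex


def build_capture_command(vip, port, proto="tcp", iface="eth0"):
    """Return a tcpdump argument list that captures one virtual service."""
    if proto not in ("tcp", "udp"):
        raise ValueError("proto must be 'tcp' or 'udp'")
    # Capture filter: traffic to or from the VIP on the service port.
    capture_filter = f"host {vip} and {proto} port {port}"
    # -n: do not resolve names; -i: interface to capture on.
    return ["tcpdump", "-n", "-i", iface, capture_filter]


# Hypothetical virtual service 192.0.2.10:80 (a documentation address).
# Only the command line is printed; run it as root on the load balancer.
print(shlex.join(build_capture_command("192.0.2.10", 80)))
```

Running the printed command on the load balancer while a client sends a request should show whether the request arrives and whether it leaves again toward a real server.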
  
 
== Real Server ==

Capture all the packets related to the virtual service at the real servers, and check whether request packets are received and response packets are sent out correctly.

In [[LVS/DR]] and [[LVS/TUN]] clusters, we may need to pay attention to the [[ARP Issues in LVS/DR and LVS/TUN Clusters|ARP issue]]. We can also capture ARP packets at the [[real server]] to make sure that ARP works correctly.

== Troubleshooting Examples ==

=== UDP service fail over ===

: Hello, I have two directors in a high availability configuration. Everything is OK for TCP, but not for UDP. UDP load balancing works, but the failover does not happen. If a client is "redirected" to one real server, its connections are always redirected to that server, even when the server goes down.

First, check whether the monitoring program works for the UDP service, and make sure that it removes the server from the scheduling list when its UDP service is down. Second, if the client uses a fixed port to access the UDP service, do not use the quiescent option, which sets the server weight to zero when the server goes down. With a weight of zero, all the packets from this client continue to be sent to the dead server.
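
To verify by hand what the monitoring program should detect, a simple UDP probe can help. The sketch below is not taken from any monitoring package; the function name <code>udp_service_alive</code>, the payload and the timeout are assumptions. Since UDP has no handshake, the probe only works if the payload is a request that the service actually answers.

```python
import socket


def udp_service_alive(host, port, payload=b"ping", timeout=2.0):
    """Return True if the UDP service answers the probe before the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # Connecting a UDP socket fixes the peer address, so an ICMP
            # "port unreachable" is reported as ConnectionRefusedError.
            sock.connect((host, port))
            sock.send(payload)
            sock.recv(4096)
            return True
        except OSError:
            # Covers both a timeout (no reply) and a refused port.
            return False


# Example call against a hypothetical real server address:
#   udp_service_alive("192.0.2.21", 53, payload=b"...")
```

Probing each real server directly, rather than through the VIP, shows whether the service itself is down or whether the director keeps sending traffic to a dead server.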

=== lvs-kiss ===

While LVS provides the basic means for load balancing, [[lvs-kiss]] offers some more sophisticated possibilities.

Basically, [[lvs-kiss]] is just a piece of Perl that sits on top of LVS.

[[lvs-kiss]] features configurable means of measuring node "load". This can be anything from the load average to an snmp-get: anything you can run at the command line of an LVS server that returns a numerical response.

Troubleshooting load balancing can be tricky when web caches and the like are involved. I tend to use the good old "telnet VIP PORT" to check the result of load balancing. For tests, you may even use dummy results with [[lvs-kiss]] (just use "echo NUMBER" as the method for getting the load).
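
The "telnet VIP PORT" check can be repeated automatically. The following sketch connects to the virtual service several times and counts the first line of each answer. It assumes that every real server returns something that identifies it (a banner or a test page), which has to be arranged for the test. The function name <code>sample_backends</code> and its parameters are illustrative.

```python
import socket
from collections import Counter


def sample_backends(vip, port, attempts=10, request=b"", timeout=2.0):
    """Connect repeatedly and count the first line each connection returns."""
    seen = Counter()
    for _ in range(attempts):
        try:
            with socket.create_connection((vip, port), timeout=timeout) as conn:
                if request:
                    # Some services (HTTP, for example) only answer a request.
                    conn.sendall(request)
                first_line = conn.recv(1024).split(b"\n", 1)[0].strip()
                seen[first_line.decode(errors="replace")] += 1
        except OSError:
            # Refused, reset or timed-out connections are counted as well.
            seen["<no answer>"] += 1
    return seen


# Example call against a hypothetical VIP:
#   sample_backends("192.0.2.10", 80, request=b"GET / HTTP/1.0\r\n\r\n")
```

The resulting counts give a quick picture of how connections are spread. If persistence is configured, all connections from one client address are expected to reach the same real server.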

== Traps ==

If you compile Linux without the CONFIG_IP_VS_PROTO_TCP configuration option, ipvsadm does not give any error messages, but incoming connections are not picked up.
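
One way to rule out this trap is to look for the option in the kernel configuration. The sketch below only parses configuration text. Where that text comes from is an assumption about the system: many distributions ship it as <code>/boot/config-VERSION</code>, and some kernels expose it as <code>/proc/config.gz</code>. The function name <code>option_enabled</code> is illustrative.

```python
def option_enabled(config_text, option="CONFIG_IP_VS_PROTO_TCP"):
    """Return True if the kernel config text sets the option to y or m."""
    for line in config_text.splitlines():
        line = line.strip()
        # Enabled options look like "CONFIG_FOO=y"; disabled ones appear
        # as the comment "# CONFIG_FOO is not set" or are missing.
        if line.startswith(option + "="):
            return line.split("=", 1)[1] in ("y", "m")
    return False


# Example with a hypothetical config path:
#   with open("/boot/config-VERSION") as f:
#       print(option_enabled(f.read()))
```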

== External Links ==

* [http://www.ssi.bg/~ja/L4-NAT-HOWTO.txt LVS-NAT troubleshooting HOWTO from Julian]
* [http://www.ssi.bg/~ja/TUN-HOWTO.txt LVS-TUN troubleshooting HOWTO from Julian]

[[Category:LVS Handbook]]

Latest revision as of 14:54, 11 March 2007
