Talk:Locality-Based Least-Connection Scheduling

Expiration algorithm

I find the following description useful for understanding why subsequent connections to the same destination IP address may not follow the same path.

ServerNode[] is a list of maps from destination IP address to server;
C(ServerNode[]) is the current number of non-NULL maps in the list;
Now is the current system time;
net.ipv4.vs.lblc_expiration is the sysctl parameter (default 24 hours).

Every 1 minute:

Count = C(ServerNode[]);
if (Count > 16384) then {
  Goal = min((Count-16384)*4/3, 8192);   /* number of maps to remove in this pass */
  Removed = 0;
  for (dest_ip = 0.0.0.0;
       dest_ip < 255.255.255.255 AND Removed < Goal;
       dest_ip++) {
    if (ServerNode[dest_ip].server is not NULL AND
        ServerNode[dest_ip].lastuse < Now - 6 minutes) then {
      ServerNode[dest_ip].server = NULL;
      Removed++;
    }
  }
}

Every 30 minutes:

for (dest_ip = 0.0.0.0; dest_ip < 255.255.255.255; dest_ip++) {
   if (ServerNode[dest_ip].server is not NULL AND
       ServerNode[dest_ip].lastuse < Now - net.ipv4.vs.lblc_expiration) then
     ServerNode[dest_ip].server = NULL;
}
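
The two passes above can be sketched in plain C on a toy table. This is only an illustration under stated assumptions: it treats min((Count-16384)*4/3, 8192) as a cap on how many maps one pass removes (my reading of ip_vs_lblc.c), it measures time in seconds rather than jiffies, and the names toy_entry, expire_goal, partial_expire and full_expire are made up for the example.

```c
#include <stddef.h>

#define MAX_SIZE      16384     /* size threshold from the description above */
#define ENTRY_TIMEOUT (6 * 60)  /* 6 minutes, in seconds here (jiffies in the kernel) */

/* Hypothetical stand-in for one destination-IP -> server map. */
struct toy_entry {
    int  in_use;   /* non-zero while the map is set */
    long lastuse;  /* time of last use, in seconds */
};

/* Number of maps one every-minute pass tries to remove:
 * min((count - 16384) * 4/3, 8192), or 0 if the table is not over the threshold. */
static long expire_goal(long count)
{
    long goal;

    if (count <= MAX_SIZE)
        return 0;
    goal = (count - MAX_SIZE) * 4 / 3;
    return goal > MAX_SIZE / 2 ? MAX_SIZE / 2 : goal;
}

/* Every-minute pass: drop maps idle for more than ENTRY_TIMEOUT until the
 * goal is reached or the table ends. Returns the number of maps removed. */
static long partial_expire(struct toy_entry *tbl, size_t n, long count, long now)
{
    long goal = expire_goal(count);
    long removed = 0;
    size_t i;

    for (i = 0; i < n && removed < goal; i++) {
        if (tbl[i].in_use && tbl[i].lastuse < now - ENTRY_TIMEOUT) {
            tbl[i].in_use = 0;
            removed++;
        }
    }
    return removed;
}

/* Every-30-minutes pass: drop every map idle for longer than `expiration`
 * (the lblc_expiration value, in seconds here). Returns the number removed. */
static long full_expire(struct toy_entry *tbl, size_t n, long now, long expiration)
{
    long removed = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (tbl[i].in_use && tbl[i].lastuse < now - expiration) {
            tbl[i].in_use = 0;
            removed++;
        }
    }
    return removed;
}
```

With 20000 maps in the table, for example, expire_goal gives (20000-16384)*4/3 = 4821, so a single every-minute pass would remove at most 4821 maps that have been idle for more than 6 minutes.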

Please correct me if I am wrong.

Jkrzyszt 02:31, 16 September 2006 (CST)

When I started using the lblc algorithm, I found that it provides an extra feature: a kind of connection persistence. However, the expiration algorithm, as it stands, does not check whether the destination map about to be removed is still related to any active connections. As a result, connection persistence is not guaranteed, even with very high weights set.

You may ask why I do not use the persistence option. The lblc algorithm is designed to be used together with fwmark in transparent cache clusters. If such a cluster is accessed through another proxy (one that applies access control rules, for example), the classical persistence solution is not an option, because all traffic from that proxy would be directed to the same cache.

I am going to look through the sources to see whether it is possible to check that no active (or perhaps even inactive) connections are using a specific map before it is removed from the list.
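
A rough sketch of what such a check might look like, in plain C rather than kernel code. Everything here is hypothetical: as far as I know the lblc entries do not carry a per-map connection counter, so the active_conns field, the guarded_entry struct and the may_expire helper are invented purely to illustrate the idea.

```c
/* Hypothetical map entry that also tracks the connections still using it. */
struct guarded_entry {
    int  in_use;        /* non-zero while the map is set */
    long lastuse;       /* time of last use, in seconds */
    int  active_conns;  /* assumed counter: connections currently using this map */
};

/* Returns 1 if the map may be removed: it is set, no connection uses it,
 * and it has been idle for longer than `timeout` seconds. */
static int may_expire(const struct guarded_entry *en, long now, long timeout)
{
    if (!en->in_use)
        return 0;
    if (en->active_conns > 0)
        return 0;  /* keep the map while connections still depend on it */
    return en->lastuse < now - timeout;
}
```

The expiration loops would then call may_expire instead of comparing lastuse directly, at the cost of maintaining the counter whenever a connection is created or destroyed.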

I would appreciate any comments on this matter.

Janusz Jkrzyszt 22:20, 18 September 2006 (CST)

I found it much simpler to expose an additional sysctl variable and use it in place of the hardcoded ENTRY_TIMEOUT parameter.

--- linux-source-2.6.15/include/net/ip_vs.h     2006-01-03 04:21:10.000000000 +0100
+++ linux-source-2.6.15-1-e50-debug_7bpo1.200604270947/include/net/ip_vs.h     2006-09-19 15:29:01.000000000 +0200
@@ -359,6 +359,7 @@
        NET_IPV4_VS_SYNC_THRESHOLD=24,
        NET_IPV4_VS_NAT_ICMP_SEND=25,
        NET_IPV4_VS_EXPIRE_QUIESCENT_TEMPLATE=26,
+       NET_IPV4_VS_LBLC_TIMEOUT=27,
        NET_IPV4_VS_LAST
 };
--- linux-source-2.6.15/net/ipv4/ipvs/ip_vs_lblc.c      2006-01-03 04:21:10.000000000 +0100
+++ linux-source-2.6.15-1-e50-debug_7bpo1.200604270947/net/ipv4/ipvs/ip_vs_lblc.c       2006-09-19 16:37:29.000000000 +0200
@@ -57,6 +57,7 @@
  */
 #define CHECK_EXPIRE_INTERVAL   (60*HZ)
 #define ENTRY_TIMEOUT           (6*60*HZ)
+static int sysctl_ip_vs_lblc_timeout = ENTRY_TIMEOUT;
 
 /*
  *    It is for full expiration check.
@@ -118,6 +119,14 @@
                .mode           = 0644,
                .proc_handler   = &proc_dointvec_jiffies,
        },
+       {
+               .ctl_name       = NET_IPV4_VS_LBLC_TIMEOUT,
+               .procname       = "lblc_timeout",
+               .data           = &sysctl_ip_vs_lblc_timeout,
+               .maxlen         = sizeof(int),
+               .mode           = 0644,
+               .proc_handler   = &proc_dointvec_jiffies,
+       },
        { .ctl_name = 0 }
 };
 
@@ -367,7 +376,7 @@
 
                write_lock(&tbl->lock);
                list_for_each_entry_safe(en, nxt, &tbl->bucket[j], list) {
-                       if (time_before(now, en->lastuse + ENTRY_TIMEOUT))
+                       if (time_before(now, en->lastuse + sysctl_ip_vs_lblc_timeout))
                                continue;
 
                        ip_vs_lblc_free(en);
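
The comparison changed by the last hunk can be tried out in user space. The sketch below is an assumption-laden illustration, not kernel code: toy_time_before mimics the wraparound-safe idea behind the kernel's time_before(), lblc_timeout stands in for the new sysctl variable, entry_expired is a made-up helper, and the unit is seconds instead of jiffies.

```c
#include <limits.h>

/* True if a is earlier than b, even when the tick counter has wrapped around.
 * Relies on the usual two's-complement conversion, as the kernel macro does. */
#define toy_time_before(a, b) ((long)((a) - (b)) < 0)

/* Stand-in for sysctl_ip_vs_lblc_timeout; 6 minutes by default. */
static unsigned long lblc_timeout = 6 * 60;

/* Mirrors the patched test: an entry is kept while now is still
 * before lastuse + timeout, and expired otherwise. */
static int entry_expired(unsigned long lastuse, unsigned long now)
{
    return !toy_time_before(now, lastuse + lblc_timeout);
}
```

Lowering lblc_timeout makes idle maps eligible for the every-minute pass sooner, and raising it keeps them longer. Since the handler is proc_dointvec_jiffies, the value written through sysctl would presumably be given in seconds.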

Now I can tune the every-minute expiration procedure to suit my needs. Is there any chance of this modification being included in the mainstream kernel?

Janusz Jkrzyszt 22:48, 19 September 2006 (CST)
Thanks a lot for the patch, I'll try to include it in the mainstream kernel. --Wensong 21:43, 23 September 2006 (CST)

Fine. For your convenience, the raw patch is available here.

Janusz Jkrzyszt 01:50, 24 September 2006 (CST)