Building Scalable TFTP Cluster using LVS

Introduction

TFTP is a bit tricky because of how the protocol works. The client sends ACKs to the same port from which the data came from the server. This is problematic with a port-based LVS setup because the client ends up sending packets to the VIP on a port where the director is not expecting packets. The answer to this is to use firewall marks.

TFTP Protocol:

  1. Client:12345 -> LVS:69    (Request File)
  2. LVS:23456 -> Client:12345 (Data)
  3. Client:12345 -> LVS:23456 (Ack)

And this is where the problem is: with a port-based config, the LVS director doesn't know to listen on some random high port, so the packet is dropped. The real server is listening on that port, but the client doesn't know anything about the real server. This is where firewall marks come into play: set up iptables rules on the LVS directors that mark all of your TFTP packets.
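If you want to watch this happen, here is a sketch of how to observe it on the director (the interface name eth0 is an assumption, matching the mark rules further down). With a plain port-based virtual service you should see the client's ACKs to the ephemeral port arrive at the VIP and get no reply:

# Watch TFTP-related UDP traffic at the VIP, excluding the initial
# request to port 69, to see the client's ACKs to the ephemeral port
tcpdump -n -i eth0 udp and host <VIRTUAL IP> and not port 69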

Architecture

This configuration is on a Direct Routing layout, but the forwarding method should not matter much.
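For Direct Routing, each real server also needs the VIP configured locally without answering ARP for it. This is the usual LVS-DR setup rather than anything specific to this article; a minimal sketch, assuming Linux 2.6 real servers and the VIP on the loopback:

# On each real server: add the VIP to lo so replies are sourced from the VIP
ip addr add <VIRTUAL IP>/32 dev lo

# Keep the real servers from answering ARP for the VIP, so only the
# director receives client traffic
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2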

Configuration Example

Adding Firewall Marks

In order to identify and group TFTP packets together, use firewall marks; iptables can add the marks to your TFTP packets. Since TFTP uses any unprivileged port, it's kind of like the carpet bombing of port selection: you will need to include them all. This will limit you to making TFTP the only UDP service that uses unprivileged ports for that particular virtual service IP.

iptables -t mangle -A PREROUTING -i eth0 -p udp -s 0.0.0.0/0 -d <VIRTUAL IP> --dport 69 -j MARK --set-mark 1
iptables -t mangle -A PREROUTING -i eth0 -p udp -s 0.0.0.0/0 -d <VIRTUAL IP> --dport 1024:65535 -j MARK --set-mark 1

Put those rules in an init script, or in rc.local, so that they run on both of the LVS directors.
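To confirm the marks are actually being applied, check the per-rule packet counters while running a transfer through the VIP:

# Show the mangle PREROUTING rules with packet/byte counters
iptables -t mangle -L PREROUTING -v -n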

lvs.cf Configuration

I probably should not assume that everyone is using Piranha, but that is the config I have, so that is what I'll document.

The lvs.cf config changes are simple: instead of the port option you use the fwmark option. In my example I left the port option in for fun, but I think it's ignored.

virtual tftp {
    active = 1
    address = <VIRTUAL IP> eth0:1
    vip_nmask = 255.255.255.0
    fwmark = 1
    port = 69
    persistent = 45
    expect = "OK - answer from server"
    use_regex = 0
    send_program = "/usr/local/bin/check_tftp --connect %h"
    load_monitor = none
    scheduler = wlc
    protocol = udp
    timeout = 6
    reentry = 15
    quiesce_server = 1
    server ftp1 {
        address = 10.0.0.2
        active = 1
        weight = 1
    }
    server ftp2 {
        address = 10.0.0.3
        active = 1
        weight = 1
    }
}
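Once lvs.cf is updated, restart pulse and the virtual service should show up keyed by firewall mark rather than by address and port. A quick sketch, assuming the stock Piranha init script:

# Reload the new lvs.cf (pulse is Piranha's heartbeat daemon)
service pulse restart

# The TFTP service should now be listed as "FWM 1" with both real servers
ipvsadm -L -n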

Monitoring

You may notice I call a check_tftp script; I took the Nagios Plugin (http://mathias-kettner.de/download/check_tftp).
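To sanity-check the plugin outside of nanny, run it by hand against one of the real servers from the example above; the output should start with the string matched by the expect option:

# Should print something beginning with "OK - answer from server"
/usr/local/bin/check_tftp --connect 10.0.0.2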

A Hangup with Red Hat Piranha -- Fixed in 0.8.6.2 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=243908)

Another thing that is Red Hat Piranha specific, though somebody here might run into it: this check_tftp script, used with nanny, does not detect that a TFTP server is unavailable. When the tftp client tries to time out, it fails because the nanny process is blocking SIGALRM, the signal the client uses for its timeout. I had to compile nanny without it blocking SIGALRM, as I could not find a way in a shell script to unblock that signal. Also, nanny does not like it when check_tftp returns with error codes, so I had to modify the script to always exit 0 instead of the different statuses it returns for Nagios.
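One way to make that always-exit-0 change without editing the plugin itself is a small wrapper. This is only a sketch, and the .orig name is hypothetical; nanny still sees the plugin's output, which is what the expect string matches:

#!/bin/sh
# Hypothetical wrapper installed as /usr/local/bin/check_tftp, with the
# real Nagios plugin renamed to check_tftp.orig. Pass the output through
# for nanny's "expect" match, but always exit 0 so nanny is happy.
/usr/local/bin/check_tftp.orig "$@"
exit 0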

Conclusion

I conclude that I wish I had this wiki when I started figuring this one out. I pieced this together with info from the LVS-HOWTO multi-port services document.

"Building Scalable TFTP Cluster using LVS" is an LVS Example related stub. You can help LVSKB by expanding it.